A common efficiency technique is for accounting departments to digitize paper documents with a scanner, making the documents accessible through the corporate computer system. However, it is more difficult to go beyond storing the scanned image and also extract the data it contains.
Most scanning systems come with built-in optical character recognition (OCR) software that extracts data from digital images. Unfortunately, some documents contain handwritten text or are damaged, which leads to incorrect OCR interpretation, sometimes with laughably inaccurate results. One way to catch these errors is a technological update of the old technique of having two data entry operators enter the same information and then comparing their work for mistakes. In this case, two different OCR systems interpret the same image, and any differences between the two interpretations are flagged for operator intervention.
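The dual-OCR check can be sketched in a few lines. This is a minimal illustration, assuming each engine returns its reading of a field as a plain string; the function name flag_differences is hypothetical, and aligning the two readings with Python's standard difflib.SequenceMatcher is an illustrative choice, not a description of how any commercial product does it.

```python
from difflib import SequenceMatcher


def flag_differences(reading_a: str, reading_b: str) -> list[tuple[str, str, str]]:
    """Compare two OCR readings of the same image (hypothetical helper).

    Returns (opcode, text_from_engine_a, text_from_engine_b) for every
    span where the engines disagree; an empty list means they agree.
    """
    # Align the two readings so an inserted or dropped character does not
    # make every following character look like a mismatch.
    matcher = SequenceMatcher(a=reading_a, b=reading_b, autojunk=False)
    flags = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Any non-matching span would be routed to an operator for review.
            flags.append((tag, reading_a[i1:i2], reading_b[j1:j2]))
    return flags


# Engine B misreads the digit zero as the letter O.
print(flag_differences("Invoice 1042 total $385.00", "Invoice 1O42 total $385.00"))
```

Only the disputed span is returned, so an operator reviewing the exception sees exactly which characters the two engines disagreed on rather than rekeying the whole field.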
In high-volume environments where manual intervention is not feasible, some multiple-OCR systems instead assign a confidence score to each interpreted character and store whichever interpretation scores higher. In effect, the OCR engines vote on which interpretation is correct. This does not guarantee that the resulting interpretation is correct, only that the interpretation with the highest probability of accuracy is retained.
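The confidence vote can be illustrated the same way. This sketch assumes that each engine reports a character and a confidence score for every position, that the readings are already aligned to the same length, and that the scores are comparable across engines (real engines may score on different scales). The names CharReading and vote are hypothetical.

```python
from typing import NamedTuple


class CharReading(NamedTuple):
    char: str          # character the engine recognized at this position
    confidence: float  # engine's confidence score, assumed to be 0.0 to 1.0


def vote(*engine_readings: list[CharReading]) -> str:
    """Keep the highest-confidence character at each aligned position."""
    if not engine_readings:
        return ""
    length = len(engine_readings[0])
    if any(len(reading) != length for reading in engine_readings):
        raise ValueError("readings must be aligned to the same length")
    result = []
    # zip groups together every engine's reading of the same position.
    for position in zip(*engine_readings):
        # On a tie, max() keeps the first engine's reading.
        best = max(position, key=lambda reading: reading.confidence)
        result.append(best.char)
    return "".join(result)


engine_a = [CharReading("1", 0.98), CharReading("O", 0.40),
            CharReading("4", 0.97), CharReading("2", 0.99)]
engine_b = [CharReading("1", 0.95), CharReading("0", 0.55),
            CharReading("4", 0.96), CharReading("Z", 0.30)]
print(vote(engine_a, engine_b))
```

Here each engine wins at a different position, so the stored value "1042" combines the more confident reading from each. As the text notes, the winning character is merely the most probable one, not a verified one.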
Keep in mind that using multiple OCR engines is only one aspect of high-accuracy data interpretation. Other factors include scanner quality, smudges and background color on documents, and scan resolution.
An example of a company that produces scanning systems with multiple OCR engines is Datacap (www.datacap.com).
