Optical character recognition (OCR) is the translation of handwritten, typewritten or printed text images into machine-editable text. OCR is a field of research in pattern recognition, artificial intelligence and machine vision. In manufacturing applications, text on documents or products often needs to be "read" by machines to sort the item a certain way, or perform some task based on the identifying letters and numbers. The OCR component of an identification or inspection system is an important element in boosting overall quality.
The United States Postal Service has been using OCR machines to sort mail since 1965. Beyond document imaging, OCR software is also widely used in the semiconductor, automotive and pharmaceutical industries. Common examples of these applications are wafer identification, PCB inspection, bottle identification and tracking, and bank bill inspection.
Document Data Solutions (DDS, Milford, CT) uses OCR to develop inspection and verification solutions that it sells to mail houses, as well as to large utility, insurance, medical and payroll/benefits organizations that handle their own large mailings in-house. The DDS systems take video images of each document as it is being processed, scanning 2-D matrix barcodes and performing OCR on address labels or mail pieces for mass mailing applications. This ensures that the mailings of DDS clients include the correct pieces, printed on the correct forms, with no missing pages or documents, and are sent to the right addressee.
Regardless of the application type in which OCR is to be used, software developers generally evaluate OCR tools based on a fundamental list of criteria.
The evaluation of an OCR tool should take four main criteria into consideration. First, a flexible OCR tool will help operators select a suitable configuration with respect to the requirements of their application. Second, because obtaining a good, stable image is a major challenge inherent to machine vision, an OCR tool is expected to be reliable enough to adapt itself to a variety of imaging conditions. Third, an easy-to-use OCR tool will help design, implement and configure the application quickly and intuitively. Finally, an efficient tool will provide a steady and deterministic identification rate, and consequently increased productivity.
FLEXIBILITY
What comes to mind first is the flexibility of defining a character font. Whether the required font is one of the industry standards (such as SEMI, OCR-A or OCR-B) or user defined, whether it is solid, dotted or discontinuous (like Asian characters), whether it is slanted (italicized) or not, an OCR tool should be able to deal with all these kinds of font attributes at training time.
A second interesting, and often unavoidable, criterion is the ability to choose among a series of specifically designed algorithms. On the one hand, algorithms based on segmentation technology extract potential characters (objects) from the image and then compare them against the font database. Segmentation-based tools are likely to provide a fast execution time, as they work on a limited number of pre-segmented characters. However, they have proven to be less reliable in cases of high noise and touching characters.
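The segmentation approach described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the method of any particular product: the image is already binarized, characters are separated by at least one empty pixel column, and the tiny `FONT` table, the `segment_columns`, `mismatch` and `read_string` names are all hypothetical.

```python
from typing import Dict, List, Tuple

Glyph = Tuple[str, ...]  # rows of '#' (ink) and '.' (background)

# Hypothetical 3x5 font database; a real one would be built at training time.
FONT: Dict[str, Glyph] = {
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "7": ("###", "..#", ".#.", ".#.", ".#."),
    "L": ("#..", "#..", "#..", "#..", "###"),
}


def segment_columns(image: List[str]) -> List[Glyph]:
    """Split a binary image into glyphs wherever a column contains no ink."""
    width = len(image[0])
    has_ink = [any(row[x] == "#" for row in image) for x in range(width)]
    glyphs: List[Glyph] = []
    start = None
    # The trailing False is a sentinel that closes a glyph touching the right edge.
    for x, ink in enumerate(has_ink + [False]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            glyphs.append(tuple(row[start:x] for row in image))
            start = None
    return glyphs


def mismatch(a: Glyph, b: Glyph) -> int:
    """Count differing pixels; glyphs of different size get a huge penalty."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 10**9
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def read_string(image: List[str]) -> str:
    """Classify each segmented glyph as the closest character in FONT."""
    return "".join(
        min(FONT, key=lambda ch: mismatch(glyph, FONT[ch]))
        for glyph in segment_columns(image)
    )
```

Because only the pre-segmented objects are compared against the font, the work per image stays small. The weakness mentioned above also shows up directly: if a stray ink pixel bridges the gap between two characters, `segment_columns` returns one merged object and the comparison step has nothing sensible to match.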
On the other hand, algorithms based on pattern matching technology perform a full search of the image for each character present in the font database. Pattern-matching-based tools are very robust. They are appropriate, even mandatory, under heavy degradation, in other words when it becomes extremely difficult to separate characters from their background. Despite their insensitivity to noise, tools in this category are recognized to be less efficient when the font database contains a large number of characters.
Another interesting option is the ability to restrict each individual character position in the string to a particular character set. For example, knowing that the string to be read always begins with a letter, the tool can be configured so that the first character must be part of the "alphabetic only" character set. Such a feature can help avoid possible mismatches between similar characters like "8" and "B," "1" and "I," or "0" and "O."
Finally, it is highly desirable that the OCR tool be able to read multiple strings in a single execution, regardless of the geometrical relations between those strings. Both aligned and non-aligned strings are normally expected to be separated and output as individual entities.
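The per-position character-set restriction described above can be illustrated with a short Python sketch. It assumes that some earlier recognition step has already produced, for each character position, a score for every candidate character; the score values, the `read_with_constraints` function and the `ALPHA` and `DIGIT` sets are hypothetical names introduced for the example only.

```python
import string
from typing import Dict, Sequence, Set

ALPHA: Set[str] = set(string.ascii_uppercase)  # "alphabetic only" set
DIGIT: Set[str] = set(string.digits)           # "numeric only" set


def read_with_constraints(
    scores: Sequence[Dict[str, float]],
    allowed: Sequence[Set[str]],
) -> str:
    """For each position, keep the best-scoring candidate that is allowed there."""
    if len(scores) != len(allowed):
        raise ValueError("one character set is required per position")
    result = []
    for candidates, charset in zip(scores, allowed):
        legal = {ch: s for ch, s in candidates.items() if ch in charset}
        # '?' marks a position where no allowed candidate was found.
        result.append(max(legal, key=legal.get) if legal else "?")
    return "".join(result)


# Hypothetical match scores for a three-character string.
scores = [
    {"8": 0.91, "B": 0.89},
    {"0": 0.80, "O": 0.82},
    {"1": 0.70, "I": 0.75},
]
constrained = read_with_constraints(scores, [ALPHA, DIGIT, DIGIT])
unconstrained = read_with_constraints(scores, [ALPHA | DIGIT] * 3)
```

With these made-up scores, the unconstrained read picks the slightly higher-scoring look-alike at every position, while the constrained read is forced to a letter followed by two digits.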
RELIABILITY
In spite of the significant evolution of machine vision technologies in recent years, finding an OCR tool that is able to deal with the great majority of scenarios may still be considered a formidable task. First of all, OCR is often used on images acquired in a harsh, industrial environment where imaging conditions are not easily controlled. For instance, printing or etching quality is probably the most important factor that influences the quality of the characters presented as an input to the OCR tool. Scratches, spots, cracks, gaps, as well as variations in the background are common examples of defects that reduce the chances for a successful reading.
Other aspects such as low resolution (often due to a large field of view containing small characters shrunk to a few pixels), fuzziness or blurring (sometimes caused by motion), poor contrast (due to inadequate lighting), as well as noise are good examples of constraints demanding a high level of robustness from the OCR tool to ensure a good quality reading.
Aside from image degradations, another class of constraints that also affects reliability is geometry change. Most of the time, characters in the field of view appear geometrically different from those trained in the font. For example, a string may be scanned at a certain angle (caused by a rotation of the object of interest), at a reduced or increased size (depending on how close the camera is to the object) and so on.
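One simple way to cope with the size change mentioned above is to resample each candidate character to the size of the trained font before comparing the two. The Python sketch below uses nearest-neighbor resampling on a binary glyph; it is an assumption made for illustration (the `rescale` function is hypothetical), it does not handle rotation, and commercial tools may achieve scale invariance quite differently.

```python
from typing import Tuple

Glyph = Tuple[str, ...]  # rows of '#' (ink) and '.' (background)


def rescale(glyph: Glyph, out_w: int, out_h: int) -> Glyph:
    """Resample a binary glyph to out_w x out_h using nearest-neighbor lookup."""
    in_h, in_w = len(glyph), len(glyph[0])
    return tuple(
        "".join(glyph[y * in_h // out_h][x * in_w // out_w] for x in range(out_w))
        for y in range(out_h)
    )


# A 3x5 trained character and the same character imaged at twice the size.
trained: Glyph = ("#..", "#..", "#..", "#..", "###")
seen_larger = rescale(trained, 6, 10)
normalized = rescale(seen_larger, 3, 5)  # brought back to the trained size
```

After normalization the character can be compared pixel by pixel with the trained font, whatever the camera distance was.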
In summary, a reliable OCR tool should be as invariant as possible to all of the above application constraints so that the environment (choice of camera, lighting, printing technique) does not need to be adapted for the sole purpose of OCR success.
EASE OF USE
In addition to versatility and robustness, there is the need for Rapid Application Development (RAD), in which ease of use plays a fundamental role. An OCR tool presenting an easy-to-learn application programming interface (API) is likely to facilitate and speed up integration with all other components of the application. A well-structured and modular API (organized as a hierarchy of classes) may significantly help in picking the piece of functionality that will satisfy the application requirements. It is also important to mention that an interface that provides support for multiple languages such as C/C++, C# and VB may help avoid the pain of inter-language interfacing.
Even with the simplest API, configuring an OCR application comes down to tweaking a rather complex underlying algorithm. Without a well-defined parameterization, the configuration may turn into a tedious task. The best parameter set for any kind of image analysis algorithm such as OCR is composed of a short list of intuitive parameters, with most of them based on values internally computed from the input image. So in order to achieve a good tradeoff between flexibility and ease of use, each parameter should be individually configurable in either manual or automatic mode.
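The idea of a parameter that can be set manually or left in automatic mode can be sketched as follows. The example uses a binarization threshold as the parameter; the `Param` class, the `binarize` function and the midpoint rule used in automatic mode are all assumptions chosen for brevity, not a description of any specific tool.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Param:
    """A parameter that is automatic when value is None, manual otherwise."""
    value: Optional[float] = None

    def resolve(self, auto_value: float) -> float:
        return auto_value if self.value is None else self.value


def binarize(
    pixels: Sequence[Sequence[int]],
    threshold: Param = Param(),
) -> List[List[bool]]:
    """Return True for dark (ink) pixels, False for background."""
    flat = [p for row in pixels for p in row]
    # Automatic mode: midpoint between the darkest and brightest pixel.
    auto = (min(flat) + max(flat)) / 2
    t = threshold.resolve(auto)
    return [[p < t for p in row] for row in pixels]


gray = [[10, 200], [20, 220]]
auto_result = binarize(gray)               # threshold computed from the image
manual_result = binarize(gray, Param(15))  # operator-supplied threshold
```

The operator only needs to touch the parameters that matter for the application; everything left in automatic mode is derived from the input image.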
EFFICIENCY
In a world of high productivity, machine vision systems equipped with OCR-based identification software are expected to provide not only the required speed but also a deterministic response. For instance, in situations where the input image is degraded to the point that a failure is simply inevitable, the OCR reading process must abort within a predetermined time (timeout) so that subsequent frames are not sacrificed.
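The timeout behavior described above can be sketched with a deadline that is checked between reading attempts. This is a minimal cooperative scheme under stated assumptions: `read_with_deadline` and `try_read` are hypothetical names, and a single attempt that is already running is not interrupted, so each attempt must itself be short.

```python
import time
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def read_with_deadline(
    candidates: Iterable[T],
    try_read: Callable[[T], Optional[str]],
    timeout_s: float,
) -> Optional[str]:
    """Try candidates in order; return None once the time budget is spent."""
    deadline = time.monotonic() + timeout_s
    for candidate in candidates:
        if time.monotonic() >= deadline:
            return None  # abort so the next frame is not delayed
        result = try_read(candidate)
        if result is not None:
            return result
    return None  # every candidate failed within the budget
```

Here the candidates could be, for instance, successive search regions or parameter settings tried on the same frame. A caller that receives `None` knows within a bounded time that the frame could not be read and can move on to the next one.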
Some OCR tools also expose one or more parameters that let the operator adjust the tradeoff between speed and robustness. For example, in a well-controlled acquisition environment where the characters are clear of any artifact, one may want to boost the speed to a maximum at the risk (a fairly low one in this case) of losing a little reliability. The opposite also is true in scenarios where good image conditions are difficult to obtain but where speed is not an issue.
The design phase of any machine vision system requiring an image analysis algorithm such as OCR should set aside at least some time for a proper comparison and evaluation of the OCR tools currently available on the market. Basing that evaluation on the right criteria is probably the best way to put operators on the right track.
* A flexible OCR tool will help operators select a suitable configuration with respect to the requirements of their application.
* An OCR tool also is expected to be reliable in order to adapt itself to a variety of imaging conditions.
* Ideally, an OCR tool will help design, implement and configure the application quickly and intuitively.
* An efficient tool will provide a steady and deterministic identification rate, and consequently increased productivity.
Bruno Menard is a senior software engineer at Dalsa (Waterloo, Ontario) specializing in image processing and analysis algorithms. For more information, e-mail email@example.com, call (514) 333-1301 or visit www.dalsa.com.