
Product Reviews

Utilities
OmniPage 7.0  [MacUser]
COMPANY: Caere Corporation
PRICE: £395
RATING:
ISSUE: 13 5
DATE: Mar 97
   

Caere Corporation's market-leading optical character recognition software, OmniPage Pro, has just been released in version 7.0, sporting a new interface which greatly aids in the management of the OCR process, plus new tools for the handling of common OCR bugbears such as custom formatting and special characters.

The starting point of the recognition process is the image file - the bitmapped representation of the printed document. There are usually two sources for this: electronic faxes and desktop-scanned documents. Scanned documents can be acquired using normal scanning software or by scanning from within OmniPage. The latter is usually best, since it provides options such as Auto-Brightness and 3D-OCR (for grey-scale scanners only) that can enhance the recognition of difficult originals (newsprint or text on coloured backgrounds, for example). OmniPage provides integrated drivers for most popular makes of scanner; alternatively, the Manual Brightness command lets you fine-tune the scanning process yourself.

Once the image files have been acquired (in either TIFF or PICT format), they are brought into OmniPage via the Add/Load dialog box, which allows multiple image files to be loaded at once. When they are in the program, the new interface displays them in a scrolling thumbnail view. From here, pages can be re-ordered, deleted or selected for recognition.

A floating Auto-OCR toolbar provides easy access to the main functions. It consists of four main buttons for loading, zoning, OCR and exporting, and each of these has its precise function determined by an associated drop-down menu, removing the need to make a lot of trips to the menu bar. Also new are the tool palette and the Zone Info palette. The zone is the basic building block for managing the OCR process. Clicking on a thumbnail loads that image into the image window, and then clicking on the zone button in the toolbar defines each page's blocks of text and graphics as one of four zone types - Automatic, Text, Graphics and Ignore.

If the recognition engine knows that a particular zone on the page contains only text or only graphics, the recognition process is greatly accelerated. The default setting for zoning is Automatic, where OmniPage decides which parts of the image are text and which are graphics.

Generally, the feature is intelligent enough to make the distinction 95% of the time, although text on coloured backgrounds, such as column headers and boxouts, is often mistaken for graphics. This can be remedied using OmniPage's own 3D-OCR scanning option, or by manually re-assigning the zones after auto zoning. The manual zoning option allows you to specify precisely which areas are which, but usually you'll just need to use the auto zoning feature and then tidy up afterwards.

The next stage is to click on the OCR button. OmniPage analyses the image file, giving feedback on its progress by turning the zone blocks from light to dark grey to black. All the while it displays sections of the characters it's working on, so you can judge image quality. The converted image then appears as editable text in the text window.

Suspect words are highlighted in green and rejected characters are replaced by red tildes ('~'). You then use the built-in spell checker to correct characters which haven't been recognised properly. OmniPage helpfully displays the original image text along with any suspect word, enabling you to cross-check. The text can then be exported to a wide variety of word processor and generic text formats, including MacWrite II and Pro, Microsoft Word 5.0 and 6.0, RTF 1.0 and 2.0, WordPerfect 2.0 to 3.5, and WriteNow 3.0. Other formats supported are Excel 3.0 and 4.0, FrameMaker 5.0 and 6.0, and MIF. It will even have a stab at translating the converted page into HTML.

OmniPage 7.0 earns its 'Pro' status thanks to its tools for managing the contents of zones, which underpin its format and font retention, its support for multiple-language recognition, and its automation. At the first level, zones can be associated with particular styles. These are similar to the style sheets you'd find in a word processor, controlling indentation and margins.

Zone styles can then be built up into style sets, which can determine the format of an entire page. This is very useful if you regularly receive reports or forms with the same formatting. A particular style set can be applied for the zoning process, automating the entire recognition operation. But for one-offs or short-run jobs in a particular format, this can be overkill, and you're better off recognising the raw text and reformatting it in your favourite word processor or DTP program.

OmniPage also comes with built-in style sets which can't be altered or deleted. The two most useful are True Page, which makes a good job of retaining all font, formatting and layout information, and Plain Format, which outputs the text alone, with no text or page formatting. Font Mapping can be performed on all style sets (even the built-in ones) via the Settings menu. Basically, OmniPage recognises four different styles of font - proportional serif, proportional sans-serif, mono-spaced serif and mono-spaced sans-serif. These can be mapped to particular fonts to reflect in-house styles. The OCR process can also be trained to recognise uncommon characters such as '(R)', or the 'ff' and 'fl' ligatures.
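To make the Font Mapping idea concrete, here is a minimal sketch in Python. It is an illustration only, not how OmniPage works internally: the FontClass and HOUSE_STYLE names, the map_font and apply_mapping helpers and the choice of fonts are all assumptions made up for the example. It simply shows four recognised font classes being looked up in a table of house fonts.

```python
from enum import Enum


class FontClass(Enum):
    """The four font classes the review says OmniPage distinguishes."""
    PROPORTIONAL_SERIF = "proportional serif"
    PROPORTIONAL_SANS = "proportional sans-serif"
    MONO_SERIF = "mono-spaced serif"
    MONO_SANS = "mono-spaced sans-serif"


# Hypothetical in-house style: one output font per recognised class.
HOUSE_STYLE = {
    FontClass.PROPORTIONAL_SERIF: "Times",
    FontClass.PROPORTIONAL_SANS: "Helvetica",
    FontClass.MONO_SERIF: "Courier",
    FontClass.MONO_SANS: "Monaco",
}


def map_font(font_class, mapping=HOUSE_STYLE, fallback="Times"):
    """Return the output font for a recognised class, or the fallback if unmapped."""
    return mapping.get(font_class, fallback)


def apply_mapping(runs, mapping=HOUSE_STYLE):
    """Turn (text, FontClass) runs into (text, font name) runs."""
    return [(text, map_font(cls, mapping)) for text, cls in runs]
```

Swapping in a different dictionary is all it takes to model a different house style, which is roughly the convenience the Settings menu offers.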

Automation is a strong feature of version 7.0, which gives you the ability to defer OCR until a specific time. This enables the batch recognition of multiple-page documents (up to 256 pages from an automatic sheet feeder or 256 OmniPage files) to be delayed until a certain time (after work, for instance). It's also possible to set up an automated input/output system. If you regularly receive electronic documents such as faxes, you can nominate an input folder which OmniPage will check every 30 seconds. Recognised files are then placed in a nominated output folder. OmniPage supports AppleScript, which can be used to fine-tune the process further.
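The input/output folder arrangement is a classic 'watched folder' pattern, and a rough Python sketch shows the shape of it. This is not OmniPage's own mechanism or its AppleScript interface: the process_inbox and watch functions, the file extensions and the recognise callback (a stand-in for whatever actually performs the OCR) are all assumptions for illustration. Only the 30-second interval comes from the product.

```python
import time
from pathlib import Path
from typing import Callable, List

POLL_SECONDS = 30  # the input folder is checked every 30 seconds
IMAGE_SUFFIXES = {".tif", ".tiff", ".pict"}  # assumed extensions for TIFF and PICT


def process_inbox(inbox: Path, outbox: Path,
                  recognise: Callable[[Path], str]) -> List[Path]:
    """Run one pass: recognise each waiting image and write the text to outbox."""
    outbox.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(inbox.iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip subfolders and anything that is not an image file
        target = outbox / (image.stem + ".txt")
        target.write_text(recognise(image), encoding="utf-8")
        image.unlink()  # remove the original so it is not processed twice
        written.append(target)
    return written


def watch(inbox: Path, outbox: Path, recognise: Callable[[Path], str],
          passes: int = 0) -> None:
    """Poll the inbox every POLL_SECONDS; passes=0 means run until interrupted."""
    done = 0
    while passes == 0 or done < passes:
        process_inbox(inbox, outbox, recognise)
        done += 1
        if passes == 0 or done < passes:
            time.sleep(POLL_SECONDS)
```

Polling on a timer is crude but robust: a fax that arrives mid-pass is simply picked up on the next one.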

Another automation feature allows OmniPage to be used as an adjunct to a word processor. A direct input item is installed in the Apple menu and made available to any program that supports cut and paste. This launches OmniPage, scans and recognises a text document, and pastes the recognised text at the word processor's insertion point. It's even possible to mail recognised text via PowerTalk.

OmniPage 7.0 is fast, slick and mature, and with its new, well-thought-out interface, it makes the process of managing and optimising OCR a breeze. At £395, it isn't cheap (and just misses a four-mice rating because of this), and you'll have to decide if the extra functionality is worth the extra price (and the learning time) over the cut-down version which is bundled with certain scanners.

By Tim Danaher

