Verdict:
Contains features galore, and its improved accuracy takes the OCR crown back from ABBYY FineReader 6. An impressive and significant addition to the OmniPage ancestral line.
Despite all of ScanSoft's recent acquisitions, OmniPage Pro remains its flagship OCR package, with TextBridge (see opposite) as the low-cost offering. Recent releases of OmniPage haven't been terribly exciting, but OmniPage Pro 12 is a different story altogether.
It's undoubtedly a major update and has earned itself the suffix of Office. Key changes include improved PDF conversion (itself one of the few notable new features offered by OmniPage Pro 11), enhanced accuracy and improved support for network use. There are also some better enterprise features, as well as refined proofing and added automation for unattended document processing.
ScanSoft says it has worked hard to deliver the enhancements because of the increasing use of PDF documents and multifunction devices. OmniPage Pro 12 can also leverage the Internet with its new automatic processing capabilities.
Don't expect greatly enhanced all-round accuracy, though. As OCR technology has matured, we've come to expect less dramatic improvements with new versions. A claimed 50 per cent improvement in accuracy might sound a lot, but if accuracy is already at 99 per cent - given a particular level of source document image quality - this amounts to less than 1 percentage point improvement in total.
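To put numbers on that, here is a minimal sketch of the arithmetic. It assumes that a "50 per cent improvement" means the error rate is halved, which is our reading of the claim rather than a ScanSoft definition; the function name is purely illustrative.

```python
def accuracy_after_error_reduction(accuracy_pct: float, error_reduction_pct: float) -> float:
    """Return the new accuracy (in per cent) after cutting the error rate by the given share."""
    error_pct = 100.0 - accuracy_pct
    remaining_error_pct = error_pct * (1.0 - error_reduction_pct / 100.0)
    return 100.0 - remaining_error_pct


before = 99.0  # starting accuracy from the example in the text
after = accuracy_after_error_reduction(before, 50.0)  # assumed: errors are halved
gain = after - before

print(f"{before:.1f}% -> {after:.1f}% (gain of {gain:.1f} percentage points)")
```

On that reading, 99 per cent accuracy becomes 99.5 per cent: half the errors, but only half a percentage point on the headline figure.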
OCR development has mainly focused on coping with poor-quality originals. OmniPage Pro 12 boasts three OCR engine technologies, originating from ScanSoft acquisitions Caere, Calera and Xerox TextBridge, from which ScanSoft itself originates. A unique system of engine 'voting' optimises the OCR process, enabling the most effective engine to be used on each part of a given document.
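ScanSoft doesn't publish how its voting works, so the following is only a sketch of the general idea: each engine offers a reading of a zone along with a confidence score, and the reading with the most combined support wins. The engine names, the `Reading` type and the confidence values are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Reading(NamedTuple):
    engine: str        # hypothetical engine label
    text: str          # what that engine thinks the zone says
    confidence: float  # 0.0 to 1.0, as reported by the engine


def vote(readings: list[Reading]) -> str:
    """Return the text with the highest summed confidence across engines."""
    scores: dict[str, float] = defaultdict(float)
    for reading in readings:
        scores[reading.text] += reading.confidence
    return max(scores, key=scores.get)


def recognise_page(zones: list[list[Reading]]) -> list[str]:
    """Run the vote independently on every zone of a page."""
    return [vote(zone) for zone in zones]


page = [
    [Reading("engine_a", "modern", 0.62),
     Reading("engine_b", "modem", 0.55),
     Reading("engine_c", "modem", 0.40)],
    [Reading("engine_a", "OCR", 0.90),
     Reading("engine_b", "0CR", 0.30),
     Reading("engine_c", "OCR", 0.70)],
]

print(recognise_page(page))  # ['modem', 'OCR']
```

Note that in the first zone the single most confident engine is outvoted by two engines that agree, which is the point of voting rather than simply trusting one engine throughout.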
There are also signs that ScanSoft's other recent acquisition - Lernout & Hauspie's speech-recognition technology - is influencing OmniPage's development. Not only does Pro 12 use dictionaries, including sector-specific ones that address legal and other vocabularies, but it also employs a degree of context checking. On top of that, the remarkably human-sounding, Lernout & Hauspie-developed RealSpeak Text-to-Speech voice synthesis package is bundled with OmniPage Pro 12.
Getting the words right is one thing, but ScanSoft has taken some important steps to make better use of them when laying out the recognised page. Our complex layout test page showed a marked improvement in text accuracy and formatting compared to OmniPage Pro 11. Far less text was incorrectly identified as part of a graphic, and multiple columns are now linked. So if you decide to edit the document afterwards, the text flows naturally from one column to another.
ScanSoft says, with some conviction, that this is the difference between looking right and working right. OmniPage Pro 12 is also good at retaining the font size and style, as well as both text and graphic object positioning. The only thing it wouldn't attempt to do on our test page was reproduce the vertical and horizontal lines that separate subheadings, columns and boxed sections.
Incidentally, there's also a new book-processing feature that enables opposing pages to be scanned and processed at the same time.
Until we'd seen OmniPage Pro 12, ABBYY FineReader 6 Professional (see Reviews, issue 96, p138) was the clear leader in overall OCR performance. However, OmniPage Pro 12 produces a more coherent result from difficult originals, and FineReader 6 also lacks the convenience of providing text flowing from column to column.
It's interesting to see that the areas FineReader 6 had difficulty in reading correctly were often the areas in which OmniPage Pro 12 excelled, and vice versa. But on balance, OmniPage Pro 12 is the better performer in terms of accuracy and layout retention. It also implements multithreading to enable the simultaneous scanning of new pages and the recognition of previous ones. This is handy for large-volume batches, as OmniPage Pro 12 isn't particularly quick at processing pages during the recognition stage.
Scheduled batch processing of documents isn't new to OmniPage Pro 12, but this has been extended to include Wizard-configured automatic processing. You can, for example, configure OmniPage Pro 12 to automatically process a document image once it arrives in a nominated folder and deliver the resulting editable document to a specified destination. That destination could be a different folder, or the document could even be emailed to a nominated person.
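For readers curious about how watched-folder processing works in principle, here is a rough sketch. It is not OmniPage's own mechanism, which is configured through the Wizard; the folder names, the `process_inbox` function and the stand-in `fake_recognise` step are all assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg"}


def process_inbox(inbox: Path, outbox: Path, done: Path,
                  recognise: Callable[[Path], str]) -> list[Path]:
    """Process every image waiting in `inbox` once and return the files written."""
    outbox.mkdir(parents=True, exist_ok=True)
    done.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(inbox.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore anything that isn't a scanned image
        target = outbox / (image.stem + ".txt")
        target.write_text(recognise(image), encoding="utf-8")
        # Move the original aside so it isn't picked up again on the next pass.
        shutil.move(str(image), str(done / image.name))
        written.append(target)
    return written


def fake_recognise(image: Path) -> str:
    # Placeholder for a real OCR call; returns dummy text only.
    return f"recognised text from {image.name}"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    inbox = root / "inbox"
    inbox.mkdir()
    (inbox / "invoice.tif").write_bytes(b"")          # pretend scan
    (inbox / "notes.md").write_text("not an image")   # should be skipped
    results = process_inbox(inbox, root / "outbox", root / "done", fake_recognise)
    result_names = [p.name for p in results]
    remaining = sorted(p.name for p in inbox.iterdir())
    print(result_names, remaining)
```

A scheduler or a simple timed loop would call `process_inbox` repeatedly; delivering by email instead of to a folder would mean swapping the write step for a mail-sending step.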
We were surprised to see PDF conversion in OmniPage Pro 11, although it seems so obvious now that we realise its benefits. However, Pro 11's PDF-conversion capabilities were only the start. Development in this area now means that OmniPage Pro 12 can process both bitmap-based PDFs and searchable text PDFs. For example, now you're able to convert a bitmap PDF page into editable text and then republish that document as a searchable PDF. It's even possible to retain the original document's bitmap representation and overlay the searchable text. The result looks like you've magically been given the ability to search text inside a graphic.
Apart from word-processing formats, HTML and PDF, OmniPage Pro 12 can export to XML and Open eBook formats. The latter includes specific support for Microsoft Pocket PC and Tablet PC platforms.
It's old news that good-quality source documents can be recognised with near enough perfect accuracy. However, more complex documents, and ones that aren't scanned so neatly, will require a degree of proofing. The proofing stage is often the most time-consuming, and OCR is no help if proofing takes as long as retyping the page.
Luckily, ScanSoft has added some valuable features to speed up the overall process. You can now re-zone parts of a page on the fly, without the need to rescan it. A feature called IntelliTrain Proofing will apply a correction to the same errors throughout a document, minimising repetition. Context checking is now implemented, and whenever a suspect word is found you see it as it was scanned so you're able to dynamically verify it. Especially troublesome characters or symbols can be identified, and OmniPage Pro 12 can be trained to deal with them correctly in future. RealSpeak Text-to-Speech also refines the proofing process.
While OmniPage Pro has always been targeted at the higher end of the market, there are plenty of enhancements in version 12 to attract potential enterprise customers. A new network install provision makes deployment of OmniPage Pro 12 easier for system administrators. There's also a new volume seat pricing structure for network customers. ODMA connectivity is supported, making it easier to integrate OmniPage Pro 12 with corporate document management systems. ScanSoft has even provided OLE hooks, for developers to complete the relatively simple task of integrating OmniPage Pro 12 with enterprise applications.
All these improvements mean that OmniPage Pro 12 Office is an impressive and significant addition to the long OmniPage ancestral line. Much less expensive alternatives like ABBYY FineReader 6 were beginning to make older OmniPage versions look very ordinary, but OmniPage Pro 12 has built on its strengths and added carefully targeted new features to make it a package to be reckoned with. The retail box price remains an astonishing £369, but remember that you can upgrade from any OCR package (say, the one bundled with your scanner) for the considerably discounted price of £130.
By Ian Burley
SPECIFICATIONS:
Pentium or higher, 64MB of RAM, 110MB of hard disk space, TWAIN-compatible scanner, Windows 98 SE, ME, NT 4 (SP 6), 2000 or XP.