Digitising the British Library
Posted on 17 Jan 2008 at 11:42
Even with Microsoft's experience of book digitisation, the Library took great care to test the system. It ran a pilot test of the scanning technology in September 2006, after which it decided to plump for a high-volume, semi-automated scanning process. "We also had to run a book-movement pilot to check we could physically service the project," reveals Microsoft digitisation project manager Neil Fitzgerald. The Library ran a final three-month pilot last summer, after which it was decided to award the contract to Content Conversion Specialists (CCS), a third party jointly selected by the Library and Microsoft.
Starting to scan
With the small team of five needing to scan 50,000 pages a day to meet the project's target of 100,000 books in two years, speed is of the essence. But with 200-year-old books that look as though they'd fall apart if someone as much as sneezed in their direction, a great deal of care also needs to be taken. "We know we're handling treasures," says Richard Helle, managing director of CCS. "Every staff member has gone through extensive training on how to handle the books."
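As a rough sanity check on those figures, the short Python sketch below works out what the quoted daily rate implies over the life of the project. The 250 working days a year is our own assumption, not a figure supplied by the Library or CCS, so treat the results as a back-of-envelope estimate only.

```python
# Back-of-envelope check of the scanning targets quoted in the article.
PAGES_PER_DAY = 50_000        # quoted in the article
TARGET_BOOKS = 100_000        # quoted in the article
PROJECT_YEARS = 2             # quoted in the article
WORKING_DAYS_PER_YEAR = 250   # ASSUMPTION: not stated in the article


def project_totals(pages_per_day, books, years, working_days_per_year):
    """Return the total pages scanned and the implied average pages per book."""
    total_days = years * working_days_per_year
    total_pages = pages_per_day * total_days
    return {
        "total_pages": total_pages,
        "pages_per_book": total_pages / books,
    }


if __name__ == "__main__":
    totals = project_totals(
        PAGES_PER_DAY, TARGET_BOOKS, PROJECT_YEARS, WORKING_DAYS_PER_YEAR
    )
    for name, value in totals.items():
        print(f"{name}: {value:,.0f}")
```

Under that assumption the daily rate adds up to 25 million pages over two years, or an average of about 250 pages per book. A different working calendar would shift both numbers proportionally.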
Each day, a new trolley of tomes arrives for the team to start scanning. Only books smaller than 28 x 35.5cm can fit into the semi-automatic scanners, which means 20 to 30% of them miss the cut. Before being placed into the scanner, each book is given a visual inspection to ensure there are no obviously loose or torn pages that could get mangled in the process.
The open books are then placed under a lectern, above which two Canon lenses are mounted, ready to take a digital image of the facing pages underneath. The operator turns the first few pages of each book by hand, after which the machine takes over. An arm extends from the left of the scanning machine and uses gentle suction to lift and turn the page automatically. However, the operator remains on guard at the lectern to ensure none of the pages are torn and to intervene quickly if the machine accidentally takes two pages at once, for example.
Once in automatic mode, the pages are scanned and turned at a frenetic rate of one every two or three seconds. It's hard to believe that mishaps don't occur, even with the operator standing by, but when we visited the Library a few weeks into the project Helle insisted the team had only slightly damaged one book. "We're talking about non-destructive scanning," he claims. "If we run into a problem, we communicate with Library staff." Nevertheless, his company has to be insured against damages.

A screen beside the operator's lectern displays previews of the scanned pages, so the operator can see immediately if there's a problem and have a page rescanned. The images are saved locally on a PC mounted beneath the operator's workstation before being sent to the project's 12-CPU blade server, which has 40TB of storage, all fully mirrored.
Around 1% of the pages scanned will be fold-outs, often containing illustrations or diagrams, that can't be scanned by the conventional machine. The operator makes a note of such pages on the computer system, and after the book is completed the fold-outs are scanned on a separate, larger overhead scanner to ensure all the pages in the book are retained. The computer software later integrates the separate fold-out files with the rest of the book pages.
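The article doesn't describe how the CCS software performs that integration, but the general idea can be sketched in a few lines of Python. Everything here is hypothetical: the `Scan` record, the `position` field standing in for the operator's note, and the file names are illustrative assumptions, not details of the real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    position: int          # HYPOTHETICAL: page slot, as noted by the operator
    filename: str          # HYPOTHETICAL file name
    foldout: bool = False  # True if captured on the larger overhead scanner


def merge_foldouts(pages, foldouts):
    """Slot fold-out scans back into the page sequence by position.

    Raises ValueError if the merged book has a gap or a duplicate slot,
    which would suggest a fold-out was missed or filed in the wrong place.
    """
    merged = sorted([*pages, *foldouts], key=lambda scan: scan.position)
    positions = [scan.position for scan in merged]
    expected = list(range(1, len(merged) + 1))
    if positions != expected:
        raise ValueError(f"page sequence is not continuous: {positions}")
    return merged


if __name__ == "__main__":
    # Slot 3 was skipped on the automatic scanner because it is a fold-out.
    pages = [Scan(1, "p001.tif"), Scan(2, "p002.tif"), Scan(4, "p004.tif")]
    foldouts = [Scan(3, "f003.tif", foldout=True)]
    for scan in merge_foldouts(pages, foldouts):
        print(scan.position, scan.filename, "fold-out" if scan.foldout else "")
```

The continuity check is the design choice worth noting: sorting alone would silently accept a book with a missing fold-out, whereas checking for an unbroken run of positions flags it.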
Making sure the fold-out images are inserted back in the right place isn't the only challenge pictures present - the scanning environment also has to be monitored to ensure the colours are reproduced accurately. "Just one degree in temperature changes the light tuning and requires colour adjustments," says Helle. Consequently, there's no natural daylight in the air-conditioned, restricted-access bunker in which the scanning takes place, and all the scans made while we photographed the equipment had to be discarded to ensure the flashes didn't distort the images.
Printed from www.pcpro.co.uk


