Digitising the British Library
Posted on 17 Jan 2008 at 11:42
Even with Microsoft's experience of book digitisation, the Library took great care to test the system. It ran a pilot test of the scanning technology in September 2006, after which it decided to plump for a high-volume, semi-automated scanning process. "We also had to run a book-movement pilot to check we could physically service the project," reveals Microsoft digitisation project manager Neil Fitzgerald. The Library ran a final three-month pilot last summer, after which it was decided to award the contract to Content Conversion Specialists (CCS), a third party jointly selected by the Library and Microsoft.
Starting to scan
With the small team of five needing to scan 50,000 pages a day to meet the project's target of 100,000 books in two years, speed is of the essence. But with 200-year-old books that look as though they'd fall apart if someone as much as sneezed in their direction, a great deal of care also needs to be taken. "We know we're handling treasures," says Richard Helle, managing director of CCS. "Every staff member has gone through extensive training on how to handle the books."
Each day, a new trolley of tomes arrives for the team to scan. Only books smaller than 28 x 35.5cm fit into the semi-automatic scanners, which means 20 to 30% of the books miss the cut. Before being placed in the scanner, each book is inspected visually to ensure there are no obviously loose or torn pages that could get mangled in the process.
The open books are then placed under a lectern, above which two Canon lenses are mounted, ready to take a digital image of the facing pages underneath. The operator turns the first few pages of each book by hand, after which the machine takes over. An arm extends from the left of the scanning machine and uses gentle suction to lift and turn each page automatically. However, the operator remains on guard at the lectern to make sure no pages are torn and to intervene quickly if, for example, the machine accidentally takes two pages at once.
Once in automatic mode, the pages are scanned and turned at a frenetic rate of one every two or three seconds. It's hard to believe that mishaps don't occur, even with the operator standing by, but when we visited the Library a few weeks into the project Helle insisted the team had only slightly damaged one book. "We're talking about non-destructive scanning," he claims. "If we run into a problem, we communicate with Library staff." Nevertheless, his company has to be insured against damages.

A screen beside the operator's lectern displays previews of the scanned pages, so the operator can see immediately if there's a problem and have a page rescanned. The images are saved locally on a PC mounted beneath the operator's workstation before being sent to the project's 12-CPU blade server, which has 40TB of fully mirrored storage.
Around 1% of the pages scanned will be fold-outs, often containing illustrations or diagrams, which can't be handled by the standard scanners. The operator notes such pages on the computer system, and once the rest of the book is complete the fold-outs are scanned on a separate, larger overhead scanner so that every page in the book is captured. The software later merges the separate fold-out files with the rest of the book's pages.
Making sure the fold-out images are inserted back in the right place isn't the only challenge that pictures present: the scanning environment also has to be monitored to ensure colours are reproduced accurately. "Just one degree in temperature changes the light tuning and requires colour adjustments," says Helle. Consequently, there's no natural daylight in the air-conditioned, restricted-access bunker in which the scanning takes place, and all the scans made while we photographed the equipment had to be discarded to ensure the camera flashes didn't distort the images.
For more details about purchasing this feature and/or images for editorial usage, please contact Jasmine Samra on pictures@dennis.co.uk