Digitising the British Library
So what does Microsoft get out of the deal? In return for financing the project, Microsoft can host the collection on its Live Search Books site (http://books.live.com), and will probably have the collection live before the Library manages to update its own website. The Library wouldn't reveal how long Microsoft holds the licensing rights, but for such an expensive project it's safe to assume this isn't a short-term deal.
Even with Microsoft's experience of book digitisation, the Library took great care to test the system. It ran a pilot test of the scanning technology in September 2006, after which it decided to plump for a high-volume, semi-automated scanning process. "We also had to run a book-movement pilot to check we could physically service the project," reveals Neil Fitzgerald, the Library's Microsoft digitisation project manager. The Library ran a final three-month pilot last summer, after which it awarded the contract to Content Conversion Specialists (CCS), a third party selected jointly by the Library and Microsoft.
Starting to scan
With the small team of five needing to scan 50,000 pages a day to meet the project's target of 100,000 books in two years, speed is of the essence. But with 200-year-old books that look as though they'd fall apart if someone as much as sneezed in their direction, a great deal of care also needs to be taken. "We know we're handling treasures," says Richard
Each day, a new trolley of tomes arrives for the team to start scanning. Only books smaller than 28 × 35.5cm fit into the semi-automatic scanners, which rules out 20 to 30% of the books. Before being placed in the scanner, each book is given a visual inspection to ensure there are no obviously loose or torn pages that could get mangled in the process.
The open books are then placed under a lectern, above which two Canon lenses are mounted, ready to take a digital image of the facing pages underneath. The operator turns the first few pages of each book by hand, after which the machine takes over. An arm extends from the left of the scanning machine and uses gentle suction to lift and turn each page automatically. However, the operator remains on guard at the lectern to ensure none of the pages are torn and to intervene quickly if, for example, the machine accidentally picks up two pages at once.
Once in automatic mode, the pages are scanned and turned at a frenetic rate of one every two or three seconds. It's hard to believe that mishaps don't occur, even with the operator standing by, but when we visited the Library a few weeks into the project, Helle insisted the team had only slightly damaged one book. "We're talking about non-destructive scanning," he says. "If we run into a problem, we communicate with Library staff." Nevertheless, his company has to be insured against damage.
A screen beside the operator's lectern displays previews of the scanned pages, so the operator can see immediately if there's a problem and have a page rescanned. The images are saved locally on a PC mounted beneath the operator's workstation before being sent to the project's 12-CPU blade server, which has 40TB of fully mirrored storage.
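A quick back-of-the-envelope check helps put the throughput figures in context. The Python sketch below takes the article's numbers (50,000 pages a day, 100,000 books in two years, one page every two or three seconds in automatic mode) and works out the average book length the daily target allows for, plus how many hours of non-stop scanning that target represents. Two inputs are assumptions rather than figures from the Library or CCS: a calendar of 250 scanning days a year, and whether each two-to-three-second cycle yields one page or a two-page spread (the `pages_per_cycle` parameter is hypothetical).

```python
# Back-of-the-envelope throughput check for the scanning project.
# Figures from the article: 50,000 pages a day, 100,000 books in two years,
# and one page turned every two or three seconds in automatic mode.
# ASSUMPTION: 250 scanning days a year (the article gives no figure).

PAGES_PER_DAY = 50_000
TARGET_BOOKS = 100_000
YEARS = 2
SCANNING_DAYS_PER_YEAR = 250  # hypothetical working calendar


def implied_pages_per_book(days_per_year: int = SCANNING_DAYS_PER_YEAR) -> float:
    """Average book length that the daily page target allows for."""
    total_pages = PAGES_PER_DAY * days_per_year * YEARS
    return total_pages / TARGET_BOOKS


def scanner_hours_per_day(seconds_per_cycle: float, pages_per_cycle: int = 1) -> float:
    """Hours of non-stop automatic scanning needed to hit the daily target.

    pages_per_cycle is a hypothetical parameter: 1 if each turn counts as one
    page, 2 if each turn captures a fresh two-page spread.
    """
    cycles_needed = PAGES_PER_DAY / pages_per_cycle
    return cycles_needed * seconds_per_cycle / 3600  # seconds -> hours


if __name__ == "__main__":
    print(f"Implied average book length: {implied_pages_per_book():.0f} pages")
    for secs in (2, 3):
        for spread in (1, 2):
            hours = scanner_hours_per_day(secs, spread)
            print(f"{secs}s per cycle, {spread} page(s) per cycle: {hours:.1f} scanner-hours a day")
```

Under these assumptions the target works out at about 250 pages per book and roughly 28 to 42 hours of continuous scanning a day if each cycle counts as one page (about 14 to 21 hours if each turn captures a two-page spread). The article doesn't say how many scanners the team runs, so the sketch only shows the scale of the workload, not how it is actually split.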