Analysis

Digitising the British Library

Posted on 17 Jan 2008 at 11:42

Even with Microsoft's experience of book digitisation, the Library took great care to test the system. It ran a pilot test of the scanning technology in September 2006, after which it decided to plump for a high-volume, semi-automated scanning process. "We also had to run a book-movement pilot to check we could physically service the project," reveals Microsoft digitisation project manager Neil Fitzgerald. The Library ran a final three-month pilot last summer, after which it was decided to award the contract to Content Conversion Specialists (CCS), a third party jointly selected by the Library and Microsoft.

Starting to scan

With the small team of five needing to scan 50,000 pages a day to meet the project's target of 100,000 books in two years, speed is of the essence. But with 200-year-old books that look as though they'd fall apart if someone as much as sneezed in their direction, a great deal of care also needs to be taken. "We know we're handling treasures," says Richard Helle, managing director of CCS. "Every staff member has gone through extensive training on how to handle the books."
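As a rough sanity check on those figures, the short sketch below works out how many pages per book the targets imply. The daily rate, book count and project length come from the article; the number of scanning days per year is purely an assumption made for illustration, so the result is indicative only.

```python
# Figures quoted in the article
PAGES_PER_DAY = 50_000
TARGET_BOOKS = 100_000
PROJECT_YEARS = 2

# Assumption (not from the article): roughly 250 scanning days a year
SCANNING_DAYS_PER_YEAR = 250


def implied_pages_per_book(pages_per_day, books, years, days_per_year):
    """Average book length implied by the daily rate and the overall target."""
    total_pages = pages_per_day * years * days_per_year
    return total_pages / books


average = implied_pages_per_book(
    PAGES_PER_DAY, TARGET_BOOKS, PROJECT_YEARS, SCANNING_DAYS_PER_YEAR
)
print(f"Implied average book length: {average:.0f} pages")
```

Under that assumption the targets correspond to about 25 million pages in total, or an average of 250 pages per book. A different working calendar would shift the figure proportionally.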

Each day, a new trolley of tomes arrives for the team to start scanning. Only books smaller than 28 x 35.5cm can fit into the semi-automatic scanners, which means 20 to 30% of them miss the cut. Before being placed into the scanner, each book is given a visual inspection to ensure there are no obviously loose or torn pages that could get mangled in the process.

The open books are then placed under a lectern, above which two Canon lenses are mounted, ready to take a digital image of the facing pages underneath. The operator turns the first few pages of each book by hand, after which the machine takes over. An arm extends from the left of the scanning machine and uses gentle suction to lift and turn the page automatically. However, the operator remains on guard at the lectern to ensure none of the pages are torn and to intervene quickly if the machine accidentally takes two pages at once, for example.

Once in automatic mode, the pages are scanned and turned at a frenetic rate of one every two or three seconds. It's hard to believe that mishaps don't occur, even with the operator standing by, but when we visited the Library a few weeks into the project Helle insisted the team had slightly damaged only one book. "We're talking about non-destructive scanning," he claims. "If we run into a problem, we communicate with Library staff." Nevertheless, his company has to be insured against damage.
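To put that pace in context, the sketch below converts the quoted turn rate into pages per hour and into the scanner-hours needed to reach 50,000 pages in a day. It assumes one page is captured per turn and ignores loading, inspection and rescans, none of which the article quantifies, so it should be read as a best-case estimate.

```python
SECONDS_PER_HOUR = 3600
DAILY_TARGET_PAGES = 50_000  # figure quoted in the article


def pages_per_hour(seconds_per_page):
    """Best-case hourly throughput of one scanner running non-stop."""
    return SECONDS_PER_HOUR / seconds_per_page


def scanner_hours_needed(target_pages, seconds_per_page):
    """Combined hours of continuous scanning needed to hit the daily target."""
    return target_pages / pages_per_hour(seconds_per_page)


# The article quotes one page every two or three seconds
for seconds in (2, 3):
    rate = pages_per_hour(seconds)
    hours = scanner_hours_needed(DAILY_TARGET_PAGES, seconds)
    print(f"{seconds}s per page: {rate:.0f} pages/hour, {hours:.1f} scanner-hours/day")
```

Even at the faster rate, a day's quota needs close to 28 hours of uninterrupted scanning, which is only achievable with several machines running in parallel.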

A screen beside the operator's lectern displays previews of the scanned pages, so they can see immediately if there's a problem and have a page rescanned. The images are saved locally on a PC mounted beneath the operator's workstation before being sent to the project's 12-CPU blade server, which has 40TB of storage, all fully mirrored.

Around 1% of the pages scanned will be fold-outs, often containing illustrations or diagrams, that can't be scanned by the conventional machine. The operator makes a note of such pages on the computer system, and after the book is completed the fold-outs are scanned on a separate, larger overhead scanner to ensure all the pages in the book are retained. The computer software later integrates the separate fold-out files with the rest of the book pages.
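The article doesn't describe how the software performs this integration. The sketch below is one hypothetical way to do it: the operator's notes are modelled as a mapping from page position to fold-out file, and each fold-out is slotted in immediately after the page where it was flagged. The function name, file names and data layout are all illustrative assumptions, not details of the actual system.

```python
def merge_foldouts(pages, foldouts):
    """Return the full page sequence with fold-out scans slotted back in.

    pages    -- image files from the main scanner, in scan order
    foldouts -- hypothetical operator notes: {page position (1-based): fold-out file}
    """
    # Refuse notes that point at pages the book doesn't have
    unknown = [pos for pos in foldouts if not 1 <= pos <= len(pages)]
    if unknown:
        raise ValueError(f"Fold-out noted at non-existent page(s): {sorted(unknown)}")

    merged = []
    for position, filename in enumerate(pages, start=1):
        merged.append(filename)
        if position in foldouts:
            # Insert the separately scanned fold-out right after the flagged page
            merged.append(foldouts[position])
    return merged


book = ["p001.tif", "p002.tif", "p003.tif", "p004.tif"]
notes = {2: "foldout_a.tif"}
print(merge_foldouts(book, notes))
```

Validating the notes before merging means a mistyped page number is caught straight away rather than silently dropping a fold-out from the finished book.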

Making sure the fold-out images are inserted back in the right place isn't the only challenge pictures present - the scanning environment also has to be monitored to ensure the colours are reproduced accurately. "Just one degree in temperature changes the light tuning and requires colour adjustments," says Helle. Consequently, there's no natural daylight in the air-conditioned, restricted-access bunker in which the scanning takes place, and all the scans made while we photographed the equipment had to be discarded to ensure the flashes didn't distort the images.

PCPro-Computing in the Real World Printed from www.pcpro.co.uk
