Computing in the real world

Technolog:

David Fearon [PC Pro]
The path to creating computing appliances is getting ever easier, says David Fearon

And so here's a factor not often talked about in relation to the open-source movement: guilt.

I've spent the last couple of days transferring PC Pro's archive server over to a dedicated machine. The archive holds a searchable index of our past issues and the issues themselves, accessible over the intranet for easy reference. Up until recently, it had simply been running on my Windows XP desktop PC, which had started to complain under the load.

The archive is built on the three mainstays of modern web-based information search and retrieval systems: the Apache web server (www.apache.org), generating pages dynamically using the PHP language module (www.php.net), hooked up to a keyword search index hosted on a MySQL database (www.mysql.com). All three of these packages are freely available and distributed under non-restrictive licences, allowing their use by anyone, for basically any purpose, at no monetary cost.
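To make the idea of a keyword search index held in a database concrete, here is a minimal sketch in Python. It is not phpdig's actual schema or code: the `keywords` table, the page identifiers and the function names are all hypothetical, and SQLite stands in for MySQL so the example runs without a database server.

```python
import re
import sqlite3


def build_index(pages):
    """Build a keyword index from a dict mapping a page identifier to its text.

    The single-table schema is a hypothetical illustration, not phpdig's own.
    """
    db = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL
    db.execute("CREATE TABLE keywords (word TEXT, page TEXT, hits INTEGER)")
    for page, text in pages.items():
        # Count how often each lower-cased word occurs on this page.
        counts = {}
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            counts[word] = counts.get(word, 0) + 1
        db.executemany(
            "INSERT INTO keywords (word, page, hits) VALUES (?, ?, ?)",
            [(word, page, hits) for word, hits in counts.items()],
        )
    return db


def search(db, word):
    """Return the pages containing the word, most frequent mentions first."""
    rows = db.execute(
        "SELECT page FROM keywords WHERE word = ? ORDER BY hits DESC, page",
        (word.lower(),),
    )
    return [page for (page,) in rows]
```

A web page generated by PHP would run the equivalent of `search()` against the database and format the matching pages as links.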

The fact that all this is free isn't even the major advantage. For example, the engine for searching past issues is based on phpdig (www.phpdig.net), another open-source project that I plumped for after a day or two of looking around for something that would fit the bill. As it stood, it was close to what I wanted but needed tailoring to the job in hand.

No problem there, though: after rolling up my sleeves and rummaging through the source code for a while, I finally came up with a tweaked version able to index issues correctly and deliver search results in exactly the way I wanted them delivered. This is a key advantage that closed-source proprietary software is simply unable to deliver. With open source, there's no imperative for the protection of intellectual property, and no need to make it hard for competing packages to interoperate.

The archive's use of freely available open-source tools doesn't stop there. The pages of the magazines are stored as Adobe Acrobat PDF files, the very same files we send to the printing press: this gives us a definitive reference. We're constrained to sending PDFs - a proprietary format, albeit one whose specifications are published by Adobe - because that's what the printers accept. I've given my opinion of the Acrobat format in the past and won't bother deriding it any more now. Suffice to say that writing a tool to parse out the raw text from a PDF is an unnecessarily difficult job, but it turns out I didn't need to: the pdftotext utility, part of the xpdf tool suite (www.foolabs.com/xpdf), can do it for me and pass the results directly to phpdig as it crawls through indexing the pages. Yep, it's open-source and distributed under the terms of the GNU General Public Licence.
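The extraction step can be sketched in a few lines of Python. This is an illustration only: the archive itself drives pdftotext from phpdig, and the function names here are hypothetical. The one detail taken from the tool is that giving `-` as the output file makes pdftotext write the text to standard output, so a calling program can read it directly.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the pdftotext command line for one PDF file.

    Passing "-" as the output file makes pdftotext write to standard output.
    """
    return ["pdftotext", pdf_path, "-"]


def extract_text(pdf_path):
    """Run pdftotext (assumed to be installed and on the PATH) and return the text."""
    result = subprocess.run(
        pdftotext_command(pdf_path), capture_output=True, check=True
    )
    # Replace any undecodable bytes rather than failing on an odd page.
    return result.stdout.decode("utf-8", errors="replace")
```

How the text is then handed to the indexer depends on phpdig's own configuration, which isn't shown here.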

That's the indexing part sorted out: what about generating nice little thumbnails so users can get a visual overview of each issue? The thumbnails are generated by a bit of PHP code that sends commands to ImageMagick (www.imagemagick.com). No surprise that this is an open-source image-processing and manipulation package, able to take a PDF and spit out a resized thumbnail JPEG image.
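As a rough sketch of the sort of command involved, here is a Python equivalent; the archive's real code is PHP and isn't reproduced in this column. The function names and the default width of 200 pixels are assumptions. The example uses ImageMagick's `convert` program, where `[0]` after the filename selects the first page and `-thumbnail 200x` scales the result to 200 pixels wide while keeping the aspect ratio.

```python
import subprocess


def thumbnail_command(pdf_path, jpeg_path, width=200):
    """Build an ImageMagick command that renders page one of a PDF as a JPEG."""
    return [
        "convert",
        f"{pdf_path}[0]",   # "[0]" picks the first page of the PDF
        "-thumbnail",
        f"{width}x",        # fixed width, height follows the aspect ratio
        jpeg_path,
    ]


def make_thumbnail(pdf_path, jpeg_path, width=200):
    """Run the command; assumes ImageMagick is installed and on the PATH."""
    subprocess.run(thumbnail_command(pdf_path, jpeg_path, width), check=True)
```

ImageMagick relies on Ghostscript to read PDFs, so that needs to be installed as well. Newer ImageMagick releases name the program `magick` rather than `convert`.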

The final joyous part of all this - and a genuine poke in the eye for platform-bound proprietary software - is that although this system initially ran on my Windows machine, it took just a day or so to transfer the whole thing to an entirely new operating system in the form of Fedora Core 5 (http://fedora.redhat.com), my new favourite Linux distribution. And yes! This is open source and free.
