PCPro-Computing in the Real World Printed from www.pcpro.co.uk

Build your own distributed file system

Posted on 11 Oct 2009 at 15:13

Simon Brock marvels at Google's resilient distributed file system, and builds his own from simple, cheap hardware and open-source software

One of the crucial components that makes Google so effective is its distributed file system, which underlies apps such as Mail, Documents and the Picasa photo service.

The Google File System divides user files into chunks and distributes them across a collection of servers, which are themselves distributed across many hosting centres.
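As a rough illustration of the split-and-distribute idea, here is a minimal Python sketch. It is not Google's algorithm: the function names, the tiny piece size and the round-robin placement rule are all assumptions made for the example.

```python
def split_into_pieces(data: bytes, piece_size: int) -> list[bytes]:
    """Cut a file's contents into fixed-size pieces (the last may be shorter)."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def place_pieces(piece_count: int, servers: list[str], replicas: int = 2) -> dict[int, list[str]]:
    """Assign each piece to `replicas` distinct servers, round-robin.

    Round-robin is an assumption chosen for simplicity; a real system
    would also weigh free space, load and physical location.
    """
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return {
        index: [servers[(index + r) % len(servers)] for r in range(replicas)]
        for index in range(piece_count)
    }


if __name__ == "__main__":
    pieces = split_into_pieces(b"abcdefghij", piece_size=4)
    # Hypothetical server names, for illustration only.
    layout = place_pieces(len(pieces), ["server-a", "server-b", "server-c"])
    for index, holders in layout.items():
        print(index, pieces[index], holders)
```

Because every piece lives on two different servers, losing any single server still leaves one copy of each piece available.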

Many people would like such a reliable file system for their own apps, but Google doesn’t currently allow direct API access (although Amazon’s S3 service is quite similar).

Until recently, most reliable file systems centralised the data onto storage nodes built from expensive dedicated hardware – for example storage area networks with duplicated controllers, power supplies and disk drives, using fibre channel interconnect running at gigabit speeds – whereas Google’s setup employs cheap, commodity hardware as building blocks.

Recent advances in technology are making such dedicated nodes feel obsolete. Where you might have made up a 1TB resilient node from a dozen 250GB disks (five drives as a RAID5 plus one hot spare, duplicated into a RAID10 mirror), this year 1TB single drives are on sale and bigger ones will soon be available.
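The drive count in that example can be checked with a little arithmetic. The sketch below assumes the usual RAID5 rule that one drive's worth of capacity in each array goes to parity; the function and parameter names are made up for illustration.

```python
def resilient_node_summary(drive_gb: int = 250, array_drives: int = 5,
                           hot_spares: int = 1, mirrors: int = 2) -> dict[str, int]:
    """Work out the drive count and capacity of a mirrored RAID5 node."""
    if array_drives < 3:
        raise ValueError("RAID5 needs at least three drives")
    # RAID5 gives up one drive's worth of space to parity.
    usable_gb = (array_drives - 1) * drive_gb
    # Each side of the mirror holds the array plus its hot spare(s).
    drives_per_side = array_drives + hot_spares
    total_drives = drives_per_side * mirrors
    return {
        "usable_gb": usable_gb,
        "total_drives": total_drives,
        "raw_gb": total_drives * drive_gb,
    }


if __name__ == "__main__":
    s = resilient_node_summary()
    print(f"{s['usable_gb']} GB usable from {s['total_drives']} drives ({s['raw_gb']} GB raw)")
    # prints: 1000 GB usable from 12 drives (3000 GB raw)
```

In other words, the dedicated node buys 3TB of raw disk to deliver 1TB of resilient storage.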

We can now look to distribute storage across very cheap servers, which probably only have one processor socket but a terabyte of disk space.

Most modern applications don’t actually use a great deal of storage: we may have multi-terabyte disks, but rarely use all that space. For example, I started writing these 3,000-word columns a few years ago and have used three new versions of Word in that time, but the document size has barely increased at all, growing far more slowly than drive capacity.

So here’s our new model for building resilience – do a Google using simple, cheap hardware and open-source software to bind it into a file system that can be used by existing applications. We don’t want to have to rewrite our applications to use this file system, as is the case with some massively parallel file systems.

Light the FUSE

FUSE (Filesystem in USErspace) is an open-source package for creating new file systems, which is happiest on Linux but can be run under Mac OS X and, in some cases, even Windows.

File systems are normally created and managed by the OS in “kernel” or “privileged” memory space. FUSE itself employs a module that runs under control of the OS kernel, but it enables ordinary user applications to create file systems that can be used by other applications.

We’ve mentioned FUSE before in this column as the basis of various useful file system extensions: for example, for Unix systems there are FUSE-based utilities to enable encryption (useful for safely managing USB sticks); access to archive files (treat a zip as a set of directories); and foreign file systems such as NTFS images inside Unix.

By implementing a particular interface to the OS kernel, FUSE enables applications to appear and behave as file systems, while monitoring what those applications do to the file system.

This last aspect is going to be very important to our implementation, as it will be the key to distributing the file data. Say our application is a mail server that’s accessing a collection of files on our resilient file system – to ensure resilience we need to make sure each file is duplicated at least once, and that the duplicate is stored on a separate machine.
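To make the duplication rule concrete, here is a small Python sketch. It does not mount anything or call FUSE: it is a plain class whose methods stand in for the write and read requests a FUSE-based file system would receive, and each "machine" is simulated by a local directory. The class name, method names and placement rule are all hypothetical.

```python
import tempfile
from pathlib import Path


class MirroredStore:
    """Keep every file on at least two distinct nodes (simulated as directories)."""

    def __init__(self, node_dirs, copies=2):
        if copies < 2:
            raise ValueError("resilience needs at least two copies")
        if copies > len(node_dirs):
            raise ValueError("not enough nodes to keep the copies apart")
        self.nodes = [Path(d) for d in node_dirs]
        self.copies = copies
        self.log = []  # record of observed operations

    def _targets(self, name):
        # Deterministic placement (an assumption): derive the first node from
        # the file name, then take the following nodes so copies never share one.
        start = sum(name.encode()) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.copies)]

    def write_file(self, name, data):
        """Store `data` under a flat file name on every target node."""
        self.log.append(("write", name))
        for node in self._targets(name):
            (node / name).write_bytes(data)

    def read_file(self, name):
        """Return the first surviving copy, trying each target node in turn."""
        self.log.append(("read", name))
        for node in self._targets(name):
            path = node / name
            if path.exists():
                return path.read_bytes()
        raise FileNotFoundError(name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        nodes = [Path(root) / f"node{i}" for i in range(3)]
        for node in nodes:
            node.mkdir()
        store = MirroredStore(nodes)
        store.write_file("inbox.mbox", b"hello")
        print(store.read_file("inbox.mbox"))
```

Deleting one node's copy of a file leaves `read_file` able to return the other, which is the behaviour the mail server in the example relies on.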

User comments

Excellent overview..

Now that's what I call a technical article, well written and not completely over my head :)
This one sent me scurrying to wikipedia for half an hour looking at file systems (which I know zip about). We need more like this guys..

By pinero50 on 18 Nov 2009

Simon Brock

Simon runs UK-based Wide Area Communications, the company behind websites such as The Spectator. He's a contributing editor to PC Pro and a fervent believer in open-source technologies.
