Build your own distributed file system
Posted on 11 Oct 2009 at 15:13
Simon Brock marvels at Google's resilient distributed file system, and builds his own from simple, cheap hardware and open-source software
One of the crucial components that makes Google so effective is its distributed file system, which underlies all the apps such as Mail, Documents and the Picasa photo service.
The Google File System divides user files into chunks and distributes them across a collection of servers, which are themselves distributed across many hosting centres.
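To make the idea concrete, here is a minimal Python sketch of cutting a file into fixed-size pieces (chunks) and spreading copies of each piece across several servers. It is an illustration only, not Google's implementation: the chunk size, the server names, the round-robin placement rule and the helper names `split_into_chunks` and `place_chunks` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut data into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(
    chunks: List[bytes], servers: List[str], copies: int = 2
) -> Dict[str, List[Tuple[int, bytes]]]:
    """Assign each chunk to `copies` different servers, round-robin.

    The round-robin rule is a hypothetical placement policy chosen for
    simplicity; a real system would also weigh load and location.
    """
    if copies > len(servers):
        raise ValueError("not enough servers for the requested copies")
    placement: Dict[str, List[Tuple[int, bytes]]] = {name: [] for name in servers}
    for index, chunk in enumerate(chunks):
        for offset in range(copies):
            server = servers[(index + offset) % len(servers)]
            placement[server].append((index, chunk))
    return placement


if __name__ == "__main__":
    pieces = split_into_chunks(b"abcdefghij", chunk_size=4)
    layout = place_chunks(pieces, ["server-a", "server-b", "server-c"])
    for name, held in layout.items():
        print(name, [index for index, _ in held])
```

Because every chunk lands on two different servers, losing any single server still leaves one copy of each piece available.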
Many people would like such a reliable file system for their own apps, but Google doesn’t currently allow direct API access (although Amazon’s S3 service is quite similar).
Until recently, most reliable file systems centralised the data onto storage nodes built from expensive dedicated hardware – for example storage area networks with duplicated controllers, power supplies and disk drives, using fibre channel interconnect running at gigabit speeds – whereas Google’s setup employs cheap, commodity hardware as building blocks.
Recent advances in technology are making such dedicated nodes feel obsolete. Where you might have made up a 1TB resilient node from a dozen 250GB disks (five drives as a RAID5 plus one hot spare, duplicated into a RAID10 mirror), this year 1TB single drives are on sale and bigger ones will soon be available.
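The drive count in that example can be checked with a little arithmetic. The Python sketch below works it through, assuming the usual RAID5 rule that one drive's worth of capacity goes to parity and treating 1TB as 1,000GB; the function name `raid5_usable_gb` is made up for this illustration.

```python
def raid5_usable_gb(drives: int, drive_gb: int) -> int:
    """RAID5 gives up one drive's worth of capacity to parity."""
    if drives < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (drives - 1) * drive_gb


DRIVE_GB = 250
RAID5_DRIVES = 5
HOT_SPARES = 1

usable_gb = raid5_usable_gb(RAID5_DRIVES, DRIVE_GB)  # (5 - 1) * 250
drives_per_side = RAID5_DRIVES + HOT_SPARES          # array plus hot spare
total_drives = 2 * drives_per_side                   # whole set mirrored
raw_gb = total_drives * DRIVE_GB                     # capacity actually bought

print(f"usable: {usable_gb}GB from {total_drives} drives ({raw_gb}GB raw)")
```

In other words, the dedicated node buys 3TB of raw disk to deliver 1TB of usable, resilient space.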
We can now look to distribute storage across very cheap servers, which probably only have one processor socket but a terabyte of disk space.
Most modern applications don’t actually use a great deal of storage: we may have multi-terabyte disks, but rarely use all that space. For example, I started writing these 3,000-word columns a few years ago and have used three new versions of Word in that time, but the document size has barely increased at all, growing far more slowly than drive capacity.
So here’s our new model for building resilience – do a Google using simple, cheap hardware and open-source software to bind it into a file system that can be used by existing applications. We don’t want to have to rewrite our applications to use this file system, as is the case with some massively parallel file systems.
Light the FUSE
FUSE (Filesystem in Userspace) is an open-source package for creating new file systems, which is happiest on Linux but can be run under Mac OS X and, in some cases, even Windows.
File systems are normally created and managed via the OS in “kernel” or “privileged” memory space, but while FUSE itself employs a module that runs under control of the OS kernel, it enables ordinary user applications to create file systems that can be used by other applications.
We’ve mentioned FUSE before in this column as the basis of various useful file system extensions: for example, for Unix systems there are FUSE-based utilities to enable encryption (useful for safely managing USB sticks); access to archive files (treat a zip as a set of directories); and foreign file systems such as NTFS images inside Unix.
By implementing a particular interface to the OS kernel, FUSE enables applications to appear and behave as file systems, while monitoring what those applications do to the file system.
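The "monitoring" part can be pictured with a plain-Python stand-in. The sketch below is not the FUSE API and doesn't mount anything: `MonitoredStore` is a hypothetical class that passes each operation through to a backing directory and records what was asked of it, which is roughly the position a FUSE-based file system finds itself in when the kernel hands it a request.

```python
import os
import tempfile
from typing import List, Tuple


class MonitoredStore:
    """Passes file operations through to a backing directory and logs them."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.log: List[Tuple[str, str]] = []  # (operation, path as requested)

    def _full(self, path: str) -> str:
        # Map the path the application sees onto the backing directory.
        return os.path.join(self.root, path.lstrip("/"))

    def write(self, path: str, data: bytes) -> int:
        self.log.append(("write", path))
        with open(self._full(path), "wb") as handle:
            return handle.write(data)

    def read(self, path: str) -> bytes:
        self.log.append(("read", path))
        with open(self._full(path), "rb") as handle:
            return handle.read()

    def unlink(self, path: str) -> None:
        self.log.append(("unlink", path))
        os.remove(self._full(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as backing:
        store = MonitoredStore(backing)
        store.write("/message.txt", b"hello")
        print(store.read("/message.txt"))
        print(store.log)
```

Every operation passes through one place, so that place can do extra work, such as copying the data elsewhere, without the calling application knowing.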
This last aspect is going to be very important to our implementation, as it will be the key to distributing the file data. Say our application is a mail server that’s accessing a collection of files on our resilient file system – to ensure resilience we need to make sure each file is duplicated at least once, and that the copy lands on a separate machine.
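A minimal sketch of that duplication rule, in Python, might look like the following. It assumes each "machine" is reachable as a directory (for instance a network mount), and the helper names `replicated_write` and `read_any` are invented for this example; it says nothing about how the finished system actually moves data between servers.

```python
import os
import shutil
import tempfile
from typing import List


def replicated_write(relative_path: str, data: bytes, replica_roots: List[str]) -> List[str]:
    """Write data under every replica root; return the paths written."""
    if len(replica_roots) < 2:
        raise ValueError("need at least two replicas for resilience")
    written = []
    for root in replica_roots:
        target = os.path.join(root, relative_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as handle:
            handle.write(data)
        written.append(target)
    return written


def read_any(relative_path: str, replica_roots: List[str]) -> bytes:
    """Return the file from the first replica that still has it."""
    for root in replica_roots:
        try:
            with open(os.path.join(root, relative_path), "rb") as handle:
                return handle.read()
        except OSError:
            continue  # this replica is missing or unreachable; try the next
    raise FileNotFoundError(relative_path)


if __name__ == "__main__":
    # Two temporary directories stand in for two separate machines.
    machine_a, machine_b = tempfile.mkdtemp(), tempfile.mkdtemp()
    replicated_write("inbox/1.eml", b"Subject: hi", [machine_a, machine_b])
    shutil.rmtree(machine_a)  # simulate losing one machine
    print(read_any("inbox/1.eml", [machine_a, machine_b]))
    shutil.rmtree(machine_b)
```

The mail server in the example only ever asks for `inbox/1.eml`; which machine answers is the file system's business.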
Simon Brock
Simon runs UK-based Wide Area Communications, the company behind websites such as The Spectator. He's a contributing editor to PC Pro and a fervent believer in open-source technologies.
Printed from www.pcpro.co.uk


