Real World Computing

Build your own distributed file system

Posted on 10 Nov 2009 at 15:13

Simon Brock marvels at Google's resilient distributed file system, and builds his own from simple, cheap hardware and open-source software

Whenever our application talks to the resilient file system, this duplication could be managed in one of two ways: either the application talks to one node, which then passes on any changes to the other nodes, or it talks to all nodes at the same time.

The advantage of talking to a single node is simplicity, but there are a number of disadvantages. For example, if our application is talking to a node that suddenly goes down, how do we know whether the changes we made to that node were ever propagated to the others?

The multinode method avoids such problems, guaranteeing that if a file is on one node then it must be on some others too, and any node that doesn’t have a copy of the file knows where to find it.
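
To make the difference concrete, here is a minimal sketch of the two approaches using in-memory stand-ins for storage nodes. The `Node` class, the function names and the `fail_before_propagation` flag are all hypothetical, invented purely for illustration; no real distributed file system exposes this API.

```python
class Node:
    """A hypothetical storage node that keeps its files in memory."""

    def __init__(self, name):
        self.name = name
        self.files = {}
        self.alive = True

    def store(self, path, data):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.files[path] = data


def write_single_node(primary, replicas, path, data, fail_before_propagation=False):
    """Talk to one node, which passes the change on afterwards."""
    primary.store(path, data)      # the application already sees success here
    if fail_before_propagation:
        primary.alive = False      # simulate the node dying before it passes the change on
        return
    for replica in replicas:
        replica.store(path, data)


def write_all_nodes(nodes, path, data):
    """Talk to every node; any failure raises before success is reported."""
    for node in nodes:
        node.store(path, data)


# Single-node route: the node fails after accepting the write.
a, b, c = Node("a"), Node("b"), Node("c")
write_single_node(a, [b, c], "/index.html", b"v1", fail_before_propagation=True)
single_node_copies = [n.name for n in (a, b, c) if "/index.html" in n.files]

# Multinode route: the write only completes once every node has the file.
x, y, z = Node("x"), Node("y"), Node("z")
write_all_nodes([x, y, z], "/index.html", b"v1")
multi_node_copies = [n.name for n in (x, y, z) if "/index.html" in n.files]
```

In the first case only the failed node holds the file, which is exactly the uncertainty described above; in the second, the application is told nothing has succeeded until every node has a copy.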

While that might tempt you to dismiss the single-node route as flaky, many applications – particularly web servers – contain files that rarely change, which are written only once but read many times.

That means that read resilience is more important than write resilience: even when a file does change, various levels of caching between the user’s web browser and the server may mean it's some time before that change is visible to all users.

Some database administrators will already recognise the scenarios just described: MySQL, for example, supports two ways of replicating changes across a collection of database servers. The simpler is to use binary log file replication, where changes to a table on one server are sent to the other databases in the cluster.

The important point is that the application making a change will think it’s succeeded once it has finished on one node, which works well enough but can lead to small replication delays and problems when a server fails.

The more resilient mode is to use MySQL Cluster, but that needs more resources, and any change must succeed on all nodes before the application is told it has completed.
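
The trade-off between the two modes can be sketched with a toy model of log-based replication. This is a deliberately simplified illustration, not MySQL's actual protocol: the `Primary` and `Replica` classes and the `write_async` and `write_sync` functions are hypothetical names made up for this example.

```python
class Primary:
    """Hypothetical primary server that records every change in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []                 # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Hypothetical replica that replays the primary's log from its own position."""

    def __init__(self):
        self.data = {}
        self.position = 0             # index of the next log entry to apply

    def catch_up(self, primary):
        for key, value in primary.log[self.position:]:
            self.data[key] = value
        self.position = len(primary.log)


def write_async(primary, key, value):
    """Report success as soon as the primary has the change."""
    primary.write(key, value)


def write_sync(primary, replicas, key, value):
    """Report success only once every replica has applied the change."""
    primary.write(key, value)
    for replica in replicas:
        replica.catch_up(primary)


primary, replica = Primary(), Replica()

write_async(primary, "row1", "hello")
stale_read = replica.data.get("row1")     # replica has not replayed the log yet
replica.catch_up(primary)
fresh_read = replica.data.get("row1")     # now the change is visible

write_sync(primary, [replica], "row2", "world")
sync_read = replica.data.get("row2")      # visible as soon as the write returns
```

The gap between `stale_read` and `fresh_read` is the replication delay described above; the synchronous version closes that gap at the cost of making every write wait for the slowest replica.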

MySQL FS and SeznamFS

If the replication methods employed by a resilient file system are so similar to those employed by a database, why not use a database as the back-end to our file system?

This is the basic idea behind MySQL FS, which uses a MySQL database to hold all the file system information. If we ignore the replication aspect, then at first glance using a database to underpin a file system sounds downright stupid: after all, file systems need to be fast, while databases are slow. However, on reflection it may not be as stupid as you think, for several reasons.

First, various operating systems over the years have attempted such a fusion between file system and database. In the dinosaur world of IBM mini-computers, System/38 led the way, while more recently Microsoft hinted at combining SQL Server with the file system for Windows Server, although in the end it was never released.

MySQL FS isn’t as sophisticated as either of those systems, but it does work. A modern file system has to implement a very similar storage architecture to that employed by a database – it supports directories organised in a tree-like hierarchy, while a database has tables linked together via relations.
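
As a rough illustration of how a directory tree maps onto a table, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL. The `entries` table, its columns and the helper functions are assumptions made for this example; this is not MySQL FS's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE entries (
           id        INTEGER PRIMARY KEY,
           parent_id INTEGER REFERENCES entries(id),  -- the relation that forms the tree
           name      TEXT NOT NULL,
           content   BLOB,                            -- NULL for directories
           UNIQUE (parent_id, name)
       )"""
)
# Row 1 is the root directory; it has no parent.
conn.execute("INSERT INTO entries (id, parent_id, name) VALUES (1, NULL, '')")


def add_entry(parent_id, name, content=None):
    """Create a directory (no content) or a file under the given parent."""
    cur = conn.execute(
        "INSERT INTO entries (parent_id, name, content) VALUES (?, ?, ?)",
        (parent_id, name, content),
    )
    return cur.lastrowid


def resolve(path):
    """Walk the tree one path component at a time, one query per component."""
    current = 1
    for part in path.strip("/").split("/"):
        if not part:
            continue
        row = conn.execute(
            "SELECT id FROM entries WHERE parent_id = ? AND name = ?",
            (current, part),
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        current = row[0]
    return current


def read_file(path):
    """Return the stored bytes for the file at the given path."""
    row = conn.execute(
        "SELECT content FROM entries WHERE id = ?", (resolve(path),)
    ).fetchone()
    return row[0]


var = add_entry(1, "var")
www = add_entry(var, "www")
add_entry(www, "index.html", b"<h1>hello</h1>")
```

Calling `read_file("/var/www/index.html")` walks three rows linked by `parent_id`, in much the same way an operating system walks three directories.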

There’s also a similarity between the way an application must walk over this directory structure and the way it accesses data held in linked tables. In both cases, remembering the results of expensive operations can lead to substantially improved performance – a database caches results from querying table indices, while an OS caches results from querying directory structures.
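
The caching idea can be sketched in a few lines with Python's standard `functools.lru_cache`. The `TREE` dictionary, the `lookup` function and the `walks` counter are hypothetical, standing in for whatever expensive directory or index walk a real system performs.

```python
from functools import lru_cache

# Hypothetical directory tree held as nested dictionaries.
TREE = {"var": {"www": {"index.html": "file"}}}

walks = 0  # counts how many times the expensive walk actually runs


@lru_cache(maxsize=1024)
def lookup(path):
    """Resolve a path by walking the tree; repeat lookups are served from the cache."""
    global walks
    walks += 1
    node = TREE
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


lookup("/var/www/index.html")   # walks the tree
lookup("/var/www/index.html")   # answered from the cache, no second walk
```

After both calls `walks` is still 1. The catch, for a file system as much as for a database, is invalidation: whenever the tree changes, stale entries have to be discarded, here with `lookup.cache_clear()`.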


User comments

Excellent overview..

Now that's what I call a technical article, well written and not completely over my head :)
This one sent me scurrying to wikipedia for half an hour looking at file systems (which I know zip about). We need more like this guys..

By pinero50 on 18 Nov 2009

Agreed, but

I agree, but at the same time, it would be useful to have a follow-on article on how to set up GlusterFS, for example on Ubuntu.

Especially since Simon seems to be suggesting GlusterFS is the 'winner' out of all the distributed FSs he's written about in this article.

I'd also be interested to know if it would be possible to run this on PCs that are used as desktop computers to provide some kind of cloud backup...

By GAZZAT5 on 22 Nov 2009

anthonysjones

Way Hay! An article that doesn't involve some Windows feature or bug, some laptop with a shiny case or a graphics card. Finally something I can tinker with myself and say "hey, I built that".

How about some articles on setting up and connecting databases to PHP sites? I'm on chapter one of Oracle 10g.

By anthonysjones on 24 Nov 2009


Simon Brock

Simon runs UK-based Wide Area Communications, the company behind websites such as The Spectator. He's a contributing editor to PC Pro and a fervent believer in open-source technologies.


PCPro-Computing in the Real World Printed from www.pcpro.co.uk
