
Build your own distributed file system

Posted on 10 Nov 2009 at 15:13

Simon Brock marvels at Google's resilient distributed file system, and builds his own from simple, cheap hardware and open-source software

When you use Gluster for replication you can have any number of nodes in your file system, and all file operations become “atomic” across that file system – in other words, if your application changes a file via one node then all other nodes see that change immediately.
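
To make the replication idea concrete, here's a minimal Python sketch. It's a toy model rather than GlusterFS code: the `Brick` and `ReplicatedVolume` classes and the file path are hypothetical names invented for illustration, and real replication has to cope with networks, locking and failures that this ignores.

```python
class Brick:
    """A toy storage node: just a mapping of path -> file contents."""

    def __init__(self, name):
        self.name = name
        self.files = {}


class ReplicatedVolume:
    """Toy replicated volume: every write is applied to every brick."""

    def __init__(self, bricks):
        self.bricks = list(bricks)

    def write(self, path, data):
        # The write returns only after all bricks hold the new contents,
        # so no node is left holding the old version.
        for brick in self.bricks:
            brick.files[path] = data

    def read(self, path, via):
        # Read through any one node; every node holds the same copy.
        return via.files[path]


node_a, node_b, node_c = Brick("a"), Brick("b"), Brick("c")
volume = ReplicatedVolume([node_a, node_b, node_c])

# Change the file "via" the volume, then read it back from another node.
volume.write("/site/index.html", b"hello")
print(volume.read("/site/index.html", via=node_c))
```

The point of the sketch is simply that a write isn't finished until every copy agrees, which is why a reader on any node sees the change.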

What’s very clever about the implementation of this in Gluster is that a node doesn’t need to be running when a change is made in order to pick it up: if a node has been down for a while, it will heal its own version of the file system when it rejoins the cluster.

As an application tries to access a file, the cluster decides which files it should see, and if a particular brick doesn’t hold the correct files they are sent from other nodes.
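
The heal-on-access behaviour can be sketched in a few lines of Python. This is an illustrative model only, not how GlusterFS is implemented: the version counter, the `online` flag and the `write` and `read` functions are all assumptions made for the example.

```python
class Brick:
    """Toy brick that stores a version number alongside each file."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.files = {}  # path -> (version, data)


def write(bricks, path, data):
    """Write to every online brick, bumping the file's version number."""
    latest = max(brick.files.get(path, (0, b""))[0] for brick in bricks)
    for brick in bricks:
        if brick.online:
            brick.files[path] = (latest + 1, data)


def read(bricks, path, local):
    """Read via the local brick, healing it first if its copy is stale."""
    holders = [brick for brick in bricks if brick.online and path in brick.files]
    newest = max(holders, key=lambda brick: brick.files[path][0])
    if local.files.get(path, (0, b""))[0] < newest.files[path][0]:
        # The local copy is missing or out of date: fetch the newest one.
        local.files[path] = newest.files[path]
    return local.files[path][1]


brick_a, brick_b = Brick("a"), Brick("b")
bricks = [brick_a, brick_b]

write(bricks, "/data/report.txt", b"first draft")
brick_b.online = False                    # brick b goes down...
write(bricks, "/data/report.txt", b"second draft")
brick_b.online = True                     # ...and later rejoins the cluster

healed = read(bricks, "/data/report.txt", local=brick_b)
```

In this sketch brick b missed the second write while it was down, but the first read after it rejoins notices the older version number and copies the newer file across before returning it.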

This is particularly important because each individual node can act as both a client and a server/brick. An application will always try to read files from its local copy first, rather than going out to the other nodes in the network.
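
A local-first read policy of this kind might look like the following Python sketch. The `Node` class, the `served_to_peers` counter and the `read` function are hypothetical and exist only to show the order in which copies are tried.

```python
class Node:
    """Toy node acting as both client and brick."""

    def __init__(self, name):
        self.name = name
        self.files = {}
        self.served_to_peers = 0  # how many reads this node served over the "network"


def read(path, local, peers):
    # Try the local brick first: no network round trip is needed.
    if path in local.files:
        return local.files[path]
    # Otherwise fall back to asking the other nodes in turn.
    for peer in peers:
        if path in peer.files:
            peer.served_to_peers += 1
            return peer.files[path]
    raise FileNotFoundError(path)


web1, web2 = Node("web1"), Node("web2")
web1.files["/img/logo.png"] = b"PNG..."
web2.files["/img/logo.png"] = b"PNG..."
web2.files["/img/banner.png"] = b"PNG2..."

local_hit = read("/img/logo.png", local=web1, peers=[web2])
remote_hit = read("/img/banner.png", local=web1, peers=[web2])
```

Only the second read touches the other node, which is the saving the local-first rule is after.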


The GlusterFS architecture is layered, allowing different translators to be inserted at different points – so, for example, caches can be introduced on both the client and server to speed up accesses.
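
The layering idea can be illustrated with a small Python sketch in which each layer offers the same `read` and `write` calls and passes work down to the layer below. The `Storage` and `CacheTranslator` classes are invented for this example and aren't GlusterFS's own translator interface.

```python
class Storage:
    """Bottom of the stack: stands in for a brick's on-disk store."""

    def __init__(self):
        self.files = {}
        self.reads = 0  # counts how often the "disk" is actually read

    def read(self, path):
        self.reads += 1
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class CacheTranslator:
    """Sits on top of another layer and answers repeat reads from memory."""

    def __init__(self, below):
        self.below = below
        self.cache = {}

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.below.read(path)
        return self.cache[path]

    def write(self, path, data):
        self.cache.pop(path, None)  # drop any stale cached copy
        self.below.write(path, data)


storage = Storage()
stack = CacheTranslator(storage)  # the cache is inserted above the storage layer

stack.write("/a.txt", b"one")
first = stack.read("/a.txt")
second = stack.read("/a.txt")     # served from the cache, not the storage layer
```

Because both layers expose the same calls, the cache can be added or removed without the application or the storage layer knowing, which is what makes a layered design easy to rearrange.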

A striping translator can be introduced to stripe files across a cluster of bricks, which can be useful for building high-performance clusters, and similarly it’s possible to deploy different network fabrics to improve performance: GlusterFS has native support for 10Gb Ethernet but can fall back to slower Gigabit.
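
Striping itself is easy to sketch in Python: the file is cut into fixed-size blocks that are dealt out to the bricks in turn, like cards. The function names, the block size and the round-robin layout below are assumptions for illustration, not a description of GlusterFS's on-disk format.

```python
def stripe(data, brick_count, block_size):
    """Deal fixed-size blocks of data out to the bricks in round-robin order."""
    bricks = [[] for _ in range(brick_count)]
    for offset in range(0, len(data), block_size):
        block_number = offset // block_size
        bricks[block_number % brick_count].append(data[offset:offset + block_size])
    return bricks


def unstripe(bricks):
    """Reassemble the file by taking one block from each brick in turn."""
    blocks = []
    longest = max(len(brick) for brick in bricks)
    for row in range(longest):
        for brick in bricks:
            if row < len(brick):
                blocks.append(brick[row])
    return b"".join(blocks)


original = b"abcdefghij"
striped = stripe(original, brick_count=3, block_size=2)
restored = unstripe(striped)
```

With three bricks and two-byte blocks, the first brick ends up holding the first and fourth blocks, so a large sequential read can pull from all three bricks at once.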

Gluster is simple to install too, and can either be compiled from source or installed from packages. We’ve set up a number of different GlusterFS systems now and have found them to be very reliable once configured.

We did have some problems, but these were mostly due to us experimenting with our configuration. We’d recommend that you decide before you start what you want to achieve and then configure for that specific setup, rather than setting up one configuration and then trying to change it to suit. For example, adding more nodes to a working GlusterFS proved unreliable for us: better to start with as many as you need.

Gluster is certainly the most powerful of the products we looked at, but by using any one of them it’s now possible to build a reliable clustered file system for your applications for just a few hundred pounds, something that would have cost many thousands in hardware just a few years ago.


User comments

Excellent overview

Now that's what I call a technical article: well written and not completely over my head :)
This one sent me scurrying to Wikipedia for half an hour looking at file systems (which I know zip about). We need more like this, guys.

By pinero50 on 18 Nov 2009

Agreed, but

I agree, but at the same time it would be useful to have a follow-on article on how to set up GlusterFS, for example on Ubuntu.

Especially since Simon seems to be suggesting GlusterFS is the 'winner' out of all the distributed file systems he's written about in this article.

I'd also be interested to know if it would be possible to run this on PCs that are used as desktop computers to provide some kind of cloud backup...

By GAZZAT5 on 22 Nov 2009

anthonysjones

Way Hay! An article that doesn't involve some Windows feature or bug, some laptop with a shiny case or a graphics card. Finally something I can tinker with myself and say "hey, I built that".

How about some articles on setting up and connecting databases to PHP sites? I'm on chapter one of Oracle 10g.

By anthonysjones on 24 Nov 2009


Simon Brock


Simon runs UK-based Wide Area Communications, the company behind websites such as The Spectator. He's a contributing editor to PC Pro and a fervent believer in open-source technologies.


PCPro-Computing in the Real World Printed from www.pcpro.co.uk
