Understanding the "NoSQL movement"
Posted on 11 Feb 2010 at 14:28
Ian Wrigley takes a look at an alternative to relational databases: the so-called "NoSQL Movement"
I’m currently employed by Sun Microsystems as a senior instructor, teaching topics related to MySQL – database administration, performance tuning and the like. That makes me, in theory at least, something of an expert on MySQL, which is now the world’s most popular open-source relational database management system.
However, I also keep an eye on whatever else is happening in the database world, and recently an interesting trend has started to emerge, driven by the needs of large websites such as Facebook.
This trend has come to be known as the “NoSQL movement”, and it tries to address the problems that arise when attempts to massively scale up a relational database management system (RDBMS) start to break down.
Here’s the problem in a nutshell: it’s very difficult to continue scaling up an RDBMS once you pass a certain size, especially if you’re trying to distribute your datacenters geographically at the same time. The largest players such as Google and Amazon were all forced to face up to this problem some time ago, while websites such as Facebook, Digg and others are only now hitting the same barrier.
To understand why it’s such a problem, it’s worth taking a look at some relatively simple scaling strategies to see why they fail.
When scaling up a traditional RDBMS you’ll typically start off with a single database server (I’ll use MySQL as my example here, but the principles are similar, if not identical, whichever RDBMS you prefer). This works well for a while, but as your website becomes more and more popular, the load on this server increases to a point where it starts to perform unacceptably slowly.
Your next step is normally to hang one or more slave servers off the master database server as shown in the diagram to the right.
All data on the master server is automatically replicated down onto these slaves, so that any application that wants to read some data can connect to any one of the slaves that’s available to do so. However, all writes must go to the master, because the data flow is one-way from master to slaves, and writing to a slave makes no sense.
This one-way traffic starts to become a problem if the dataset needs to be frequently updated: think of Facebook as an example, where people need to update their status and profile, upload pictures, comment on others’ pages and so on. Reading is no longer a problem since you can just hang more and more slaves off the master, but as the write traffic increases it will start to swamp the master server all over again.
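The read/write split described above can be sketched in a few lines of Python. This is only an illustration: the `ReplicatedRouter` class, the server names and the list of write verbs are assumptions made up for the example, not part of MySQL or any driver, and a real application would hand back database connections rather than names.

```python
import itertools


class ReplicatedRouter:
    """Route writes to the master and spread reads across the slaves."""

    # Statement verbs treated as writes (a simplification for this sketch).
    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._read_cycle = itertools.cycle(slaves or [master])

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS:
            return self.master  # all writes must go to the single master
        return next(self._read_cycle)  # reads can go to any slave


router = ReplicatedRouter("master", ["slave1", "slave2"])
print(router.route("SELECT name FROM users WHERE id = 7"))            # slave1
print(router.route("UPDATE users SET status = 'busy' WHERE id = 7"))  # master
print(router.route("SELECT name FROM users WHERE id = 8"))            # slave2
```

Adding another slave only lengthens the list of read targets, which is why reads scale easily here while every write still lands on the one master.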
At this point, you might want to introduce extra master servers, with master-master replication in addition to master-slave replication, and perhaps hang some slaves off each of these masters as shown in the diagram to the right.
This kind of architecture can help to handle the loads, although it turns out it isn’t nearly as perfect as it looks. You might think that once you have two masters in place you can do twice the writes per second, but sadly, that isn’t the case thanks to various problems (to do with file locking and parallel updates) that I don’t have the space to cover in adequate detail here. Such a multimaster scenario may help to some extent, but you’ll soon pass the point of steeply diminishing returns.
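One contributing factor can be shown with a toy calculation. It rests on an assumption the article doesn't spell out: with master-master replication each master ends up applying every write, its own plus those replicated from its peers, and a replicated write is taken to cost the same as a direct one. The function name and the numbers are hypothetical, and the locking and parallel-update problems mentioned above are not modelled at all.

```python
def load_per_master(total_writes_per_sec, masters):
    """Split the write load seen by one master into direct and replicated parts.

    Assumes clients spread their writes evenly across the masters and that
    every write is replicated to every other master.
    """
    if masters < 1:
        raise ValueError("need at least one master")
    direct = total_writes_per_sec / masters      # writes sent here by clients
    replicated = total_writes_per_sec - direct   # writes arriving from peers
    return direct, replicated, direct + replicated


for n in (1, 2, 4):
    direct, replicated, total = load_per_master(1000, n)
    print(f"{n} master(s): direct={direct}, replicated={replicated}, total={total}")
```

Under these assumptions the total applied by each master stays the same however many masters there are, so extra masters spread out the client connections but not the underlying write work.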
Horizontal and vertical partitioning
The next strategy to try is horizontal partitioning – or “sharding” – of the data. The idea is to keep certain subsets of your data on one server and others on a different one. For instance, you could store the data for all those people whose surnames begin with the letters A-M on one server and those beginning N-Z on another.
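The surname split just described might look something like this in Python. It's a minimal sketch: the function name and the shard names are invented for the example, and it assumes surnames start with a plain A-Z letter.

```python
def shard_for_surname(surname):
    """Pick a shard by the first letter of the surname: A-M or N-Z."""
    first = surname.strip()[:1].upper()
    if not ("A" <= first <= "Z"):
        # Empty or non-alphabetic names need a rule of their own.
        raise ValueError(f"cannot shard surname {surname!r}")
    return "shard_a_to_m" if first <= "M" else "shard_n_to_z"


print(shard_for_surname("Adams"))    # shard_a_to_m
print(shard_for_surname("Wrigley"))  # shard_n_to_z
```

The application has to run this lookup before every query, which means the knowledge of where data lives moves out of the database and into your own code.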
Excellent article considering I am currently studying SQL at university. Thanks :D
By 00lissauers on 11 Feb 2010
So good I registered to leave a comment
Thanks Ian for a really clear and easy to understand explanation of the problems and how we might tackle them.
By Crystallise on 18 Feb 2010
Can't really add anything original to the comment from Crystallise. Nicely written, Ian, and a great lead into the new 'object database approaches'. New territory for me as an old SQL hand. Will be digging deeper! Thanks again.
By triballus on 19 Feb 2010
Ironic - There's nothing new under the sun.
I spent eight years working in the UK and Sweden with a proprietary Swedish product (try Googling: tieto trip) that exactly fits the "noSQL" description given here and it's been around since the 70s!
At the time I called it 'one of Sweden's best kept secrets' and it is now, despite the media bias in the blurb, used by companies and some Scandinavian governments as an archiving system for large amounts of unstructured data.
By lokash20 on 1 Mar 2010
Forgot to add...
I just wish it were open sourced.
By lokash20 on 1 Mar 2010
There's nothing new under the sun (again!)
Lotus Domino was doing this kind of stuff 20 years ago, and is still doing it today.
The guy who created CouchDB, in fact, was on the Domino team before he left to do CouchDB. He once described CouchDB as "Domino, built for the Web from the ground up" or something like that.
He was able to leave behind some of Domino's limitations (not to mention the much-unloved Lotus Notes client software).
By BrownieBoy6 on 11 Jul 2011