Real World Computing
Spreading the load
Application-level load balancing
The HAProxy package is a good general-purpose load balancer with some application-layer load-balancing abilities, but we also wanted to look at MySQL Proxy, which sits at the other end of the load-balancing spectrum: it's dedicated entirely to a single application. We discuss it here because it shows what state-of-the-art load balancing can do for a database product that's very widely used in open-source web applications - and its approach will almost certainly be copied by other database systems.
All load balancers are also proxy servers: you connect to the load balancer and it proxies that connection to the back-end servers. In its simplest form, this is what MySQL Proxy does - it can be used, for example, to spread database connections, and hence queries, across a collection of different database servers. However, it has one rather special trick up its sleeve: it comes with a built-in scripting engine that can hook into those connections. The language it uses - Lua - is specifically designed for embedding in other applications and is already widely used.
You can write a script not only to monitor the queries being executed against your database, but also to change them. The MySQL website contains a number of examples of how the proxy can be used, and we tried a few different scripts to check various features. We first used the proxy simply to monitor which queries were being executed, which is particularly useful when debugging a database application.
Then we used the proxy to implement read/write splitting. A typical web-based content-serving application sees a very large number of read queries but relatively few write queries, so it's often useful to spread the read queries across several servers while directing all the write queries at one server, which replicates the updates to the read-only servers. Using the built-in scripting language, we were able to implement read/write splitting and direct queries to different machines. The same feature could also be used to send queries for different databases to different machines, and to implement other sophisticated content-aware forms of load balancing.
Moreover, the proxy can rewrite the original query, or even inject completely different queries into the stream, which opens up interesting possibilities for optimising applications from the "outside" without having to look inside them to see how or when a query gets created.
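To make the read/write splitting idea concrete, here is a minimal sketch of the routing decision. It is written in Python for illustration and is not the Lua script we ran, nor does it use the MySQL Proxy API. The class name ReadWriteRouter, the server names and the rule for what counts as a read are all assumptions. Reads are spread round-robin across the replicas, while writes and anything inside a transaction go to the primary.

```python
import itertools
import re

# Assumption: statements starting with these keywords are safe to send to a replica.
_READ_PREFIX = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)


class ReadWriteRouter:
    """Hypothetical router: picks a back-end server name for each SQL statement."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Cycle through the replicas forever; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None
        self.in_transaction = False

    def route(self, sql):
        """Return the server that should execute this statement."""
        upper = sql.strip().upper()

        # Keep a whole transaction on the primary so it sees its own writes.
        if upper.startswith(("BEGIN", "START TRANSACTION")):
            self.in_transaction = True
            return self.primary
        if upper.startswith(("COMMIT", "ROLLBACK")):
            self.in_transaction = False
            return self.primary
        if self.in_transaction or self._replicas is None:
            return self.primary

        # Plain reads go to the next replica; SELECT ... FOR UPDATE takes locks,
        # so it is treated as a write.
        if _READ_PREFIX.match(sql) and "FOR UPDATE" not in upper:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
    for statement in ("SELECT * FROM articles",
                      "UPDATE articles SET hits = hits + 1",
                      "SELECT * FROM comments"):
        print(router.route(statement), "<-", statement)
```

A real deployment would also have to cope with replication lag, since a read sent to a replica immediately after a write may not see that write yet. This sketch ignores the problem.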
At the time of writing, MySQL Proxy is still in alpha release, but it's already very usable. We've been using it for months on production systems and have had no problems so far. It's fiddly, and we do recommend trying it in a development environment before taking it live, but it's a very useful addition to the toolkit.
In conclusion
We've looked at two very different load balancers here, but there are plenty of other open-source projects very similar to HAProxy. All of them, except for MySQL Proxy, require you to be running a Unix system, and in fairness all are fairly tricky to configure. Having said that, we've configured a few standalone commercial alternatives and found them no easier. The Linux-based load balancers can also be teamed with a number of other Linux facilities, enabling you to build yourself a high-availability load balancer for about 20% of the cost of a single low-availability commercial load balancer.





