
Real World Computing

Spreading the load

6th May 2008 [PC Pro]

If the load balancer really does understand the application-layer protocol, it can do more than pass traffic through: it can choose a server based not only on what the request asks for but on how it asks for it. It may also be able to rewrite the client's request before it reaches the server, and then rewrite the server's response on the way back.
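To make the idea concrete, here is a minimal Python sketch of the kind of HTTP rewriting an application-layer balancer can do. It is not how HAProxy or any commercial product implements rewriting: it assumes a complete, simple message held in memory (no chunked bodies, no folded headers), and the function names, prefix rules and example addresses are all hypothetical.

```python
def split_message(raw: bytes) -> tuple:
    """Split a raw HTTP message into its header lines and body bytes."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1").split("\r\n"), body


def join_message(lines: list, body: bytes) -> bytes:
    """Reassemble header lines and body into raw HTTP bytes."""
    return ("\r\n".join(lines) + "\r\n\r\n").encode("iso-8859-1") + body


def rewrite_request(raw: bytes, old_prefix: str, new_prefix: str, backend_host: str) -> bytes:
    """Map a public URL prefix onto the back end's prefix and fix up the Host header."""
    lines, body = split_message(raw)
    method, target, version = lines[0].split(" ", 2)
    # Only rewrite whole path segments, so "/shopping" is not treated as "/shop".
    if target == old_prefix or target.startswith(old_prefix + "/"):
        target = new_prefix + target[len(old_prefix):]
    out = [f"{method} {target} {version}"]
    for line in lines[1:]:
        name = line.partition(":")[0].strip().lower()
        out.append(f"Host: {backend_host}" if name == "host" else line)
    return join_message(out, body)


def rewrite_response(raw: bytes, backend_base: str, public_base: str) -> bytes:
    """Rewrite redirects that point at the back end so they point at the public site."""
    lines, body = split_message(raw)
    out = [lines[0]]
    for line in lines[1:]:
        name, _, value = line.partition(":")
        value = value.strip()
        if name.strip().lower() == "location" and value.startswith(backend_base):
            line = "Location: " + public_base + value[len(backend_base):]
        out.append(line)
    return join_message(out, body)
```

The body is passed through untouched, so headers such as Content-Length stay valid; rewriting the body itself would also mean recalculating that length.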

Open-source load balancing

For the rest of this column, we'll look at two open-source load-balancing systems. The first, HAProxy, is a true load balancer that can work at the link, network and application layers to build resilient services. The second, MySQL Proxy, works purely at the application layer but can carry out a range of operations on the database traffic passing through it.

HAProxy has been around for several years and now offers a very complete solution, with a feature set similar to that of commercial load balancers such as those from Foundry Networks. It runs on most Unix variants, and we tested it on a CentOS 5 machine. One of HAProxy's greatest strengths is its HTTP support: it can, for example, route requests based on the URL or on values in cookies. It can also monitor the health of back-end web servers and reroute requests as appropriate. In particular, if a request is sent to a server that then fails, the request can be rerouted to another server.
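The routing and failover behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions, not HAProxy's implementation: URL routing uses a longest-prefix match over a hypothetical `routes` table, server persistence uses a cookie whose name (`SERVERID`) is chosen purely for the example, and `send` is a hypothetical callable standing in for the real forwarding step, raising `ConnectionError` when a server fails.

```python
from dataclasses import dataclass
from http.cookies import SimpleCookie
from typing import Callable, Dict, Optional


@dataclass
class Backend:
    name: str
    healthy: bool = True  # a real balancer would update this from periodic health checks


class Pool:
    """A group of interchangeable servers, picked round robin."""

    def __init__(self, backends):
        self.backends = backends
        self._next = 0

    def by_name(self, name: str) -> Optional[Backend]:
        return next((b for b in self.backends if b.name == name and b.healthy), None)

    def pick(self, exclude=frozenset()) -> Optional[Backend]:
        for _ in range(len(self.backends)):
            b = self.backends[self._next % len(self.backends)]
            self._next += 1
            if b.healthy and b.name not in exclude:
                return b
        return None


def choose_pool(path: str, routes: Dict[str, Pool]) -> Optional[Pool]:
    """The longest URL prefix that matches the request path wins."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    return routes[max(matches, key=len)] if matches else None


def route(path, cookie_header, routes, send: Callable, cookie_name="SERVERID"):
    """Pick a server by URL and cookie; if it fails, retry elsewhere in the same pool."""
    pool = choose_pool(path, routes)
    if pool is None:
        raise LookupError(f"no pool serves {path}")
    cookies = SimpleCookie(cookie_header or "")
    sticky = cookies[cookie_name].value if cookie_name in cookies else None
    backend = pool.by_name(sticky) if sticky else None
    tried = set()
    while True:
        if backend is None:
            backend = pool.pick(exclude=tried)
        if backend is None:
            raise ConnectionError("no healthy server left in pool")
        try:
            return backend.name, send(backend, path)
        except ConnectionError:
            backend.healthy = False  # stop sending new requests to this server
            tried.add(backend.name)
            backend = None  # loop round and resend the same request elsewhere
```

Marking a server unhealthy after a single failure is deliberately crude; a production balancer would bring it back into rotation once its health checks succeed again.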

The initial setup of HAProxy was very straightforward, although there are some hardware choices to make. The first thing to consider is whether or not you want your load balancer to be transparent. A load balancer intercepts requests and forwards them to back-end servers. It can do this transparently, so that the back-end servers see the client's original IP address, or it can substitute its own IP address. The transparent model has advantages if the application servers need to know the client's IP address, either to decide which service to provide or simply for logging. However, on Linux it requires a kernel patch, the tproxy patch, which we hadn't installed, so we ran HAProxy in non-transparent mode, in which the application servers saw the load balancer's IP address instead.

Setting up load balancing at the network layer is simple: you literally tell the load balancer to accept connections on a given port and forward them to a given set of servers. This setup can be applied to any TCP network service.
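A network-layer balancer of this kind can be sketched in a few dozen lines of Python with `asyncio`. This is a minimal round-robin TCP forwarder, assuming a fixed, hypothetical list of `(host, port)` back ends and no health checking; it says nothing about how HAProxy itself is built. Because it opens its own connection to each back end, it behaves like the non-transparent mode described above: the servers see the balancer's address, not the client's.

```python
import asyncio
import itertools


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, then half-close the other side."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        pass  # peer reset or closed: stop copying in this direction


async def start_balancer(listen_host: str, listen_port: int, backends: list) -> asyncio.AbstractServer:
    """Accept TCP connections and hand each one to the next back end in turn."""
    next_backend = itertools.cycle(backends)

    async def handle(client_reader, client_writer):
        host, port = next(next_backend)
        try:
            # The balancer makes its own connection, so the server sees our address.
            server_reader, server_writer = await asyncio.open_connection(host, port)
        except OSError:
            client_writer.close()
            return
        # Shuttle bytes both ways until each side has finished sending.
        await asyncio.gather(
            pipe(client_reader, server_writer),
            pipe(server_reader, client_writer),
        )
        server_writer.close()
        client_writer.close()

    return await asyncio.start_server(handle, listen_host, listen_port)
```

Nothing in the forwarding loop looks at the bytes it copies, which is why this approach works for any TCP service, not just the web.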

A commercial load balancer looks like a network switch, allowing many machines to be connected to it. Although multiport Ethernet cards do exist, we used the built-in Ethernet ports on our test load balancer, which meant placing a separate switch downstream of the load balancer to connect the servers. Having sorted out all the cabling and got something running, we then had to test it. It was soon obvious that the load balancer was working; the real question was whether it imposed a noticeable overhead on network connections. A connection through a load balancer must be slower than a direct one, but is the slowdown noticeable?

One would expect the overhead of a commercial load balancer to be quite low, since such devices use special-purpose embedded operating systems optimised for switching network packets and often run on specially designed hardware. Our load balancer, running HAProxy on general-purpose Linux and stock hardware, might therefore be expected to be a lot slower. We spent quite some time trying various loading options and timing techniques, but none detected any appreciable overhead: at most, the service was a few percentage points slower.
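One simple way to frame such a measurement is to time the same request many times, once directly and once via the balancer, and compare the medians. The Python sketch below assumes you supply two hypothetical zero-argument callables, one that makes a request straight to a server and one that goes through the load balancer; the median is used to damp outliers. It is not a reconstruction of the tests described above.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], runs: int) -> List[float]:
    """Return the wall-clock duration, in seconds, of each of `runs` calls to `fn`."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def overhead_percent(direct: List[float], balanced: List[float]) -> float:
    """Percentage by which the median balanced time exceeds the median direct time."""
    base = statistics.median(direct)
    return (statistics.median(balanced) - base) / base * 100.0


# Usage, with hypothetical request functions:
# direct = time_calls(fetch_direct, 200)
# balanced = time_calls(fetch_via_balancer, 200)
# print(f"overhead: {overhead_percent(direct, balanced):.1f}%")
```

Interleaving direct and balanced runs, rather than doing all of one then all of the other, helps stop background load on the test network skewing one set of figures.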
