
PCPro-Computing in the Real World Printed from www.pcpro.co.uk


Real World Computing

Monitoring with Nagios

Posted on 29 Mar 2005 at 15:32

Simon Brock relates the woes of hardware problems on a Friday night, and provides a solution for system monitoring

Last Friday wasn't a particularly bad day for us, but it certainly was for one of our clients. The previous evening, half their network had disappeared, at least from where I was sitting, and the next morning they told us a network switch had broken and been replaced. Later that afternoon, some of their services disappeared again and we discovered a router was wrongly configured. It was six o'clock on Friday evening by now and no-one was answering the phone, so we sent them an email and went down the pub. An hour later, I received a call on my mobile phone; I was told the router was fixed and they asked me to check everything was working. Normally, we would ask if it could wait until we got home or returned to the office, but we had just moved all our monitoring over to a piece of software called Nagios, which has a WAP interface. We checked that all their services were working from the comfort of the pub.

However, Nagios is much more than a belated justification for WAP phones. One day, we woke up to discover we had more servers than we knew what to do with and that our business depended on machines we weren't monitoring very well. How could we find out what all those servers were doing, know what had gone wrong and when? We already ran a bunch of scripts on various machines to check matters like disk space usage, but these were becoming increasingly hard to maintain and didn't check enough things. We also needed to monitor service quality as well as availability. Our company creates and hosts websites, and we need to know that pages are being served quickly from both within and without our own network.
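To give a flavour of the kind of home-grown disk space check described above, here is a minimal sketch in Python. It is not one of our original scripts: the function names and the 80 and 90 per cent thresholds are assumptions made purely for illustration.

```python
import shutil

# Hypothetical thresholds, chosen for illustration only.
WARN_FRACTION = 0.80
CRIT_FRACTION = 0.90


def classify_usage(used_fraction, warn=WARN_FRACTION, crit=CRIT_FRACTION):
    """Map a used-space fraction (0.0 to 1.0) to a status label."""
    if used_fraction >= crit:
        return "CRITICAL"
    if used_fraction >= warn:
        return "WARNING"
    return "OK"


def check_disk(path="/"):
    """Return (status, used_fraction) for the filesystem holding path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_fraction = usage.used / usage.total
    return classify_usage(used_fraction), used_fraction


def report(path="/"):
    """Build a one-line, human-readable summary for the given path."""
    status, fraction = check_disk(path)
    return f"DISK {status} - {fraction:.0%} used on {path}"
```

Calling report("/") from a scheduled job and emailing the result is roughly what such a script amounts to. It works for one machine, but every new server and every new check means another copy to maintain, which is the problem we were running into.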

We had to take a decision between writing more software ourselves, finding an open-source solution or buying in monitoring software. The first route wasn't really viable, as we needed a cross-platform solution to monitor both Windows and Unix servers. We could write easily enough for Unix, but weren't confident about writing for Windows. There was commercial software that seemed to do what we wanted, but such software is often expensive (we hate to spend more on monitoring software than we paid for the server itself). That left open-source as our only option.

We soon uncovered a large collection of open-source monitoring software. However, unlike some applications where you know which solution is best - for example, the dominant open-source web server is Apache - in network monitoring there is no clear winner. We had to devise a way to choose, so we applied these three criteria, which we tend to use with all open-source solutions:

1. Does it work with what we have already?

2. Do other people use it?

3. Are many people working on it?

On the first point, we needed to monitor both Unix and Windows servers, and to monitor those parameters that interest us: resource usage (CPU and disk); service availability (is the web server up?); and service quality (how fast is this page generated?). It would also be good if we could monitor the hardware, to check that it is not getting too hot. We didn't want to have to install much other software either. For example, we use MySQL for virtually all our database work and didn't want to install a different database server (say, PostgreSQL) just to monitor the network.
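The availability and quality checks described above can be sketched as a single timed page fetch. The Python below is an illustrative sketch only, not code shipped with any monitoring package: the function names and the one and three second thresholds are assumptions. The numeric codes follow the usual Nagios plug-in convention of 0 for OK, 1 for WARNING and 2 for CRITICAL.

```python
import http.client
import time
import urllib.request

# Exit codes in the style of the Nagios plug-in convention.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_response(elapsed, warn_seconds=1.0, crit_seconds=3.0):
    """Turn a fetch time in seconds (or None for a failed fetch) into a code.

    The default thresholds are hypothetical and should be tuned per site.
    """
    if elapsed is None:
        return CRITICAL  # service unavailable
    if elapsed >= crit_seconds:
        return CRITICAL  # up, but far too slow
    if elapsed >= warn_seconds:
        return WARNING   # up, but slower than we would like
    return OK


def time_fetch(url, timeout=10.0):
    """Fetch url and return the elapsed seconds, or None if the fetch failed."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include the time taken to deliver the body
    except (OSError, http.client.HTTPException):
        return None
    return time.monotonic() - start


def check_page(url):
    """Return (code, message) for one page, ready to print and exit with."""
    elapsed = time_fetch(url)
    code = classify_response(elapsed)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[code]
    detail = "no response" if elapsed is None else f"{elapsed:.2f}s"
    return code, f"HTTP {label} - {url} ({detail})"
```

A wrapper script would call check_page() with the address of the page to test, print the message and exit with the returned code. Running the same check from a machine inside the network and from one outside it gives the two views of page speed that we wanted.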

The second and third points about other users and developer status are equally important. On open-source sites like SourceForge, you will see a lot of projects that seem to do what you want, but either aren't used by anyone else or have had no development done on them for years. For software you are going to rely on, it is important to know there is a community that can help. A good way to check is to examine any mailing list that comes with the package. If there is not one, or it shows no postings for a year or so, there may be a reason.
