Monitoring with Nagios
Posted on 29 Mar 2005 at 15:32
Simon Brock relates the woes of hardware problems on a Friday night, and provides a solution for system monitoring
Last Friday wasn't a particularly bad day for us, but it certainly was for one of our clients. The previous evening, half their network had disappeared, at least from where I was sitting, and the next morning they told us a network switch had broken and been replaced. Later that afternoon, some of their services disappeared again and we discovered a router was wrongly configured. It was six o'clock on Friday evening by now and no-one was answering the phone, so we sent them an email and went down the pub. An hour later, I received a call on my mobile phone; I was told the router was fixed and they asked me to check everything was working. Normally, we would ask if it could wait until we got home or returned to the office, but we had just moved all our monitoring over to a piece of software called Nagios, which has a WAP interface. We checked that all their services were working from the comfort of the pub.
However, Nagios is much more than a belated justification for WAP phones. One day, we woke up to discover we had more servers than we knew what to do with and that our business depended on machines we weren't monitoring very well. How could we find out what all those servers were doing, know what had gone wrong and when? We already ran a bunch of scripts on various machines to check matters like disk space usage, but these were becoming increasingly hard to maintain and didn't check enough things. We also needed to monitor service quality as well as availability. Our company creates and hosts websites, and we need to know that pages are being served quickly from both within and without our own network.
We had to take a decision between writing more software ourselves, finding an open-source solution or buying in monitoring software. The first route wasn't really viable, as we needed a cross-platform solution to monitor both Windows and Unix servers. We could write easily enough for Unix, but weren't confident about writing for Windows. There is commercial software out there that seemed to do what we wanted, but it is often expensive (we hate to spend more on monitoring software than we paid for the server itself). That left open-source as our only option.
We soon uncovered a large collection of open-source monitoring software. However, unlike some applications where you know which solution is best - for example, the dominant open-source web server is Apache - in network monitoring there is no clear winner. We had to devise a way to choose, so we applied these three criteria, which we tend to use with all open-source solutions:
1. Does it work with what we have already?
2. Do other people use it?
3. Are many people working on it?
On the first point, we needed to monitor both Unix and Windows servers, and to monitor those parameters that interest us: resource usage (CPU and disk); service availability (is the web server up?); and service quality (how fast is this page generated?). It would also be good if we could monitor the hardware, to check that it is not getting too hot. We didn't want to have to install much other software either. For example, we use MySQL for virtually all our database work and didn't want to install a different database server (say, PostgreSQL) just to monitor the network.
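To make those three kinds of measurement concrete, here is a minimal sketch in Python of how a resource check and a service-quality check might be written. It is not taken from our own setup: the function names and thresholds are purely illustrative assumptions. The one borrowed convention is the set of status codes used by Nagios plugins (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN).

```python
import shutil
import time
import urllib.request

# Status codes conventionally returned by Nagios plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measurement to a status code; higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def disk_percent_used(path="/"):
    """Resource usage: percentage of the filesystem holding `path` in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def page_seconds(url, timeout=10.0):
    """Service quality: seconds taken to fetch the whole page at `url`."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


def report(name, value, unit, warn, crit):
    """Return (status code, one-line summary) for a single measurement."""
    status = classify(value, warn, crit)
    return status, f"{name} {LABELS[status]} - {value:.1f}{unit}"


def check_page(url, warn=2.0, crit=5.0):
    """Availability and quality together: a failed fetch counts as CRITICAL."""
    try:
        elapsed = page_seconds(url)
    except OSError as error:  # urllib's URLError is a subclass of OSError
        return CRITICAL, f"HTTP CRITICAL - {error}"
    return report("HTTP", elapsed, "s", warn, crit)
```

A wrapper script would print the summary line and exit with the status code, for instance by calling `report("DISK", disk_percent_used("/"), "%", 80, 90)`. The thresholds of 80 and 90 per cent, and the two and five seconds used for page timing, are placeholders to be tuned for each server.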
The second and third points about other users and developer status are equally important. On open-source sites like SourceForge, you will see a lot of projects that seem to do what you want, but either aren't used by anyone else or have had no development done on them for years. For software you are going to rely on, it is important to know there is a community that can help. A good way to check is to examine any mailing list that comes with the package. If there is not one, or it shows no postings for a year or so, there may be a reason.

