Monitoring with Nagios
Posted on 29 Mar 2005 at 15:32
Simon Brock relates the woes of hardware problems on a Friday night, and provides a solution for system monitoring
The next steps
So you have managed to monitor the machine Nagios is running on, but you need it to monitor other machines across the network. Assuming you have direct access to those machines, there are two ways of doing this, depending on whether the service is visible from the monitoring host. For example, if you want to monitor a remote web server, you can run the 'check_http' plug-in and tell it to inspect the other machine for a web service. However, if you want to monitor something that is not directly observable across the network, such as disk space usage on a remote machine, you need the Nagios Remote Plug-in Executor (NRPE), which runs the plug-ins on the remote machine. NRPE comes in two parts: a daemon that runs on the remote machine and a plug-in that runs on the monitoring host. Once both parts are installed, check commands on the monitoring host use the 'check_nrpe' plug-in to ask the daemon to run plug-ins on the remote host and pass back the results.
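The difference between the two approaches can be sketched as the command lines the monitoring host would end up running. This is only an illustration in Python: the plug-in directory, the host names and the remote command name 'check_disk' are assumptions rather than anything defined in this article, and the remote command would have to be defined in the NRPE daemon's own configuration on the remote machine.

```python
# Assumption: plug-ins are installed in this directory on the monitoring host.
PLUGIN_DIR = "/usr/local/nagios/libexec"


def direct_http_check(host):
    """Command line for a service that is visible across the network."""
    return [f"{PLUGIN_DIR}/check_http", "-H", host]


def nrpe_check(host, remote_command):
    """Command line asking the NRPE daemon on `host` to run a named command.

    `remote_command` is a name the remote daemon must already know about;
    the plug-in it maps to runs on the remote machine, not on this host.
    """
    return [f"{PLUGIN_DIR}/check_nrpe", "-H", host, "-c", remote_command]


# Hypothetical hosts and command name, for illustration only.
web_check = direct_http_check("web1.example.com")
disk_check = nrpe_check("db1.example.com", "check_disk")
```

In both cases Nagios sees an ordinary plug-in on the monitoring host; only the second one delegates the real work to the remote machine.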
However, you may be able to monitor a remote server without installing NRPE if that server offers an SNMP (Simple Network Management Protocol) service. Many Unix systems ship with an SNMP daemon, although it may need to be enabled, and SNMP is available under Windows as well. If you know your way around SNMP, you can monitor most resources this way, and there are Nagios plug-ins that check specific SNMP-monitored events.
With a bit of thought, you can even write your own plug-ins. A Nagios plug-in is just a program that prints a short message describing the current state of a service (say, okay, warning or critical) along with some useful information, and sets its return code according to that state. For example, we have written a plug-in that checks the state of our Dell servers via their OpenManage SNMP interface; it lets us check fan speeds, temperatures and voltages remotely, and Nagios notifies us if there is a problem.
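As a rough illustration of how little a plug-in needs to do, here is a minimal sketch in Python that reports disk usage. It is not the Dell plug-in mentioned above. It follows the usual plug-in convention of exit code 0 for okay, 1 for warning, 2 for critical and 3 for unknown; the thresholds, the message wording and the function names are assumptions chosen for the example.

```python
import shutil

# Conventional Nagios plug-in return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a state (thresholds are illustrative)."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (state, one-line message) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    percent = 100.0 * usage.used / usage.total
    state = classify(percent, warn, crit)
    return state, f"DISK {LABELS[state]} - {percent:.1f}% used on {path}"


def main(argv):
    """Print the status line and return the exit code for Nagios."""
    state, message = check_disk(argv[1] if len(argv) > 1 else "/")
    print(message)
    return state

# A real plug-in script would finish with: sys.exit(main(sys.argv))
```

Nagios reads the printed line as the status information and the exit code as the state, so anything that can produce those two things can serve as a plug-in.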
More advanced topics
There are two main areas we have not yet covered: how do you monitor Windows machines, and how do you monitor a machine you cannot access directly? There are various ways of monitoring Windows machines, but the one we chose was NC_Net, which comes with its own plug-in to be run on the monitoring host and an easy-to-install package for the Windows machines. Once it is up and running, you can check CPU load, uptime, disk space, processes and various other Windows metrics.
Monitoring machines that you cannot access seems, at first glance, like a mildly stupid idea, but the need arises all the time. We are talking about machines behind firewalls, which can probably talk to the monitoring host, but which the monitoring host may not be able to reach directly. To make this work, you have to set up slightly different checks and deploy another piece of software called Nagios Service Check Acceptor (NSCA). The service checks we have discussed so far are called 'active' checks, which means Nagios runs them itself at regular intervals using plug-ins. We now want to define 'passive' checks, where Nagios processes the results but does not actually do the checking. To do this, you run another copy of Nagios behind the firewall that performs the active checking. This copy does not normally carry out notifications but forwards its results to the main Nagios monitoring host using NSCA. The main instance of Nagios then processes the service check results it receives as though it had generated them itself and issues the appropriate notifications.
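To give a feel for what a forwarded result looks like, the sketch below formats one service check in two ways: as the tab-separated line that, as far as we understand it, the NSCA client 'send_nsca' reads on its standard input, and as the PROCESS_SERVICE_CHECK_RESULT external command that the receiving Nagios processes. The host name, service description and output text are made up for the example, and the helper function names are hypothetical.

```python
def nsca_line(host, service, return_code, output):
    """One service result as a tab-separated line ending in a newline."""
    return f"{host}\t{service}\t{return_code}\t{output}\n"


def external_command(timestamp, host, service, return_code, output):
    """The same result as a Nagios external command line."""
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


# Hypothetical result from a host behind the firewall (2 means critical).
line = nsca_line("fw-host", "Disk /", 2, "DISK CRITICAL - 95.0% used on /")
command = external_command(1112106720, "fw-host", "Disk /", 2,
                           "DISK CRITICAL - 95.0% used on /")
```

The host name and service description have to match a host and service defined on the main monitoring host, with passive checks enabled for that service, otherwise the result has nothing to attach to.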
Before you ask, Nagios can tell whether it is still receiving these service checks from behind the firewall, and will warn you that there may be a problem if the information it holds is stale. You can also use passive service checks to handle events. For example, your backup software might send an SNMP trap if there is no tape drive online. Such traps can be fed into Nagios as passive check results so that it sends out notifications.

