Monitoring with Nagios
Posted on 29 Mar 2005 at 15:32
Simon Brock relates the woes of hardware problems on a Friday night, and provides a solution for system monitoring
The next steps
So you have managed to monitor the machine Nagios is currently running on, but you need it to monitor other machines across the network. Assuming you have direct access to those machines, there are two ways of doing this, depending on whether the service is visible from the monitoring host. For example, if you want to monitor a remote web server, you can run the 'check_http' plug-in and tell it to inspect the other machine for a web service. However, if you want to monitor the disk space usage on a remote machine that may not be directly observable, you need to run the Nagios Remote Plug-in Executor (NRPE), which runs the plug-ins on the remote machine. NRPE comes in two parts: a daemon that runs on the remote machine and a plug-in that runs on the monitoring host. Having installed both parts, your check commands on the monitoring host use the 'check_nrpe' plug-in to run plug-ins on the remote host, which solves the problem.
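To make the NRPE arrangement concrete, here is a minimal Python sketch of what the monitoring host ends up running. It assumes 'check_nrpe' lives at /usr/local/nagios/libexec/check_nrpe (your installation path may differ) and that the remote daemon's configuration defines a command named 'check_disk'; the host name and both function names are ours, for illustration only.

```python
import subprocess

# Assumed install location of the check_nrpe plug-in; adjust for your system.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"


def nrpe_command(host, remote_command, plugin=CHECK_NRPE):
    """Build the argument list asking the NRPE daemon on `host` to run
    the command it knows as `remote_command`."""
    return [plugin, "-H", host, "-c", remote_command]


def run_nrpe_check(host, remote_command):
    """Run the check and return (exit status, first line of plug-in output)."""
    result = subprocess.run(
        nrpe_command(host, remote_command),
        capture_output=True,
        text=True,
    )
    lines = result.stdout.splitlines()
    return result.returncode, (lines[0] if lines else "")


# Hypothetical usage: run_nrpe_check("fileserver", "check_disk")
```

In practice Nagios itself invokes the plug-in through a command definition, so you would not normally wrap it like this; the sketch only shows which arguments travel from the monitoring host to the remote daemon.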
However, you may be able to monitor a remote server without installing NRPE if that server offers an SNMP (Simple Network Management Protocol) service. Many Unix systems ship with an SNMP daemon, although it may need to be enabled, and SNMP is available under Windows as well. If you know your way around SNMP, you can monitor most resources this way, and there are Nagios plug-ins that check specific SNMP-monitored events.
With a bit of thought, you can even write your own plug-ins. A Nagios plug-in is just a program that returns a message saying what the current state of a service is (say, okay, warning or critical) with some useful information, and sets its return code according to the state. For example, we have written a plug-in that checks the state of our Dell servers via their OpenManage SNMP interface, which allows us to check fan speeds, temperatures and voltages remotely and notifies us if there is a problem.
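As a rough illustration of that contract, the following Python sketch checks disk usage on the local machine. It is not the Dell plug-in mentioned above. The exit codes (0 for okay, 1 for warning, 2 for critical, 3 for unknown) follow the usual Nagios plug-in convention; the thresholds and function names are our own assumptions.

```python
import shutil
import sys

# Conventional Nagios plug-in exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a state (thresholds are illustrative)."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (state, one-line message) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {exc}"
    percent = usage.used / usage.total * 100
    state = evaluate(percent, warn, crit)
    return state, f"DISK {LABELS[state]} - {percent:.1f}% used on {path}"


def main():
    """Print the status line and return the state to use as the exit code."""
    state, message = check_disk()
    print(message)
    return state


# A real plug-in file would end with: sys.exit(main())
```

Nagios reads the first line of output as the status text and the exit code as the state, so those two things are all a plug-in has to get right.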
More advanced topics
There are two main areas we have not yet covered: how to monitor Windows machines, and how to monitor a machine you cannot access directly. There are various ways of monitoring Windows machines, but the one we chose was NC_Net, which comes with an easy-to-install package for the Windows machines and its own plug-in to run on the monitoring host. Once it is up and running, you can check CPU load, uptime, disk space, processes and various other Windows metrics.
Monitoring machines that you cannot access seems, at first glance, like a mildly stupid idea, but it happens all the time. We are talking about machines behind firewalls, which can probably reach the monitoring host, but which the monitoring host may not be able to contact directly. To make this work, you have to set up slightly different checks and deploy another piece of software called Nagios Service Check Acceptor (NSCA). The service checks we have discussed so far are called 'active' checks, which means Nagios runs them itself at regular intervals using plug-ins. We now want to define 'passive' checks, where Nagios processes the results but does not actually do the checking. To make this work, you run another copy of Nagios behind the firewall that performs active checking. This copy does not normally carry out notifications but forwards its results to the main Nagios monitoring host using NSCA. The main instance of Nagios then processes the service checks it receives as though it had generated them itself and issues the appropriate notifications.
Before you ask, Nagios can tell whether it is still receiving these service checks from behind the firewall, and will warn you that there may be a problem if the information it holds is stale. You can also use these passive service checks to handle events. For example, your backup software might send an SNMP trap if there is no tape drive online. Such traps can be fed into Nagios as passive check results to send out notifications.
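To show what a passive result looks like on its way to the main monitoring host, here is a minimal Python sketch. It relies on the 'send_nsca' client reading tab-separated lines of host name, service description, return code and plug-in output from standard input. The file locations, host names and function names are assumptions for illustration, and in a real set-up the Nagios instance behind the firewall would normally do the forwarding for you.

```python
import subprocess


def format_passive_result(host, service, return_code, output):
    """Build one tab-separated result line in the form send_nsca reads."""
    # Collapse tabs and newlines so the output cannot break the field layout.
    clean = " ".join(output.split())
    return f"{host}\t{service}\t{return_code}\t{clean}\n"


def send_result(line, monitor_host,
                send_nsca="/usr/local/nagios/bin/send_nsca",   # assumed path
                config="/usr/local/nagios/etc/send_nsca.cfg"):  # assumed path
    """Pipe a formatted line to send_nsca and return its exit status."""
    completed = subprocess.run(
        [send_nsca, "-H", monitor_host, "-c", config],
        input=line,
        text=True,
    )
    return completed.returncode


# Hypothetical usage from a machine behind the firewall:
# line = format_passive_result("db1", "Disk Space", 1, "DISK WARNING - 85% used")
# send_result(line, "monitor.example.com")
```

The host name and service description have to match a service defined on the main monitoring host for the result to be attached to it.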
Printed from www.pcpro.co.uk


