How do Nagios clients communicate?

2006-06-01 00:00:00

I know that when I first started using Nagios, I got a little confused when it came to monitoring systems other than the one running Nagios. To shed some light on the subject for the beginning Nagios user, here's a discussion of the various methods of talking to Nagios clients.

First off, let me make it absolutely clear that, in order to monitor systems other than the one running Nagios, you are indeed going to have to communicate with them in some fashion. Very few things in the sysadmin trade are magical, and unfortunately Nagios is not one of them.

So let's start with the -wrong- way of doing things. When I first started with Nagios (actually I made this mistake on my second day with the software) I wrote something like this:

define service{
   host_name remote-host
   service_description D_ROOT
   check_command check_disk!85!95!/
}

The problem with this setup is that I was using a -local- check and said it belonged to remote-host. The check_disk plugin runs on the Nagios server itself, so what you get is the state of the server's own root file system. Now this may look all right on the status screen ("Hey! It's green!"), but naturally you're not monitoring the right thing ^_^

So how -do- you monitor remote resources? Here's a table comparing various methods. After that I'll give examples of how you can correct the mistake I made above with each method.

PLEASE NOTE: the following discussion will not cover the monitoring of systems other than the various UNIX flavours. Later on I'll write a similar article covering Windows and stuff like Cisco.


A quick comparison

| | SSH | NRPE | SNMP | SNMP traps | NSCA |
|---|---|---|---|---|---|
| Connection initiation | Srv -> Clnt | Srv -> Clnt | Srv -> Clnt | Clnt -> Srv | Clnt -> Srv |
| Security | Encryption, TCP wrappers, Key pairs | Encryption, Access List, TCP wrappers | Access List (v2), Password (v3) | Access List (v2), Password (v3), TCP wrappers | Encryption, Access List, TCP wrappers |
| Configuration | On server | On client | On client | On client and on server | On client |
| Difficulty | Easy | Moderate | Hard | Hard | Moderate |


SSH

Just about everyone should already have SSH running on their servers (except for those few who are still running telnet or, horror of horrors!, rsh). So it's safe to assume that you can immediately start using this communications method to check your clients. You will need to:

You can now set up your services.cfg in such a way that each remote service is checked like so:

define service{
   host_name remote-host
   service_description D_ROOT
   check_command check_disk_by_ssh!85!95!/
}

Your check command definition would look something like this:

define command {
   command_name check_disk_by_ssh
   command_line /usr/local/nagios/libexec/check_by_ssh -H $HOSTADDRESS$ -C "/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ $ARG3$"
}
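To see why this variant actually reaches remote-host where the earlier local check did not, it helps to look at what Nagios ends up running after it fills in the macros. The sketch below is a rough Python imitation of that substitution, not Nagios' own code. The local check_disk command line and the address 192.0.2.10 are assumptions made up for the illustration.

```python
def expand(command_line, check_command, host_address):
    """Fill in $HOSTADDRESS$ and $ARGn$ roughly the way Nagios does."""
    # check_command looks like "name!arg1!arg2!..."; drop the name, keep the args.
    args = check_command.split("!")[1:]
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for number, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{number}$", value)
    return expanded


LIBEXEC = "/usr/local/nagios/libexec"

# Assumed definition of the plain local check (not shown in the article).
LOCAL = LIBEXEC + "/check_disk -w $ARG1$ -c $ARG2$ $ARG3$"

# The check_disk_by_ssh command line from the definition above.
BY_SSH = (
    LIBEXEC + "/check_by_ssh -H $HOSTADDRESS$ -C "
    + '"' + LIBEXEC + '/check_disk -w $ARG1$ -c $ARG2$ $ARG3$"'
)

local = expand(LOCAL, "check_disk!85!95!/", "192.0.2.10")
remote = expand(BY_SSH, "check_disk_by_ssh!85!95!/", "192.0.2.10")

print(local)   # no host address anywhere: this runs on the Nagios server
print(remote)  # check_by_ssh is told which host to log in to
```

The local command line never mentions the host at all, which is exactly why the status screen stayed green for the wrong machine.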

Working this way will allow you to do most of your configuring centrally (on the Nagios server), thus saving you a lot of work on each client system. All you have to do over there is make sure that there's a working user account and that all the scripts are in place. Quite convenient... The only drawback is that you're creating a relatively open account which has full access to the system (sometimes even with sudo access).



NRPE

As a replacement for the SSH access method, Ethan Galstad (the author of Nagios) also wrote the NRPE daemon. Using NRPE requires that you:

You can now set up your services.cfg in such a way that each remote service is checked like so:

define service{
   host_name remote-host
   service_description D_ROOT
   check_command check_nrpe!check_root
}

And in /usr/local/nagios/etc/nrpe.cfg on the client you would need to include:

command[check_root]=/usr/local/nagios/libexec/check_disk 85 95 /
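The important difference with SSH is that the server only sends a -name- (check_root) and the client decides what that name means. As a rough illustration of that lookup, here is a small Python sketch that reads command[...] lines into a table. It is not NRPE's real parser, and the sample text is made up apart from the check_root line shown above.

```python
import re

# Matches lines of the form: command[name]=command line
COMMAND_LINE = re.compile(r"^command\[(?P<name>[^\]]+)\]=(?P<command>.+)$")


def parse_nrpe_commands(config_text):
    """Return a dict mapping NRPE command names to their command lines."""
    commands = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = COMMAND_LINE.match(line)
        if match:
            commands[match.group("name")] = match.group("command")
    return commands


SAMPLE = """
# other nrpe.cfg settings omitted
command[check_root]=/usr/local/nagios/libexec/check_disk 85 95 /
"""

commands = parse_nrpe_commands(SAMPLE)
print(commands["check_root"])
```

A name that is not in the table simply cannot be run from the server side, which is where the tighter security compared to an open SSH account comes from.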

The good thing is that you won't have a semi-open account lying about. The bad things are that, if you want to change the configuration of your client, you're going to have to log in to it, and that you're going to have yet another piece of software to keep up to date.



SNMP

Whoo boy! This is something I'm working on right now at $CLIENT and let me tell you: it's hard! At least much harder than I was expecting.

SNMP is a network management protocol used by the more advanced system administrators. Using SNMP you can access just about -any- piece of equipment in your server room to read statistics, alarms and status messages. SNMP is universal, extensible, but it is also quite complicated. Not for the faint of heart.

To make proper use of monitoring through SNMP you'll need to:

The reason point C tells you to register a private EID (IANA calls this a Private Enterprise Number) is that the SNMP tree has a very rigid structure. Technically speaking you -could- just plonk down your results at a random place in the tree, but it's likely that this will screw up something else at a later time. IANA allows each company to have only one private EID, so first check whether your company already has one on the IANA list.

Unfortunately the check_snmp script that comes with Nagios isn't flexible enough to let you monitor custom SNMP objects in a nice way. This is why I wrote the retrieve_custom_snmp script, which is available from the menu. Your service definition would look like this:

define service{
   host_name remote-host
   service_description D_ROOT
   check_command retrieve_custom_snmp!.1.3.6.1.4.1.6886.4.1.4
}

And in this case your snmpd.conf would contain a line like this:

exec .1.3.6.1.4.1.6886.4.1.4 check_d_root /usr/local/nagios/libexec/check_disk -w 85 -c 95 /
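That long string of numbers is less random than it looks. Everything under .1.3.6.1.4.1 is the "enterprises" branch, the next number is the private enterprise number, and whatever follows is yours to organise. The Python sketch below just pulls those pieces apart. The function name is made up for this example.

```python
# iso.org.dod.internet.private.enterprises
ENTERPRISES_PREFIX = (1, 3, 6, 1, 4, 1)


def split_private_oid(oid):
    """Split an OID under the enterprises branch into (enterprise number, subtree)."""
    parts = tuple(int(part) for part in oid.strip(".").split("."))
    if len(parts) < 7 or parts[:6] != ENTERPRISES_PREFIX:
        raise ValueError(f"{oid} is not below .1.3.6.1.4.1")
    return parts[6], parts[7:]


enterprise, subtree = split_private_oid(".1.3.6.1.4.1.6886.4.1.4")
print(enterprise)  # 6886
print(subtree)     # (4, 1, 4)
```

So in the example above, 6886 is the enterprise number and .4.1.4 is the spot inside that private subtree where the check_disk result is published.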

Up to now things are actually not that different from using NRPE, are they? Well, that's because we haven't even started using all the -real- features of SNMP. The point is that using SNMP you can dig very deeply into your system to retrieve all kinds of useful information. And -that's- where things get complicated, because you're going to have to dig up all the object IDs (OIDs) that you're going to need. And in some cases you're going to have to install vendor-specific sub-agents that know how to speak to your specific hardware.

One of the best features of SNMP, though, is the so-called trap. Using traps the SNMP daemon will actively take action when something goes wrong in your system. So if for instance your hard disk starts failing, it is possible to have the daemon send out an alert to your Nagios server! Awesome! But naturally this will require a boatload of additional configuration :(

So... SNMP is an awesomely powerful tool, but you're going to have to pay through the nose (in effort) to get it 100% perfect.



SNMP traps

SNMP doesn't involve polling alone. SNMP-enabled devices can also be configured to automatically send status updates to a so-called trap host. The downside to receiving SNMP traps with Nagios is that it takes quite a lot of work to get them into Nagios :D

To make proper use of monitoring through SNMP you'll need to:

There are -many- ways to get the SNMP traps translated for Nagios' purposes, 'cause there are many roads that lead to Rome. Unfortunately none of them are very easy to use.
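Whichever road you pick, the last step tends to look the same: something on the Nagios server turns the trap into a passive check result. One common way to hand such a result to Nagios is a PROCESS_SERVICE_CHECK_RESULT line written to the external command file. The Python sketch below only builds that line. The function name, the trap text and the idea that your trap handler calls it are all assumptions for the sake of illustration.

```python
import time


def passive_result_command(host, service, return_code, output, now=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT external command line."""
    # Nagios expects a Unix timestamp in square brackets at the start.
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


# Hypothetical trap: a failing disk reported as CRITICAL (return code 2).
line = passive_result_command(
    "remote-host", "D_ROOT", 2, "Disk failure trap received", now=1149120000
)
print(line, end="")
```

A trap handler would append this line to the Nagios command file, and the matching service on the Nagios side has to accept passive checks for the result to show up.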



NSCA

And finally there's NSCA. This daemon is usually used by distributed Nagios servers to send their results to the central Nagios server, which gathers them as so-called "passive checks". It is however entirely possible to install NSCA on each of your Nagios clients, which will then get called to send in the results of local checks. In this case you'll need to:

On your Nagios server things would look like this:

define service{
   host_name remote-host
   service_description D_ROOT
   check_command check_disk!85!95!/
   passive_checks_enabled 1
   active_checks_enabled 0
}

For the configuration on the client side I recommend that you read up on NSCA. It's a little bit too much to show over here.
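To give at least a rough idea of what the client sends: the send_nsca program reads one line per result on standard input, with the host name, service description, return code and plugin output separated by tabs. The Python sketch below builds such a line. The function name and the sample plugin output are made up for this example.

```python
# Standard Nagios plugin return codes.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def nsca_line(host, service, state, output):
    """Build one tab-separated service result line for send_nsca's stdin."""
    return "\t".join([host, service, str(STATES[state]), output]) + "\n"


line = nsca_line("remote-host", "D_ROOT", "WARNING", "DISK WARNING - / is 87% full")
print(repr(line))
```

On the client, a cron job or wrapper script would run the local check, build a line like this and pipe it into send_nsca, pointing it at the Nagios server with -H and at its configuration file with -c. The host name and service description have to match the service definition on the server exactly.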

The upside to this is that you won't have to run any daemon on your client to accept incoming connections. This will allow you to lock down your system quite tightly.


Naturally you are absolutely free to combine two or more of the methods described above. You could poll through NRPE and receive SNMP traps in one environment. This will have both ups and downs, but it's up to your own discretion. Use the tools that feel natural to you, or use those that are already standard in your environment.

I realise I've rushed through things a little bit, but I was in a slight hurry :) I will go over this article a second time RSN, to apply some polish.

