Nagios script: check_log2

2006-06-01 00:00:00

This script was written at the time I was hired by UPC / Liberty Global.

Improved log checker for Solaris, with state retention.

I found that the version of check_log included in the default monitor package doesn't work perfectly on Solaris: it needs a bit of tweaking... Which is what I've done for the script.

Also, I've added state retention. It's a bit of a hack, but hey! I needed a quick solution.

The original script sends a Critical when it detects the string you've queried the log file for, but it clears that same Critical immediately if the same message is not repeated once the monitor runs again. Meaning that, if there are no updates to your log file, the Critical will only be around until the next time the monitor runs.

Not very handy if the Critical occurs during the night.

This new version of the script creates a file called $oldlog.STATE in /usr/local/nagios/var (which should be 755, nagios:nagios), which contains the exit status for the last detected _changed_ status... If there are no changes detected in your log file, this old exit state is repeated.

The script has been tested on Solaris 8, Mac OS X 10.4 and Redhat ES3.

UPDATE 19/06/2006:

Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!

Also stomped out a few horrendous bugs! I'm very sorry for putting out such a buggy script earlier... If you've started using the script in your environment, please download the latest version. Thanks to Ali Khan for pointing out these mistakes.


#!/bin/bash
#
# Log file pattern detector plugin for Nagios
# Written by Ethan Galstad (nagios@nagios.org)
# Last Modified: 07-31-1999
# Updated by Thomas Sluyter (nagiosATkilalaDOTnl)
# Last Modified: 19-06-2006
#
# Usage: ./check_log2 -F log_file -O old_log_file -Q pattern
#
# Description:
#
# This plugin will scan a log file (specified by the log_file option)
# for a specific pattern (specified by the pattern option).  Successive
# calls to the plugin script will only report *new* pattern matches in the
# log file, since an copy of the log file from the previous run is saved
# to old_log_file.
#
# Output:
#
# On the first run of the plugin, it will return an OK state with a message
# of "Log check data initialized".  On successive runs, it will return an OK
# state if *no* pattern matches have been found in the *difference* between the
# log file and the older copy of the log file.  If the plugin detects any 
# pattern matches in the log diff, it will return a CRITICAL state and print
# out a message is the following format: "(x) last_match", where "x" is the
# total number of pattern matches found in the file and "last_match" is the
# last entry in the log file which matches the pattern.
#
# Notes:
#
# If you use this plugin make sure to keep the following in mind:
#
#    1.  The "max_attempts" value for the service should be 1, as this
#        will prevent Nagios from retrying the service check (the
#        next time the check is run it will not produce the same results).
#
#    2.  The "notify_recovery" value for the service should be 0, so that
#        Nagios does not notify you of "recoveries" for the check.  Since
#        pattern matches in the log file will only be reported once and not
#        the next time, there will always be "recoveries" for the service, even
#        though recoveries really don't apply to this type of check.
#
#    3.  You *must* supply a different old_file_log for each service that
#        you define to use this plugin script - even if the different services
#        check the same log_file for pattern matches.  This is necessary
#        because of the way the script operates.
#
#    4.  Changes to the script were made by Thomas Sluyter (nagios@kilala.nl).
#	 The first set of changes will allow the script to run properly on Solaris, which
#	 it did not do by default. The second set of changes will allow the following:
#	 * State retention. If a NOK was generated at point A in time and it is not repeated
# 	   at A+1, then an OK is sent to Nagios. Not something that you would like to happen.
#	   I've added the $oldlog.STATE trigger file which retains the last exitstatus. Should
# 	   there be no new lines added to the log, check_log will simply repeat the last state
#	   instead of give an OK.
#
# Examples:
#
# Check for login failures in the syslog...
#
#   check_log -F /var/log/messages -O /usr/local/nagios/var/check_log.badlogins.old -Q "LOGIN FAILURE"
#
# Check for port scan alerts generated by Psionic's PortSentry software...
#
#   check_log -F /var/log/messages -O /usr/local/nagios/var/check_log.portscan.old -Q "attackalert"
#

# Paths to commands used in this script.  These
# may have to be modified to match your system setup.

PATH="/usr/bin:/usr/sbin:/bin:/sbin"

PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`

#. $PROGPATH/utils.sh
. /usr/local/nagios/libexec/utils.sh

print_usage() {
    echo "Usage: $PROGNAME -F logfile -O oldlog -Q query"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Log file pattern detector plugin for Nagios"
    echo ""
    support
}

# Make sure the correct number of command line
# arguments have been supplied

if [ $# -lt 6 ]; then
    print_usage
    exit $STATE_UNKNOWN
fi

# Grab the command line arguments

exitstatus=$STATE_WARNING #default
while test -n "$1"; do
    case "$1" in
        --help)
            print_help
            exit $STATE_OK
            ;;
        -h)
            print_help
            exit $STATE_OK
            ;;
        -F)
            logfile=$2
            shift
            ;;
        -O)
            oldlog=$2
            shift
            ;;
        -Q)
            query=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done

# If the source log file doesn't exist, exit

if [ ! -e $logfile ]; then
    echo "Log check error: Log file $logfile does not exist!"
    exit $STATE_UNKNOWN
    echo $STATE_UNKNOWN > $oldlog.STATE
fi

# If the oldlog file doesn't exist, this must be the first time
# we're running this test, so copy the original log file over to
# the old diff file and exit

if [ ! -e $oldlog ]; then
    cat $logfile > $oldlog
    if [ `tail -1 $logfile | grep -i $query | wc -l` -gt 0 ]
    then
        echo "Log check data initialized... Last line contained error message."
        echo $STATE_CRITICAL > $oldlog.STATE
	exit $STATE_CRITICAL
    else
        echo "Log check data initialized..."
        echo $STATE_OK > $oldlog.STATE
        exit $STATE_OK
    fi
fi

# A bug which was caught very late:
# If newlog is shorter than oldlog, the diff used below will return
# false positives for the query because the will be in $oldlog. Why?
# Because $oldlog is not rolled over / rotated, like $newlog. I need 
# to fix this in a kludgy way.

if [ `wc -l $logfile|awk '{print $1}'` -lt `wc -l $oldlog|awk '{print $1}'` ]
then
    rm $oldlog
    cat $logfile > $oldlog
    if [ `tail -1 $logfile | grep -i $query | wc -l` -gt 0 ]
    then
        echo "Log check data re-initialized... Last line contained error message."
        echo $STATE_CRITICAL > $oldlog.STATE
	exit $STATE_CRITICAL
    else
        echo "Log check data re-initialized..."
        echo $STATE_OK > $oldlog.STATE
        exit $STATE_OK
    fi
fi

# Everything seems fine, so compare it to the original log now

# The temporary file that the script should use while
# processing the log file.
if [ -x mktemp ]; then
    tempdiff=`mktemp /tmp/check_log.XXXXXXXXXX`
else
    tempdate=`/bin/date '+%H%M%S'`
    tempdiff="/tmp/check_log.${tempdate}"
    touch $tempdiff
fi

diff $logfile $oldlog > $tempdiff

if [ `wc -l $tempdiff|awk '{print $1}'` -eq 0 ]
then
     rm $tempdiff
     touch $oldlog.STATE
     exitstatus=`cat $oldlog.STATE`
     echo "LOG FILE - No status change detected. Status = $exitstatus"
     exit $exitstatus
fi

# Count the number of matching log entries we have
count=`grep -c "$query" $tempdiff`

# Get the last matching entry in the diff file
lastentry=`grep "$query" $tempdiff | tail -1`

rm -f $tempdiff
cat $logfile > $oldlog

if [ "$count" = "0" ]; then # no matches, exit with no error
    echo "Log check ok - 0 pattern matches found"
    exitstatus=$STATE_OK
else # Print total matche count and the last entry we found
#    echo "($count) $lastentry"
    echo "Log check NOK - $lastentry"
    exitstatus=$STATE_CRITICAL
    echo $STATE_CRITICAL > $oldlog.STATE
fi

exit $exitstatus


echo "Starting clean"
rm /tmp/foobar /usr/local/nagios/var/foobar*
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "Starting normally"
echo "normal"
echo "normal" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "normal"
echo "normal" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "critical"
echo "neko" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "Log rotation with crit"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "critical"
echo "neko" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "Normal log rotation"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""


kilala.nl tags: , , ,

View or add comments (curr. 2)