« Themes for Thingamacomment | Main | Updates to TEMPer Articles »

Sunday, March 21, 2010

Service Monitoring with the TEMPer USB thermistor

In the first part of this tutorial, we looked at the perl modules for the TEMPer device. I also wanted a way to use this for monitoring and logging, and as I don't happen to be proficient in perl, I decided to write a bash script that will call the perl script. In this part, we'll look at the bash script, a php program that can act as a central logger or notification tool, and also how to use the bash script as a plugin for nagios. Please keep in mind that this document is written for Unbuntu 9.10, this may also work with other operating systems and versions, but this isn't tested elsewhere. Also, please note that this is a revised version, I've made some significant improvements, I think.

Bash: check_temper

The bash script is named check_temper (to mesh with nagios's naming convention, and is listed here. First off, download the bash script and check the script:

wget -O check_temper http://www.cs.unc.edu/~hays/dev/bash/temper/check_temper
chmod a+x check_temper
sudo ./check_temper 

You'll likely get an error like this:

/usr/local/bin/temper_mon.pl not found

check_temper looks for the perl script, temper_mon.pl in /usr/local/bin/, which is the standard local for user installed software on most linux systems. In the first part of this, we left the temper_mon.pl file in the home dir. It would be better to have check_temper in /usr/local/bin/ also, so let's move that:

sudo cp temper_mon.pl /usr/local/bin

Now try the check_temper file again:

sudo ./check_temper 

You should get back something like:

Critical: Temperature is 78.8 F

If you take a look at the script, you'll see that there are some variables at the top that define what a warning temperature is, and what a critical temperature is. You can also control whether output is in fahrenheit or celcius.

If you run the program again, and then check the error code, you should see that a critical temp reports an error code of 2, a warning an error code of 1 (2 is also used if something is not found or didn't work), and 0 if the temp is low enough. For example:

bil@test:~$ sudo ./check_temper 
Critical: Temperature is 78.8 F
bil@test:~$ echo $?
2

Now copy check_temper to /usr/local/bin so it's available on the path:

sudo mv check_temper /usr/local/bin

The way the script works, it looks at the output from the temper_mon.pl script, and compares the temperature to the warning and critical values. If the temperature is below the warning level, the program exits with a 0 error code. Anything between the two generates a 1 error code, and if the temperature is above the critical level, the program exits with a 2 error code. This behavior is consistent with nagios as we will see, but it also allows you to use this script with other programs.

PHP Monitor

If you look in the variable section of check_temper, you'll see that you can specify a url. The idea here is that the script can pass data to a program on a web server, which will then do some logging or send an alert. The bash script uses wget to GET a web page from the server and passes variables to the server, specifically the temperature, error code, and the message. If you don't want to use this feature, you can just leave that line commented out.

I've written a pretty simple php program that can process the data passed in the GET. What it does is take in the data, and then writes out a log file with the ip address of the machine that send the get, with a time and data stamp, the ip address, the temperature, the error code, and the message. I'm not going to tell you how to set up a php server, but it's not that hard to do. But when you do, do please make sure to restrict access to it if you use this program, it doesn't do much sanitizing of user data, and since it does send email, I'm not going to claim that it's secure. All we want here is for a machine to be able to report to a server, so restricting access by password and ip numbers should be sufficient. You can download a zip of the php directory I use. The package includes an emailer function I use, and a file directory where the logs get written--you need to make sure the web server can write to that directory.

If the temperature is too high, it will also write out two additionals files, a WARNING file that acts as a log of all the warnings seen for that IP address, and an ALERT file that acts as a marker for when a warning was sent. If the ALERT file isn't stale, no email is sent (this allows one to limit the number of warnings sent via email.) The interval that defines staleness is in the variable section of the php program. There's also a default high temperature setting, and a case statement that allows one to tailor temperatures for other devices.

Now, this php program is a bit unusual--it does not put out any html. There are some echos and whatnot that are commented out, you can uncomment these and call the page with a get statement to test things out to get it working. But the program is really just designed to act as a logger/warning. Of course, at some point I'll write a php program to read the data being logged, but that's a job for laterman. What the program does do is to write out a file in the files directory with the ip number of the host reporting as the file name, and the data (timestamp, ip, temp, error code and message) in tab delimited format.

So, if you want to use just this php program as your monitor, that should work fine--what you'd do is set up a cron or an upstart to call this however often you want it to run. If you're new at this stuff, the cron's the easier way to go.

Nagios

The way the bash program is written, it can also serve as a nagios plugin via nrpe. This will not do you any good unless you already have a nagios server setup and running. If you don't know what I'm talking about, you can either run off and learn nagios, or just skip this part. Nagios isn't trivial, but I'll wait here until you come back.

To use this, first install the nrpe server for nagios:

sudo apt-get install nagios-nrpe-server
On a system that's bound to an external NIS database for users and groups I got the following error:
Unpacking nagios-nrpe-server (from .../nagios-nrpe-server_2.12-3ubuntu1_i386.deb) ...
addgroup: The group `nagios' already exists as a system group. Exiting.
usermod: user 'nagios' does not exist in /etc/passwd

I imagine that Any system bound to an external user system such as LDAP might get a similar message, this is a known bug. So I added:

nagios:x:114:121::/var/lib/nagios:/bin/false

as the last line in /etc/passwd, since that's the line that this installation puts into that file on a standalone machine. That let me run the installation, and afterwards, I just removed that line. I don't think this will hurt anything, but if you know better, please let me know. You won't need to do this on a stand along machine, the nrpe server install will create the nagios account for you.

To make things easier to troubleshoot, we'll run our nagios plugin as a user named nrpe. The reason for this is that the nagios nrpe debugging information is a bit sparse, and for troubleshooting, it's handy to be able to login to the same account as nagios will use to run the plugin.

sudo adduser nrpe

Next link the check_temper file to the nagios plugins directory:

sudo ln -s /usr/local/bin/check_temper /usr/lib/nagios/plugins/ 

Make sure to double check the permissions to make sure it's only writable by root. Now, edit the /etc/nagios/nrpe_local.cfg file:

sudo vi /etc/nagios/nrpe_local.cfg

and add this line (and please note this is a change from the previous verson of this article, I found that using the sudo command prefix in the nrpe.cfg file interferes with other plugins):

command[check_temper]=sudo /usr/lib/nagios/plugins/check_temper

This tells the nrpe service to run /usr/lib/nagios/plugins/check_temper when nagios contacts it and asks for check_temper. Please note, you can also make or use a directory, /etc/nagios/nrpe.d and put your config file there--if there are any files that are in that directory, nrpe will try to use them. This is useful if you're adminning multiple machines with similar configurations.

Next, edit the config file /etc/nagios/nrpe.cfg

sudo vi /etc/nagios/nrpe.cfg

Replace with the IP address of localhost that is running the Nagios monitoring server:

#allowed_hosts=127.0.0.1
allowed_hosts=your.nagios.server.address 
Then comment out the nrpe=nagios line and add a line making the nrpe user to nrpe:
#nrpe_user=nagios
nrpe_user=nrpe
And then set the group:
#nrpe_group=nagios
nrpe_group=nrpe

Also, uncomment the line command_prefix=/usr/bin/sudo Don't do this after all, it's easier to use the sudo command in the .cfg file.

# This lets the nagios user run all commands in that directory (and only them)
# without asking for a password.  If you do this, make sure you don't give
# random users write access to that directory or its contents!
#command_prefix=/usr/bin/sudo 

If you running nagios and this temperature monitor on the same machine you can uncomment the server_address=127.0.0.1 line:

Finally, check the bottom of the file to make sure that the include for the nrpe_local.cfg is enabled:

# local configuration:
#       if you'd prefer, you can instead place directives here
include=/etc/nagios/nrpe_local.cfg

Next, open the /etc/nagios/nrpe_local.cfg file:

sudo vi /etc/nagios/nrpe_local.cfg
And add the line telling nrpe where check_temper is located:
command[check_temper]=/usr/lib/nagios/plugins/check_temper

Then save the file and restart the nrpe service:

sudo /etc/init.d/nagios-nrpe-server restart

If you're new to nagios, you might look into other commands that work through npre. For example, you can use npre to check load:

command[check_light_load]=/usr/lib/nagios/plugins/check_load -w 6.00,6.00,6.00 -c 10.00,10.00,10.00

or look for zombies:

command[check_zombies]=/usr/lib/nagios/plugins/check_procs -s Z -w 3 -c 6

Next, softlink the check_temper file to the /usr/lib/nagios/plugins dir:

sudo ln -s /usr/local/bin/check_temper /usr/lib/nagios/plugins

So far so good. Open a ssh connection to your nagios server. If you don't have nrpe support, you need to install it:

sudo apt-get install nagios-nrpe-plugin

Now try running the check_nrpe plugin from the nagios server, pointing it at your temperature monitor:

/usr/lib/nagios/plugins/check_nrpe -H test -c check_temper

You should get something like this:

-sh-3.2$ /usr/lib/nagios/plugins/check_nrpe -H test -c check_temper
check_temper failed to log temp F

That's ok, we know from this that the check_temper program got called, and somehow failed.

The reason it didn't report back a temperature is probably because the check_temper isn't allowed to run in privileged mode, and thus cannot query the device.

If, on the other hand, you get:

CHECK_NRPE: Error - Could not complete SSL handshake.

that probably means you didn't restart the nrpe service on your remote system, or didn't add your nagios server as an allowed host.

Or, if you get:

NRPE: Command 'check_temper' not defined

it's probably a problem with the line in the nrpe_local.cfg file.

What we need to do now is added nrpe to the sudoers file so that it can run commands without a password. To do this we run the visudo command:

sudo visudo -f /etc/sudoers

Add this line to the file at the bottom and try it out:

nrpe  ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/

Now, log in as nrpe, and try to run the plugin with out a password:

sudo /usr/lib/nagios/plugins/check_temper

It should run without you having to provide a password. Should's a big word though. If it asks for a password, double check your /etc/sudoers file.

Now, go back to your nagios server and try calling the plugin again:

/usr/lib/nagios/plugins/check_nrpe -H test -c check_temper

Hopefully, this time you got good output.

Now, what you should probably do is go back to the /etc/nagios/nrpe.cfg file and change the user and group from nrpe to nagios, so that your installation is in line with a standard installation. If you do this, you'll also need to use visudo to edit the /etc/sudoers file removing nrpe's privileges and grant the same priviledges to nagios. But if you don't want to bother, at least disable nrpe's ability to login and use a shell, so as to safeguard your system.

At this point all you need to do is add this service into the service.cfg and host.cfg files in /etc/nagios.

Example of host.cfg entry:

define host{
use                     generic-server
host_name               test.unc.edu
alias                   TEST
address                 test.unc.edu
}

Example of services.cfg entry:

define service{
use                             generic-service
host_name                       test.unc.edu
service_description             CHECK-TEMPERATURE
contact_groups                  server-admins
check_command                   check_nrpe!check_temper
}

So there you have it, a simple set of ways to leverage the USB TEMPer module to set up an alert system. Please let me know if you run into any problem with this, and best of luck.

Posted by bil at 4:38 PM
Edited on: Saturday, May 08, 2010 11:15 AM
Categories: My Software, Other Software, Work
Comment by Joseph Dyland - Thursday 29th April 2010 11:40:15 PM

Great post, Very helpful and informative, I had purchased two of these units and was at a standstill for quite a while.

Would you happen to have any tips on getting the temperature to populate the performance data field in Nagios?, I would love to be able to graph temperature overtime and am so close now!
Comment by bil - Friday 30th April 2010 08:36:53 AM

I'll look into it, thanks,
Comment by Anonymous Coward - Friday 30th April 2010 08:51:36 AM

I ended up figuring it out, I found that performance data is taken from any thing after |

So I simply added a pipe to the area where the message is being printed out, adding temperature= then the variable of the temperature it self, followed by minimum and maximum values, It is now working exactly as I have wanted!

message="${message} Sensor ${iteration} Temp is ${x} ${temp_ind}|temperature=${x};55;60"

In nagios it looks like this,

Status Information: Normal: Sensor 0 Temp is 25.5 C
Performance Data: temperature=25.5;55;60

Now I do have one more question, Do you know how I could add a second TEMPer sensor using the methods you have provided?

Comment by bil - Friday 30th April 2010 09:41:45 AM

I haven't written up the docs yet, but I updated the check_temper bash script to support multiple sensors. So if you download it again, it should just work with 2-3 sensors.
Comment by Riccardo - Wednesday 08th September 2010 09:48:22 AM

Great Post !
Only a note, the PHP directory structure zip file is not more available !

Bye
Riccardo
Comment by bil - Thursday 16th September 2010 01:35:27 PM

Thanks for letting me know, it's up now. Expect a new version in a few weeks.. ..