From collectd Wiki
Revision as of 15:57, 17 May 2011 by Tokkee (talk | contribs) (suggestions by yann2)

collectd-nagios is a small tool that integrates collectd with the monitoring suite Nagios. It reads values from collectd using the unixsock plugin and compares them with the specified ranges. collectd-nagios then terminates with an exit code according to the Nagios plugin development guidelines.
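
To illustrate the comparison step, here is a minimal Python sketch of how a value could be mapped to an exit code. The exit codes and the `[@][start:]end` range syntax follow the Nagios plugin development guidelines; the function names (`parse_range`, `alerts`, `check`) are made up for this example, and whether the range parser in collectd-nagios behaves identically in every corner case is an assumption.

```python
import math

# Exit codes defined by the Nagios plugin development guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_range(spec):
    """Parse a Nagios-style range '[@][start:]end' into (start, end, inverted)."""
    inverted = spec.startswith("@")
    if inverted:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        start_s, end_s = "0", spec  # a bare number N means the range 0..N
    start = -math.inf if start_s == "~" else float(start_s or 0)
    end = math.inf if end_s == "" else float(end_s)
    return start, end, inverted


def alerts(value, spec):
    """Return True if the value should raise an alert for the given range."""
    start, end, inverted = parse_range(spec)
    inside = start <= value <= end
    # By default an alert is raised outside the range; '@' inverts that.
    return inside if inverted else not inside


def check(value, warn, crit):
    """Hypothetical helper: pick the exit code for a single value."""
    if alerts(value, crit):
        return CRITICAL
    if alerts(value, warn):
        return WARNING
    return OK


if __name__ == "__main__":
    print(check(85.0, "80", "90"))  # prints 1 (WARNING)
```

The critical range is tested first so that a value violating both ranges is reported with the more severe status.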

The following are a few suggested improvements that are open for discussion:

Querying values

As of now (version 5.0), only a single dataset (possibly composed of multiple data-sources) may be queried from collectd. This is fairly inflexible. For example, it does not allow checking the percentage of used space on a disk, because collectd 5.0 and later report this information using multiple datasets.

The idea is to support a more flexible syntax when specifying a dataset using the -n (value spec) option:

  • mark -d (data-source) as deprecated and, optionally, append the data-source to the value spec (-n option) separated by another slash, e.g. load/load/midterm
  • support basic arithmetic operations, e.g. df-root/df_complex-used / (df-root/df_complex-used + df-root/df_complex-free); note: backward compatibility is preserved, as specifying a single dataset still works just like before
    • using / as divisor might cause problems; another approach might be to use functions like DIV, SUM, etc. --yann2 on IRC
  • -d is forbidden if -n does not specify a single dataset
  • support simple functions like MIN, MAX, ...
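
As a rough illustration of the proposed syntax, the following Python sketch evaluates such an expression. It is only one possible reading of the proposal, not an existing collectd-nagios feature. To sidestep the ambiguity of `/` (and of `-`, which also appears inside names like `df-root`), the sketch assumes that operators are separated from value specs by whitespace. The names `split_value_spec`, `evaluate` and the `lookup` callback are hypothetical; functions such as MIN and MAX are not covered.

```python
import operator
import re

# A token is a parenthesis or any run of characters without whitespace/parens.
TOKEN = re.compile(r"\(|\)|[^\s()]+")
BINARY = {"+": operator.add, "-": operator.sub,
          "*": operator.mul, "/": operator.truediv}


def split_value_spec(spec):
    """Split 'plugin/type[/data-source]' into (identifier, data_source or None)."""
    parts = spec.split("/")
    if not all(parts) or len(parts) not in (2, 3):
        raise ValueError("bad value spec: %r" % spec)
    if len(parts) == 2:
        return spec, None
    return "/".join(parts[:2]), parts[2]


def evaluate(expr, lookup):
    """Evaluate expr; lookup(identifier, data_source) must return a float."""
    tokens = TOKEN.findall(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def atom():
        tok = take()
        if tok == "(":
            value = expression()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        try:
            return float(tok)  # numeric literal
        except ValueError:
            return lookup(*split_value_spec(tok))  # value spec

    def term():  # '*' and '/' bind tighter than '+' and '-'
        value = atom()
        while peek() in ("*", "/"):
            value = BINARY[take()](value, atom())
        return value

    def expression():
        value = term()
        while peek() in ("+", "-"):
            value = BINARY[take()](value, term())
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError("unexpected token: %r" % tokens[pos])
    return result


if __name__ == "__main__":
    # Made-up sample values standing in for data read from collectd.
    sample = {("df-root/df_complex-used", None): 30.0,
              ("df-root/df_complex-free", None): 70.0}
    query = ("df-root/df_complex-used / "
             "(df-root/df_complex-used + df-root/df_complex-free)")
    print(evaluate(query, lambda ident, ds: sample[(ident, ds)]))  # prints 0.3
```

A single value spec such as load/load/midterm is itself a valid expression here, which matches the backward-compatibility note above.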

Output formatting

As of now (version 5.0), the output of collectd-nagios is hard-coded into the tool, e.g. OKAY: 0 critical, 0 warning, 3 okay. In many cases, more information (to be displayed in the Nagios frontend) would be desirable, especially when querying multiple values.

The idea is to make the output configurable through a configuration file by specifying a format string with placeholders. The syntax might look like the following:

 <Service "disk_percent-root">
   # none of the following options are required
   Output "DISK {status} - free space /: {value:df-root/df_complex-free} ({value}%)"
   PerfData "'/'={value};{warn};{crit};0;{value:df-root/df_complex-free + df-root/df_complex-used}"
   # may be overridden on the command line:
   Query "df-root/df_complex-used / (df-root/df_complex-used + df-root/df_complex-free)"
   Warn "80"
   Crit "90"
 </Service>
 <Host "hostname">
   # settings applying to a specific host only
   <Service "foo">
     # ...
   </Service>
 </Host>
The service name may then be specified using a newly added command line option.
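
The placeholder expansion could work roughly as in the following Python sketch. It assumes that `{name}` is replaced by a precomputed field (such as the status or the result of the query) and that `{value:EXPR}` is replaced by the result of evaluating `EXPR`. The `render` function, the `fields` dictionary and the `evaluate` callback are hypothetical names, and formatting numbers with `%g` is an arbitrary choice made for the example.

```python
import re

# Matches {name} and {name:argument}; the argument may not contain braces.
PLACEHOLDER = re.compile(r"\{(\w+)(?::([^{}]*))?\}")


def format_number(value):
    """Format a number compactly, e.g. 70.0 -> '70' (an arbitrary choice)."""
    return "%g" % value


def render(template, fields, evaluate):
    """Expand {name} from fields and {value:EXPR} via evaluate(EXPR)."""

    def substitute(match):
        name, expr = match.group(1), match.group(2)
        if expr is not None:
            if name != "value":
                raise ValueError("only {value:...} takes an expression")
            return format_number(evaluate(expr))
        value = fields[name]  # KeyError for unknown placeholders
        if isinstance(value, float):
            return format_number(value)
        return str(value)

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    # Made-up sample data standing in for values read from collectd.
    free = {"df-root/df_complex-free": 70.0}
    fields = {"status": "OK", "value": 30.0}
    template = ("DISK {status} - free space /: "
                "{value:df-root/df_complex-free} ({value}%)")
    print(render(template, fields, free.__getitem__))
    # prints: DISK OK - free space /: 70 (30%)
```

Passing the expression evaluator in as a callback keeps the formatting step independent of how values are actually fetched from collectd.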