Difference between revisions of "Collectd-nagios"

From collectd Wiki
(output format: some more config options)
(suggestions by yann2)

Revision as of 15:57, 17 May 2011

This is a small tool to interact with the monitoring suite Nagios. It reads values from collectd using the unixsock plugin and compares them with the specified ranges. collectd-nagios then terminates with an exit code according to the Nagios plugin development guidelines.
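
As a rough illustration of the exit-code convention, the following Python sketch maps a single value to a status. The codes 0 to 3 are the ones defined by the Nagios plugin guidelines; the function names and the simplified closed (min, max) ranges are assumptions made for this example and do not reproduce the range parsing of collectd-nagios.

```python
# Exit codes defined by the Nagios plugin development guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def in_range(value, lo, hi):
    """Return True if value lies inside the closed interval [lo, hi]."""
    return lo <= value <= hi


def check_value(value, warn=(0.0, 80.0), crit=(0.0, 90.0)):
    """Map one value to a Nagios exit code.

    Assumption: a value outside the critical range is CRITICAL, a value
    outside the warning range is WARNING, anything else is OK.
    """
    if value is None or value != value:  # missing value or NaN
        return UNKNOWN
    if not in_range(value, *crit):
        return CRITICAL
    if not in_range(value, *warn):
        return WARNING
    return OK
```

A real plugin would pass the result to `sys.exit()` so that Nagios can read the status.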

The following includes a few suggested improvements open for discussion:

Querying values

As of now (version 5.0), a single dataset (possibly composed of multiple data-sources) may be queried from collectd. This is fairly inflexible. For example, it does not allow checking the percentage of used space on a disk, because collectd 5.0 and later report this information as multiple datasets.

The idea is to support a more flexible syntax when specifying a dataset using the -n (value spec) option:

  • mark -d (data-source) as deprecated and, optionally, append the data-source to the value spec (-n option) separated by another slash, e.g. load/load/midterm
  • support basic arithmetic operations, e.g. df-root/df_complex-used / (df-root/df_complex-used + df-root/df_complex-free); note: backward compatibility is preserved, as specifying a single dataset still works just like before
    • using / as divisor might cause problems; another approach might be to use functions like DIV, SUM, etc. --yann2 on IRC
  • -d is forbidden if -n does not specify a single dataset
  • support simple functions like MIN, MAX, ...
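
The function-style alternative mentioned above can be sketched as a tiny recursive evaluator. This is only an illustration in Python, not a proposed implementation: the token rules, the `FUNCTIONS` table, and the `values` dictionary (standing in for values fetched from collectd) are all assumptions. Because `/` and `-` are treated as ordinary identifier characters, value specs need no quoting, and a bare identifier still evaluates to its own value.

```python
import re

# Hypothetical function set, named after the DIV/SUM/MIN/MAX suggestions.
FUNCTIONS = {
    "SUM": sum,
    "DIV": lambda args: args[0] / args[1],
    "MIN": min,
    "MAX": max,
}

# A token is a parenthesis, a comma, or an identifier such as
# "df-root/df_complex-used" ('/' and '-' belong to the identifier).
TOKEN = re.compile(r"\s*([(),]|[A-Za-z0-9_./-]+)")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _parse(tokens, values):
    """Parse one term; return (result, remaining tokens)."""
    if not tokens:
        raise ValueError("unexpected end of expression")
    head, rest = tokens[0], tokens[1:]
    if head in FUNCTIONS and rest[:1] == ["("]:
        args, rest = [], rest[1:]
        while True:
            arg, rest = _parse(rest, values)
            args.append(arg)
            if not rest:
                raise ValueError("missing ')'")
            sep, rest = rest[0], rest[1:]
            if sep == ")":
                return FUNCTIONS[head](args), rest
            if sep != ",":
                raise ValueError("expected ',' or ')', got %r" % sep)
    if head in {"(", ")", ","}:
        raise ValueError("unexpected %r" % head)
    return values[head], rest  # plain identifier: look the value up


def evaluate(expr, values):
    result, rest = _parse(tokenize(expr), values)
    if rest:
        raise ValueError("trailing tokens: %r" % rest)
    return result
```

With `values = {"df-root/df_complex-used": 30.0, "df-root/df_complex-free": 70.0}`, the expression `DIV(df-root/df_complex-used, SUM(df-root/df_complex-used, df-root/df_complex-free))` yields the used fraction of the disk.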

Output formatting

As of now (version 5.0), the output of collectd-nagios is hard-wired into the tool, e.g. OKAY: 0 critical, 0 warning, 3 okay. In many cases, more information (to be displayed in the Nagios frontend) would be desirable, especially when querying multiple values.

The idea is to make the output configurable through a configuration file and by specifying a format string with placeholders. The syntax might look like the following:

 <Service "disk_percent-root">
   # none of the following options are required
   Output "DISK {status} - free space /: {value:df-root/df_complex-free} ({value}%)"
   PerfData "'/'={value}:{warn}:{crit}:0:{value:df-root/df_complex-free + df-root/df_complex-used}"
   # may be overridden on the command line:
   Query "df-root/df_complex-used / (df-root/df_complex-used + df-root/df_complex-free)"
   Warn "80"
   Crit "90"
 </Service>
 
 <Host "hostname">
   # settings applying to a specific host only
   <Service "foo">
     # ...
   </Service>
 </Host>
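
Expanding a format string such as the Output line above could work roughly as follows. This Python sketch is an assumption about how the placeholders might behave: `{name}` is taken from a context dictionary, `{value:SPEC}` is resolved through a caller-supplied `lookup` function (which could equally be an expression evaluator), and unknown placeholders are left untouched. The names `render`, `context`, and `lookup` are hypothetical.

```python
import re

# Matches {name} and {name:spec}; spec may contain anything except '}'.
PLACEHOLDER = re.compile(r"\{(\w+)(?::([^}]*))?\}")


def render(template, context, lookup):
    """Expand {name} from context and {value:SPEC} via lookup(SPEC)."""

    def expand(match):
        name, spec = match.group(1), match.group(2)
        if name == "value" and spec is not None:
            return str(lookup(spec.strip()))
        if name in context:
            return str(context[name])
        return match.group(0)  # unknown placeholder: keep it verbatim

    return PLACEHOLDER.sub(expand, template)
```

For example, with `context = {"status": "OK", "value": 30}` and a lookup that returns 70 for df-root/df_complex-free, the Output template above renders as `DISK OK - free space /: 70 (30%)`.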

The service name may then be specified using a newly added command line option.
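
One possible way to combine the three sources of settings is sketched below in Python. The precedence order (global service block, then host-specific block, then command line) is an assumption, not something the proposal specifies, and the nested-dictionary layout of `config` is a hypothetical stand-in for the parsed configuration file.

```python
def resolve_service(config, service, host=None, overrides=None):
    """Merge settings for one service.

    Assumed precedence, lowest to highest: global <Service> block,
    <Service> block inside the matching <Host>, command-line overrides.
    """
    settings = dict(config.get("services", {}).get(service, {}))
    if host is not None:
        host_cfg = config.get("hosts", {}).get(host, {})
        settings.update(host_cfg.get("services", {}).get(service, {}))
    # Only options actually given on the command line (not None) win.
    settings.update(
        {key: val for key, val in (overrides or {}).items() if val is not None}
    )
    return settings
```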