Skip to content

larsks/check_ganglia

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

check_ganglia

Author: Lars Kellogg-Stedman
Email:lars@seas.harvard.edu

This is a Nagios plugin that checks values collected by Ganglia. It can poll either gmond or use the interactive query interface provided by gmetad.

Copyright (C) 2010 Lars Kellogg-Stedman

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

-h, --help show this help message and exit
-w WARN, --warn=WARN
 Warn threshold.
-c CRITICAL, --critical=CRITICAL
 Critical threshold.
-v, --verbose Make output more verbse.
-g GANGLIA_SERVER, --ganglia-server=GANGLIA_SERVER
 Address of gmond/gmetad host.
-H HOST, --host=HOST
 Host for which we want metrics.
-l, --list List available metrics on the target host.
-m METRIC, --metric=METRIC
 Metric to compare against threshold values.
-X EXPRESSION, --expression=EXPRESSION
 Expression to compare against threshold values.
-q, --query Use gmetad query interface instead of gmond.
-C CLUSTER, --cluster=CLUSTER
 Cluster name for gmetad query.
-x EXTRA_METRICS, --extra-metrics=EXTRA_METRICS
 Additional metrics to return as performance data.
-M MISSING, --missing=MISSING
 Exit status on connection failure, missing host or missing metric (default WARN).
-p PORT, --port=PORT
 Port on which to communicate w/ gmond/gmetad

Generate a WARN status if cpu_wio is >= 30% or CRITICAL if cpu_wio is >= 50%:

check_gmond -H www.example.com -m cpu_wio -w 30 -c 50

Generate a WARN status if cpu_idle is < 70 or CRITICAL if cpu_idle < 50:

check_gmond -H www.example.com -m cpu_idle -w :70 -c :50

Generate a CRITICAL status if os_relase is not "2.6.32.9-70.fc12.i686":

check_gmond -H www.example.com -m os_release -c "!2.6.32.9-70.fc12.i686"

Gmetad provides an interactive query interface that allows for efficiently fetching a subtree of the XML data. For environments with large numbers of hosts this offer a substantial performance advantage.

Use the --query flag to activate gmetad support. In additional to the parameters you provide when using gmond, you will also need to provide the appropriate Ganglia cluster name with --cluster (-C). For example:

check_gmond -q -C 'HPC Monitoring' \
  -H www.example.com -m cpu_wio -w 30 -c 50

If you fail to provide a cluster name or if you mistype the cluster name, gmetad will behave essentially just like gmond -- that is, it will dump the entire XML tree.

You can include additional metrics as performance data in the check result using the '-x' flag. This is useful if you are using Nagios/Icinga to process performance data (e.g, using pnp4nagios).

For example:

# check_ganglia -q -C 'My cluster' -H host.example.com \
  -m cpu_wio -x cpu_idle -x cpu_aidle -x cpu_nice \
  -x cpu_user -x cpu_system
cpu_wio OKAY: 1.8 | cpu_wio=1.8; cpu_idle=92.7; cpu_aidle=90.0;
  cpu_nice=0.0; cpu_user=1.0; cpu_system=4.5;

(Notice that the output has been wrapped here for display purposes, but will actually show up all on one line).

In some cases, the value provided by Ganglia is not, by itself, to meet your monitoring needs. You can ask check_ganglia to evaluate an arbitrary Python expression to compute the value of a metric with the --expression option. The host dictionary is available to this expression, the keys of which are the values available from Ganglia.

For example, if we want to adjust the value of load_five by dividing it by the number of cores in the system, we could call check_ganglia like this:

check_ganglia -q -C 'My cluster' -H host.example.com \
  -m load_five --expression 'host["load_five"]/host["cpu_num"]'

(This is extracted from check_gmond.checkval; see the embedded documentation for the most current version).

The arguments to the -w and -c options use the following syntax:

  • 5 -- match if v >= 5
  • 3:5 -- match if 3 <= v <= 5
  • :5 -- match if v <=5
  • 1,2,3 -- match if v in (1,2,3)
  • foo -- match if v == foo
  • foo,bar -- match if v in (foo, bar)

You can negate a threshold expression by preceding it with '!'. For example:

  • !5 -- match if v < 5
  • !3:5 -- match if v<3 || v>5
  • !1,2,3 -- match if v not in (1,2,3)

About

Nagios plugin for checking values in gmond

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages