nad-checks
is a Chef cookbook for installing additional Circonus nad
checks. nad
is the node agent daemon
,
a process that exposes scripts (sh, perl, c, etc) to an external port so that they can be aggregated on a
Circonus enterprise broker.
For more information, see http://circonus.com
- A way of installing nad. See the metadata file for more info.
This cookbook has only been tested on SmartOS.
This recipe will install checks for PostgreSQL replication status. It can be applied to a master or a hot standby (depending on PostgreSQL version), and will provide whatever information it can.
nad_checks.pg_replication.pg_version
- Array - 9.2 - the version of Postgres running. This determines what types of checks are available.nad_checks.pg_replication.user
- String - postgres - what user to connect with via psqlnad_checks.pg_replication.path_additions
- Array - /opt/local/bin - update this if yourpsql
account is in a different directory
nad_checks.pg_replication.9_1.master.host
- String,NilClass - nil - master IP or FQDN to get upstream xlog locationnad_checks.pg_replication.9_1.replica.host
- String - 127.0.0.1 - replica host to get applied xlog location
When running against a 9.1 version of Postgres, the easiest application is to set nad_checks.pg_replication.pg_version
to 9.1
and apply the recipe to each hot standby. The check will connect the master host to get its current xlog
location, then will connect locally to see the applied xlog location. The provided nad metrics will be:
pg_sql:0:replication:master_xlog
- string - the current xlog on the masterpg_sql:0:replication:replay_xlog
- string - the last applied xlog on the replicapg_sql:0:replication:bytes_diff
- signed double - number of bytes betweenmaster_xlog
andreplay_xlog
Note that depending on network settings (as well as the time it takes to run the various checks), bytes_diff
may be
negative at times.
When running a 9.2 version of Postgres, set nad_checks.pg_replication.pg_version
to 9.2
and apply to all hosts,
including the master. On the master (or hot standbys cascading replication to other standbys), the following metric will
become available for each replication host:
pg_replication:$host:master:xlog_sent_diff
- signed double - bytes different between the xlog sent to the standby and the xlog written
On a standby, the following metrics will be available:
pg_replication:0:replica:xlog_applied_diff
- signed double - bytes different between the xlog received and what has been successfully applied, is a measure of replication lag in bytes.pg_replication:0:replica:time_since_commit
- signed double - seconds since the last commit or rollback applied, this is a measure of the replication lag in time, though it is not exact.
If xlog_sent_diff
grows on the master, but xlog_applied_diff
is low, then this implies that there is latency in the
network. If xlog_sent_diff
is low but xlog_applied_diff
is high, then something on the standby is introducing latency
in applying wal logs (such as disk write time or disk wait).