Add node metrics provided by management API #55

Merged (1 commit) on Sep 26, 2018


dcorbacho
Contributor

@dcorbacho dcorbacho commented Sep 25, 2018

Some users (see #37 and #13) requested node metrics such as file descriptors, sockets, memory and disk alarms to be available in Prometheus.

All of these are accessible through rabbit_mgmt_db:augment_nodes, and this PR exposes them as part of the prometheus_rabbitmq_nodes_collector data. Detailed node stats are disabled by default and can be enabled by setting detailed_node_stat_enabled to local or all. local provides the detailed stats only for the node being queried; all provides them for the whole cluster. all is only recommended if metrics are pulled from a single node. When every node in the cluster is configured as a Prometheus target, use local.

{prometheus, [{rabbitmq_exporter, [{detailed_node_stat_enabled, local}]}]}

Memory and disk alarms can be configured in Prometheus as alerting rules, such as:

groups:
- name: example
  rules:
  - alert: NodeMemAlarm
    expr: rabbitmq_node_mem_used > rabbitmq_node_mem_limit
    labels:
      severity: page
    annotations:
      summary: Memory alarm

The mem_alarm and disk_free_alarm metrics can be used instead; each reports a boolean (1 when the corresponding alarm is in effect).
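For anyone who wants to check the alarm conditions outside Prometheus, here is a minimal Python sketch that evaluates the same comparison against scraped text. The metric names come from this PR; the function names, the simplified parser, and the sample values are assumptions made for illustration and are not part of the exporter. It assumes the alarm metrics are emitted as 1 or 0.

```python
import re
from collections import defaultdict

# Matches one sample line: metric name, optional {labels}, then a value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return {metric_name: {node_name: value}} from exposition text."""
    metrics = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and TYPE/HELP comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        # Key each sample by the "name" label, which holds the node name.
        metrics[name][labels.get("name", "")] = float(value)
    return metrics


def nodes_in_memory_alarm(text):
    """Nodes above the memory high watermark, or with the alarm flag set."""
    metrics = parse_samples(text)
    used = metrics.get("rabbitmq_node_mem_used", {})
    limit = metrics.get("rabbitmq_node_mem_limit", {})
    flag = metrics.get("rabbitmq_node_mem_alarm", {})
    alarmed = set()
    for node, value in used.items():
        if node in limit and value > limit[node]:
            alarmed.add(node)
    alarmed.update(node for node, value in flag.items() if value == 1)
    return sorted(alarmed)


# Hypothetical sample values, not taken from a real scrape.
SAMPLE = """\
# TYPE rabbitmq_node_mem_used gauge
rabbitmq_node_mem_used{name="rabbit@mars",type="disc"} 900
rabbitmq_node_mem_used{name="rabbit2@mars",type="disc"} 100
# TYPE rabbitmq_node_mem_limit gauge
rabbitmq_node_mem_limit{name="rabbit@mars",type="disc"} 800
rabbitmq_node_mem_limit{name="rabbit2@mars",type="disc"} 800
# TYPE rabbitmq_node_mem_alarm untyped
rabbitmq_node_mem_alarm{name="rabbit@mars",type="disc"} 1
rabbitmq_node_mem_alarm{name="rabbit2@mars",type="disc"} 0
"""

print(nodes_in_memory_alarm(SAMPLE))  # prints ['rabbit@mars']
```

The parser only handles the simple label form shown in the scrape example (no escaped quotes or timestamps), so a real client library would be a better fit for anything beyond a quick check.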

This PR might also reduce the need for prometheus_process_collector for monitoring, or provide an alternative to it in some situations; that collector can be problematic because it includes NIFs (see issue #12).

Closes 37 and 13.

Example of scrape:

# TYPE rabbitmq_node_up untyped
# HELP rabbitmq_node_up Node runnning status
rabbitmq_node_up{name="rabbit2@mars",type="disc"} 1
rabbitmq_node_up{name="rabbit@mars",type="disc"} 1
# TYPE rabbitmq_node_partitions gauge
# HELP rabbitmq_node_partitions Partitions detected in the cluster.
rabbitmq_node_partitions{name="rabbit@mars",type="disc"} 0
# TYPE rabbitmq_node_fd_total gauge
# HELP rabbitmq_node_fd_total File descriptors available.
# TYPE rabbitmq_node_sockets_total gauge
# HELP rabbitmq_node_sockets_total Sockets available.
# TYPE rabbitmq_node_mem_limit gauge
# HELP rabbitmq_node_mem_limit Memory usage high watermark.
# TYPE rabbitmq_node_mem_alarm untyped
# HELP rabbitmq_node_mem_alarm Set to 1 if a memory alarm is in effect in the node.
# TYPE rabbitmq_node_disk_free_limit gauge
# HELP rabbitmq_node_disk_free_limit Free disk space low watermark.
# TYPE rabbitmq_node_disk_free_alarm untyped
# HELP rabbitmq_node_disk_free_alarm Set to 1 if a memory alarm is in effect in the node.
# TYPE rabbitmq_node_proc_total gauge
# HELP rabbitmq_node_proc_total Erlang processes limit.
# TYPE rabbitmq_node_uptime counter
# HELP rabbitmq_node_uptime Time in milliseconds since node start.
# TYPE rabbitmq_node_run_queue gauge
# HELP rabbitmq_node_run_queue Runtime run queue.
# TYPE rabbitmq_node_processors gauge
# HELP rabbitmq_node_processors Logical processors.
# TYPE rabbitmq_node_net_ticktime gauge
# HELP rabbitmq_node_net_ticktime Network tick time between pairs of Erlang nodes.
# TYPE rabbitmq_node_mem_used gauge
# HELP rabbitmq_node_mem_used Memory used in bytes
# TYPE rabbitmq_node_fd_used gauge
# HELP rabbitmq_node_fd_used File descriptors used.
# TYPE rabbitmq_node_sockets_used gauge
# HELP rabbitmq_node_sockets_used Sockets used.
# TYPE rabbitmq_node_proc_used gauge
# HELP rabbitmq_node_proc_used Erlang processes used.
# TYPE rabbitmq_node_disk_free gauge
# HELP rabbitmq_node_disk_free Disk free in bytes
# TYPE rabbitmq_node_gc_num counter
# HELP rabbitmq_node_gc_num GC runs.
# TYPE rabbitmq_node_gc_bytes_reclaimed counter
# HELP rabbitmq_node_gc_bytes_reclaimed Bytes reclaimed by GC.
# TYPE rabbitmq_node_context_switches counter
# HELP rabbitmq_node_context_switches Context switches since node start.
# TYPE rabbitmq_node_io_read_count counter
# HELP rabbitmq_node_io_read_count Read operations since node start.
# TYPE rabbitmq_node_io_read_bytes counter
# HELP rabbitmq_node_io_read_bytes Bytes read since node start.
# TYPE rabbitmq_node_io_read_avg_time gauge
# HELP rabbitmq_node_io_read_avg_time Average time of read operations.
# TYPE rabbitmq_node_io_write_count counter
# HELP rabbitmq_node_io_write_count Write operations since node start.
# TYPE rabbitmq_node_io_write_bytes counter
# HELP rabbitmq_node_io_write_bytes Bytes written since node start.
# TYPE rabbitmq_node_io_write_avg_time gauge
# HELP rabbitmq_node_io_write_avg_time Average time of write operations.
# TYPE rabbitmq_node_io_sync_count counter
# HELP rabbitmq_node_io_sync_count Sync operations sync node start.
# TYPE rabbitmq_node_io_sync_avg_time gauge
# HELP rabbitmq_node_io_sync_avg_time Average time of sync operations.
# TYPE rabbitmq_node_io_seek_count counter
# HELP rabbitmq_node_io_seek_count Seek operations since node start.
# TYPE rabbitmq_node_io_seek_avg_time gauge
# HELP rabbitmq_node_io_seek_avg_time Average time of seek operations.
# TYPE rabbitmq_node_io_reopen_count counter
# HELP rabbitmq_node_io_reopen_count Times files have been reopened by the file handle cache.
# TYPE rabbitmq_node_mnesia_ram_tx_count counter
# HELP rabbitmq_node_mnesia_ram_tx_count Mnesia transactions in RAM since node start.
# TYPE rabbitmq_node_mnesia_disk_tx_count counter
# HELP rabbitmq_node_mnesia_disk_tx_count Mnesia transactions in disk since node start.
# TYPE rabbitmq_node_msg_store_read_count counter
# HELP rabbitmq_node_msg_store_read_count Read operations in the message store since node start.
# TYPE rabbitmq_node_msg_store_write_count counter
# HELP rabbitmq_node_msg_store_write_count Write operations in the message store since node start.
# TYPE rabbitmq_node_queue_index_journal_write_count counter
# HELP rabbitmq_node_queue_index_journal_write_count Write operations in the queue index journal since node start.
# TYPE rabbitmq_node_queue_index_write_count counter
# HELP rabbitmq_node_queue_index_write_count Queue index write operations since node start.
# TYPE rabbitmq_node_queue_index_read_count counter
# HELP rabbitmq_node_queue_index_read_count Queue index read operations since node start.
# TYPE rabbitmq_node_io_file_handle_open_attempt_count counter
# HELP rabbitmq_node_io_file_handle_open_attempt_count File descriptor open attempts.
# TYPE rabbitmq_node_io_file_handle_open_attempt_avg_time gauge
# HELP rabbitmq_node_io_file_handle_open_attempt_avg_time Average time of file descriptor open attempts.
# TYPE rabbitmq_node_metrics_gc_queue_length_channel_closed gauge
# HELP rabbitmq_node_metrics_gc_queue_length_channel_closed Message queue length of GC process for channel metrics
rabbitmq_node_metrics_gc_queue_length_channel_closed{name="rabbit@mars",type="disc"} 0
# TYPE rabbitmq_node_metrics_gc_queue_length_connection_closed gauge
# HELP rabbitmq_node_metrics_gc_queue_length_connection_closed Message queue length of GC process for connection metrics
rabbitmq_node_metrics_gc_queue_length_connection_closed{name="rabbit@mars",type="disc"} 0
# TYPE rabbitmq_node_metrics_gc_queue_length_consumer_deleted gauge
# HELP rabbitmq_node_metrics_gc_queue_length_consumer_deleted Message queue length of GC process for consumer metrics
rabbitmq_node_metrics_gc_queue_length_consumer_deleted{name="rabbit@mars",type="disc"} 0
# TYPE rabbitmq_node_metrics_gc_queue_length_exchange_deleted gauge
# HELP rabbitmq_node_metrics_gc_queue_length_exchange_deleted Message queue length of GC process for exchange metrics
rabbitmq_node_metrics_gc_queue_length_exchange_deleted{name="rabbit@mars",type="disc"} 0
# TYPE rabbitmq_node_metrics_gc_queue_length_node_node_deleted gauge
# HELP rabbitmq_node_metrics_gc_queue_length_node_node_deleted Message queue length of GC process for node-node metrics
# TYPE rabbitmq_node_metrics_gc_queue_length_queue_deleted gauge
# HELP rabbitmq_node_metrics_gc_queue_length_queue_deleted Message queue length of GC process for queue metrics
rabbitmq_node_metrics_gc_queue_length_queue_deleted{name="rabbit@mars",type="disc"} 0
# TYPE rabbitmq_node_metrics_gc_queue_length_vhost_deleted gauge
# HELP rabbitmq_node_metrics_gc_queue_length_vhost_deleted Message queue length of GC process for vhost metrics
rabbitmq_node_metrics_gc_queue_length_vhost_deleted{name="rabbit@mars",type="disc"} 0
# TYPE rabbitmq_node_metrics_gc_queue_length_channel_consumer_deleted gauge
# HELP rabbitmq_node_metrics_gc_queue_length_channel_consumer_deleted Message queue length of GC process for consumer metrics
rabbitmq_node_metrics_gc_queue_length_channel_consumer_deleted{name="rabbit@mars",type="disc"} 0
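The scrape mixes counters (cumulative since node start, usually queried through rate()) with gauges and untyped metrics. As a rough aid for reading it, the following Python sketch groups metric families by their declared TYPE. The function name is hypothetical, and the excerpt is copied from the scrape above.

```python
from collections import defaultdict


def families_by_type(text):
    """Group metric family names by the type declared on their TYPE line."""
    by_type = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        # A TYPE line looks like: "# TYPE <metric_name> <type>"
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            by_type[parts[3]].append(parts[2])
    return dict(by_type)


# A few lines copied from the example scrape.
EXCERPT = """\
# TYPE rabbitmq_node_up untyped
# HELP rabbitmq_node_up Node runnning status
rabbitmq_node_up{name="rabbit@mars",type="disc"} 1
# TYPE rabbitmq_node_uptime counter
# HELP rabbitmq_node_uptime Time in milliseconds since node start.
# TYPE rabbitmq_node_io_read_count counter
# TYPE rabbitmq_node_mem_used gauge
"""

for metric_type, names in sorted(families_by_type(EXCERPT).items()):
    print(metric_type, names)
```

Running it over a full scrape gives a quick list of which families are counters and which are gauges.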

@deadtrickster
Owner

Hi, thank you! Can I use this text in the README? From the beginning to Closes 37 and 13. Wouldn't mind if you just drop it somewhere too :-)

@michaelklishin
Contributor

@deadtrickster I'd be happy to update the docs once this is in. I don't think this PR's description verbatim is a good fit for the README.

@deadtrickster deadtrickster merged commit 1c59805 into deadtrickster:master Sep 26, 2018
@deadtrickster
Owner

@michaelklishin I still don't mind reusing this PR's text in the README. Do you think you'll have time for docs update?

@jperville
Contributor

jperville commented Nov 16, 2018

@deadtrickster would it be possible to cut a new release that includes this merged feature? Thank you very much. The latest official release (v3.7.2.3) is from 16th June, while this PR was merged to master 10 days later.

@deadtrickster
Owner

@jperville done - https://github.com/deadtrickster/prometheus_rabbitmq_exporter/releases/tag/v3.7.2.4

please let me know if you'll have issues with it

DXist pushed a commit to DXist/prometheus_rabbitmq_exporter that referenced this pull request Dec 28, 2018
Add node metrics provided by management API
@zwxk14

zwxk14 commented Jan 9, 2019

@deadtrickster All the metrics are important, but it seems this was only merged to v3.7.x and released only for v3.7. What about v3.6.x? Please do not abandon the users of older versions. T_T
