lcrownover/prometheus-slurm-exporter

Prometheus Slurm Exporter

Prometheus collector and exporter for metrics extracted from the Slurm resource scheduling system.

This project was forked from https://github.com/vpenso/prometheus-slurm-exporter and, for now, aims to remain backwards-compatible with it for SLURM 23.11 and later. In practice, this means the existing Grafana dashboard should plug directly into this exporter and work roughly the same.

Unlike previous Slurm exporters, this project uses the SLURM REST API (slurmrestd) for data retrieval. Because of that, you no longer need to run this exporter on a cluster node: it does not require a local SLURM installation connected to the head node, only network access to slurmrestd. I will be releasing containerized versions of this exporter soon.
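As a rough illustration of what talking to slurmrestd involves, the sketch below builds (but does not send) an authenticated HTTP request. slurmrestd's JWT authentication reads the user name and token from the X-SLURM-USER-NAME and X-SLURM-USER-TOKEN headers; the values correspond to SLURM_EXPORTER_API_USER and SLURM_EXPORTER_API_TOKEN described under Configuration. The endpoint path /slurm/v0.0.40/nodes and the helper name build_slurmrestd_request are assumptions for illustration (the versioned path depends on which API versions your slurmrestd serves); this is not the exporter's own code.

```python
import urllib.request


def build_slurmrestd_request(api_url: str, user: str, token: str,
                             path: str = "/slurm/v0.0.40/nodes") -> urllib.request.Request:
    """Build an authenticated GET request for slurmrestd (hypothetical helper).

    The default path is an assumption: the API version segment (v0.0.40)
    must match one that your slurmrestd actually serves.
    """
    url = api_url.rstrip("/") + path
    headers = {
        # slurmrestd JWT auth: user name and token are sent as HTTP headers.
        "X-SLURM-USER-NAME": user,
        "X-SLURM-USER-TOKEN": token,
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    req = build_slurmrestd_request("http://head1.domain.edu:6820", "slurm", "mytoken")
    print(req.full_url)
    # Sending it requires network access to slurmrestd:
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     body = resp.read()
```

Only HTTP access to slurmrestd is needed to send such a request, which is why the exporter can run off-cluster.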

Installation

This repository provides precompiled binaries for the three most recent major versions of SLURM (note: currently only two, with a third to follow once 24.11 is released). On the releases page, download the newest version of the exporter that matches your SLURM version. The included systemd unit file assumes the binary is saved as /usr/local/sbin/prometheus-slurm-exporter, so either place it there or update the unit file if you choose to use it.

Configuration

The exporter is configured through the following environment variables:

  • SLURM_EXPORTER_LISTEN_ADDRESS

    This should be the full address for the exporter to listen on.

    Default: 0.0.0.0:8080

  • SLURM_EXPORTER_API_URL

    This is the URL to your slurmrestd server.

    Example: http://head1.domain.edu:6820

  • SLURM_EXPORTER_API_USER

    The user name specified in the token command (the username= value passed to scontrol token).

  • SLURM_EXPORTER_API_TOKEN

    This is the SLURM token to authenticate against slurmrestd.

    The easiest way to generate this is by running the following line on your head node:

    scontrol token username=myuser lifespan=someseconds

    myuser should probably be the slurm user, or some other privileged account.

    lifespan is specified in seconds. I set mine to one year (lifespan=31536000).
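A minimal sketch of how these variables could be read and validated, assuming only the default stated above (the listen address falls back to 0.0.0.0:8080). The URL, user, and token have no documented defaults, so they are treated as required here; that is an assumption. load_config and ExporterConfig are hypothetical names, not part of the exporter. The sketch also spells out the lifespan arithmetic: one non-leap year is 365 × 24 × 60 × 60 = 31,536,000 seconds.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# One non-leap year in seconds, matching lifespan=31536000 above.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60

# Variables without a documented default are treated as required (assumption).
REQUIRED = ("SLURM_EXPORTER_API_URL", "SLURM_EXPORTER_API_USER", "SLURM_EXPORTER_API_TOKEN")


@dataclass(frozen=True)
class ExporterConfig:
    listen_address: str
    api_url: str
    api_user: str
    api_token: str


def load_config(env: Mapping[str, str] = os.environ) -> ExporterConfig:
    """Read exporter settings from environment variables (hypothetical helper)."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return ExporterConfig(
        # Documented default when the variable is unset.
        listen_address=env.get("SLURM_EXPORTER_LISTEN_ADDRESS", "0.0.0.0:8080"),
        api_url=env["SLURM_EXPORTER_API_URL"],
        api_user=env["SLURM_EXPORTER_API_USER"],
        api_token=env["SLURM_EXPORTER_API_TOKEN"],
    )


if __name__ == "__main__":
    print(ONE_YEAR_SECONDS)  # 31536000
    cfg = load_config({
        "SLURM_EXPORTER_API_URL": "http://head1.domain.edu:6820",
        "SLURM_EXPORTER_API_USER": "slurm",
        "SLURM_EXPORTER_API_TOKEN": "mytoken",
    })
    print(cfg.listen_address)  # 0.0.0.0:8080
```

Failing fast on a missing token or URL makes a misconfigured deployment visible at startup rather than as empty scrapes later.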

Systemd

A systemd unit file is included for ease of deployment.

This unit file assumes you've written your environment variables to /etc/prometheus-slurm-exporter/env.conf in the format:

SLURM_EXPORTER_API_URL="http://head.domain.edu:6820"
SLURM_EXPORTER_API_USER="root"
SLURM_EXPORTER_API_TOKEN="mytoken"

Don't forget to chmod 600 /etc/prometheus-slurm-exporter/env.conf!
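The env.conf format above is plain KEY="value" lines, presumably loaded by the unit file through an EnvironmentFile= directive (check the included unit file to confirm). The sketch below parses lines of that shape and checks that the file's permissions are no looser than 600, so the token is not readable by other users. parse_env_lines and is_private are hypothetical helpers; the parser handles only the simple quoted form shown here, not every quoting rule systemd accepts.

```python
import os
import stat


def parse_env_lines(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    result: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=value line: {raw!r}")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, as in the example above.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        result[key.strip()] = value
    return result


def is_private(path: str) -> bool:
    """True if group and others have no permissions at all (e.g. mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    sample = '''
SLURM_EXPORTER_API_URL="http://head.domain.edu:6820"
SLURM_EXPORTER_API_USER="root"
SLURM_EXPORTER_API_TOKEN="mytoken"
'''
    print(parse_env_lines(sample)["SLURM_EXPORTER_API_USER"])  # root
```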

Prometheus Server Scrape Config

This is an example scrape config for your prometheus server:

scrape_configs:
  - job_name: 'slurm_exporter'
    scrape_interval:  30s
    scrape_timeout:   30s
    static_configs:
      - targets: ['exporter_host.domain.edu:8080']
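Because this config sets neither metrics_path nor scheme, Prometheus uses its defaults (/metrics over http), so it will request http://exporter_host.domain.edu:8080/metrics. Prometheus also rejects a scrape_timeout larger than scrape_interval; equal values, as here, are accepted. The sketch below captures both rules; scrape_url, parse_duration, and check_timing are hypothetical helpers, and the duration parser deliberately supports only single-unit values such as "30s" or "2m", a small subset of Prometheus' duration syntax.

```python
import re

# Seconds per unit for simple Prometheus-style durations (subset: s, m, h).
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Parse a single-unit duration such as '30s' or '2m' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def scrape_url(target: str, scheme: str = "http", metrics_path: str = "/metrics") -> str:
    """URL Prometheus requests for a static target, using its default scheme and path."""
    return f"{scheme}://{target}{metrics_path}"


def check_timing(interval: str, timeout: str) -> None:
    """Raise if scrape_timeout exceeds scrape_interval, which Prometheus rejects."""
    if parse_duration(timeout) > parse_duration(interval):
        raise ValueError("scrape_timeout must not exceed scrape_interval")


if __name__ == "__main__":
    check_timing("30s", "30s")  # equal values are allowed
    print(scrape_url("exporter_host.domain.edu:8080"))
```

Fetching that URL manually (for example with a browser or urllib) is a quick way to confirm the exporter is reachable before adding it to Prometheus.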

Grafana Dashboard

The dashboard published by the previous author should work the same with this exporter. I plan to release a new version of the dashboard with additional features soon.

Dashboard screenshots (images not included): Status of the Nodes, Status of the Jobs, SLURM Scheduler Information

Contributing

Check out the CONTRIBUTING.md document.