A simple Python-based Spectrum Scale/GPFS filesystem statistics and quota exporter.
The exporter collects:
- GPFS IO statistics
- GPFS pool sizes and limits
- GPFS fileset, user and group quotas for all local GPFS filesystems.
It can be run on any member of the cluster.
All of the following metrics carry an fs label containing the name of the filesystem, e.g. fs="bigdata.storage".
Metric name | Type | Description |
---|---|---|
gpfs_bytes_read | counter | GPFS bytes read |
gpfs_bytes_write | counter | GPFS bytes written |
gpfs_requests_open | counter | GPFS open call requests including create |
gpfs_requests_close | counter | GPFS close call requests |
gpfs_requests_read | counter | GPFS number of read requests |
gpfs_requests_write | counter | GPFS number of write requests |
gpfs_requests_readdir | counter | GPFS number of readdir requests |
gpfs_inode_updates | counter | GPFS number of inode updates to disk |
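
For illustration, a scrape of these IO counters might look like this (filesystem name and values are made up):

```
gpfs_bytes_read{fs="bigdata.storage"} 1.234567e+12
gpfs_bytes_write{fs="bigdata.storage"} 9.87654e+11
gpfs_requests_open{fs="bigdata.storage"} 4321098
gpfs_requests_readdir{fs="bigdata.storage"} 65432
```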
In addition to the fs label, these metrics also provide the following labels:
- pool_name, e.g. "system"
- pool_id, e.g. "0"
- pool_type, e.g. "data,meta"
- block_size, e.g. "4 MB"
Metric name | Type | Description |
---|---|---|
gpfs_pool_data_size_bytes | gauge | GPFS Pool Data Total Size |
gpfs_pool_data_free_bytes | gauge | GPFS Pool Data Free Size |
gpfs_pool_meta_size_bytes | gauge | GPFS Pool Meta Total Size |
gpfs_pool_meta_free_bytes | gauge | GPFS Pool Meta Free Size |
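
An illustrative sample for the pool metrics, with the labels described above (values are made up):

```
gpfs_pool_data_size_bytes{fs="bigdata.storage",pool_name="system",pool_id="0",pool_type="data,meta",block_size="4 MB"} 5.0e+14
gpfs_pool_data_free_bytes{fs="bigdata.storage",pool_name="system",pool_id="0",pool_type="data,meta",block_size="4 MB"} 1.2e+14
```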
In addition to the fs label, these metrics can have the following labels:
- quota_type, e.g. USR or GRP or FILESET
- uid, e.g. "0"
- gid, e.g. "0"
- fileset_id, e.g. "0" (only for FILESET quota types)
- fid, e.g. "" or a number for the fileset
- filesetname, e.g. "root"
- quota, e.g. "on"
- def_quota, e.g. "off"
- remarks, e.g. "i"
Metric name | Type | Description |
---|---|---|
gpfs_quota_block_usage_bytes | gauge | GPFS Block Quota Usage |
gpfs_quota_block_limit_soft_bytes | gauge | GPFS Block Quota Soft Limit |
gpfs_quota_block_limit_hard_bytes | gauge | GPFS Block Quota Hard Limit |
gpfs_quota_block_usage_in_doubt_bytes | gauge | GPFS Block Quota Usage In Doubt |
gpfs_quota_block_grace_in_seconds | gauge | GPFS Block Quota Grace In Seconds with 0=expired, 1=ok or seconds+1 |
gpfs_quota_files_usage | gauge | GPFS Number Of Files Quota Usage |
gpfs_quota_files_limit_soft | gauge | GPFS Number Of Files Quota Soft Limit |
gpfs_quota_files_limit_hard | gauge | GPFS Number Of Files Quota Hard Limit |
gpfs_quota_files_usage_in_doubt | gauge | GPFS Number Of Files Quota Usage In Doubt |
gpfs_quota_files_grace_in_seconds | gauge | GPFS Number Of Files Grace In Seconds with 0=expired, 1=ok or seconds+1 |
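
An illustrative sample for the quota metrics; which labels are present depends on the quota type, and the values are made up:

```
gpfs_quota_block_usage_bytes{fs="bigdata.storage",quota_type="FILESET",fileset_id="0",filesetname="root",quota="on",def_quota="off",remarks="i"} 2.5e+13
gpfs_quota_files_usage{fs="bigdata.storage",quota_type="USR",uid="1000",quota="on",def_quota="off",remarks="i"} 153000
```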
It is designed to run under systemd via the gpfs-exporter socket and service units, found in the systemd/ directory of this repository.
For debugging purposes, run it directly; it will print the metrics to stdout and exit immediately.
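
As a rough sketch, a socket-activated setup could look like the following; the unit layout, Accept=yes approach and ExecStart path are assumptions for illustration only, and the actual unit files in systemd/ are authoritative (the port matches the scrape target in the example config below):

```ini
# gpfs-exporter.socket (sketch)
[Socket]
ListenStream=9001
Accept=yes

[Install]
WantedBy=sockets.target

# gpfs-exporter@.service (sketch; ExecStart path is a placeholder)
[Service]
ExecStart=/usr/local/sbin/gpfs-exporter
StandardInput=socket
StandardOutput=socket
```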
For example, a Prometheus scrape configuration for a cluster called speedy:
```yaml
---
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: prometheus
    scrape_interval: 10s
    scrape_timeout: 10s
    metrics_path: "/prometheus/metrics"
    static_configs:
      - targets:
          - localhost:9090
        labels:
          alias: Prometheus

  - job_name: Speedy GPFS
    static_configs:
      - targets:
          - speedy-member:9001
        labels:
          cluster: speedy
          service: gpfs
```
You can find the dashboard used at Princeton University in the grafana/ subdirectory. It has two static variables used for easy filtering: one with the list of compute clusters (we add the cluster label at collection time) and the other with GPFS filesystem names.
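
For example, a panel graphing read throughput per filesystem could use a PromQL query along these lines; the $cluster and $fs variable names are illustrative:

```
rate(gpfs_bytes_read{cluster="$cluster", fs="$fs"}[5m])
```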