v4.0.0

rezib released this 28 Nov 08:21

Added

  • Support Slurm 24.11 and Slurm REST API v0.0.40 (#366, #400).
  • agent:
    • Return the RacksDB infrastructure name and a boolean indicating whether the metrics feature is enabled in the /info endpoint, in addition to the cluster name.
    • Add optional /metrics endpoint exposing various Slurm metrics in OpenMetrics format, designed to be scraped by Prometheus or compatible tools (#274).
    • Add the ability to query metrics from the Prometheus database with the /v<version>/metrics/<metric> endpoint.
    • Add the ability to filter jobs that are allocated a specific node with the node query parameter on the /v<version>/jobs endpoint.
  • gateway:
    • Return the RacksDB infrastructure name and the boolean metrics feature flag of every cluster in the /clusters endpoint.
    • Return the optional Markdown login service message as a rendered HTML page with the /messages/login endpoint.
    • Proxy metrics requests to agents through the /api/agents/<cluster>/metrics/<metric> endpoint.
  • frontend:
    • Request RacksDB with the infrastructure name provided by the gateway (#348).
    • Display the time limit of running jobs in the job details page (#352).
    • Display the service message below the login form, if defined (#253).
    • Add dependency on Chart.js and its Luxon adapter to draw charts with timeseries metrics.
    • Display charts of resource (nodes/cores) status and jobs queue in the dashboard page, based on metrics from Prometheus (#275).
    • Display the list of jobs with resources allocated on the node in the node details page (#292).
    • Display a hash next to every job field in the job details page to generate a link highlighting that specific field (#251).
    • Represent terminated jobs with a colored bullet in the job status badge: green for completed (i.e. successful) jobs, red for failed jobs and dark orange for timed-out jobs (#354).
  • conf:
    • Add [racksdb] > infrastructure parameter for the agent.
    • Add [metrics] > enabled parameter for the agent.
    • Add [metrics] > restrict parameter for the agent.
    • Add [metrics] > host parameter for the agent.
    • Add [metrics] > job parameter for the agent.
    • Add [ui] > templates, message_template and message_login parameters for the gateway.
    • Select alloc_cpus and alloc_idle_cpus node fields on the slurmrestd /slurm/*/nodes and /slurm/*/node/<node> endpoints.
    • Select the nodes job field on the slurmrestd /slurm/*/jobs endpoint.
    • Introduce the service message template.
  • show-conf: Introduce the slurm-web-show-conf utility to dump the current configuration settings of the gateway and agent components along with their origin, which can be either the configuration definition file or the site override (#349).
  • docs:
    • Add manpage for slurm-web-show-conf command.
    • Add metrics feature configuration documentation page.
    • Mention metrics optional feature in quickstart guide.
    • Mention metrics export and charts features in overview page.
    • Mention possible Prometheus integration in architecture page.
    • Mention login service message feature in overview page.
    • Mention job badges to visualize job status in overview page.
    • Add page to document Service Messages configuration.
    • Mention support of Fedora 41.
  • pkgs:
    • Introduce gateway Python extra package.
    • Add requirement on markdown external library for gateway extra package.
    • Add dependency on prometheus-client for the agent.
    • Add direct dependency on ClusterShell for the agent.
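
As a rough illustration of the new HTTP endpoints listed above, the following Python sketch only builds request URLs and does not contact any service. The base URLs, the value substituted for <version>, and the cluster, node and metric names are hypothetical placeholders; the exact version string expected in /v<version>/ is an assumption to check against the actual deployment.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Hypothetical placeholders: replace with the URLs and version of the actual deployment.
AGENT_URL = "http://agent.example.org"
GATEWAY_URL = "http://gateway.example.org"
API_VERSION = "4.0.0"  # assumption: the value substituted for <version>


def agent_jobs_url(node: Optional[str] = None) -> str:
    """/v<version>/jobs, optionally filtered with the node query parameter."""
    url = f"{AGENT_URL}/v{API_VERSION}/jobs"
    if node is not None:
        url += "?" + urlencode({"node": node})
    return url


def agent_metric_url(metric: str) -> str:
    """/v<version>/metrics/<metric>: metric values the agent queries from Prometheus."""
    return f"{AGENT_URL}/v{API_VERSION}/metrics/{quote(metric, safe='')}"


def gateway_metric_url(cluster: str, metric: str) -> str:
    """/api/agents/<cluster>/metrics/<metric>: the same query, proxied by the gateway."""
    cluster_part = quote(cluster, safe="")
    metric_part = quote(metric, safe="")
    return f"{GATEWAY_URL}/api/agents/{cluster_part}/metrics/{metric_part}"
```

For instance, agent_jobs_url(node="cn1") returns http://agent.example.org/v4.0.0/jobs?node=cn1. Path segments are percent-encoded so that unusual cluster or metric names cannot alter the route.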

Changed

  • agent: Bump minimal required Slurm version from 23.02.0 to 23.11.0.
  • gateway: Change error message when unable to parse agent info fields.
  • docs:
    • Update configuration reference documentation.
    • Update dashboard screenshot in overview page with example of resource chart.
    • Replace mention of Slurm REST API version v0.0.39 by v0.0.40.
    • Mention requirement of Slurm >= 23.11 and dropped support for Slurm 23.02.
  • conf:
    • Convert [cache] > password agent parameter from string to password type.
    • Convert [ldap] > bind_password gateway parameter from string to password type.
    • Bump [slurmrestd] > version default value from 0.0.39 to 0.0.40 in agent configuration for compatibility with Slurm 24.11.
  • pkgs:
    • Add requirement on RFL.core >= 1.1.0.
    • Add requirement on RFL.settings >= 1.1.1.
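
The [slurmrestd] > version setting selects the slurmrestd API version that appears in place of * in the /slurm/*/... endpoints mentioned above. The sketch below uses the standard configparser module to show how a site override could pin that version and which paths would result. The override content and the helper names are illustrative assumptions, not the agent's actual code.

```python
import configparser

# New default value in v4.0.0 (previously 0.0.39).
DEFAULT_SLURMRESTD_VERSION = "0.0.40"

# Hypothetical site override content; settings left out keep their default.
SITE_OVERRIDE = """
[slurmrestd]
version = 0.0.40
"""


def slurmrestd_version(ini_text: str) -> str:
    """Return the configured slurmrestd API version, or the default if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback also applies when the [slurmrestd] section is missing entirely
    return parser.get("slurmrestd", "version", fallback=DEFAULT_SLURMRESTD_VERSION)


def slurmrestd_path(version: str, resource: str) -> str:
    """Build a slurmrestd path such as /slurm/v0.0.40/jobs."""
    return f"/slurm/v{version}/{resource}"
```

With an empty override, slurmrestd_version("") falls back to "0.0.40", and slurmrestd_path("0.0.40", "jobs") yields /slurm/v0.0.40/jobs.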

Fixed

  • agent:
    • Fix retrieval of terminated jobs that are only available in the accounting service, with an option to ignore HTTP/404 errors for specific slurmrestd requests.
    • Fix compatibility issue with Requests >= 2.32.2 (#350).
    • Return HTTP/404 Not Found with a meaningful error message when requesting a nonexistent node.
  • gateway:
    • Catch generic requests.exceptions.RequestException when retrieving information from agents, to avoid AttributeError on more specific exceptions missing from old versions of the Requests library (#391).
    • Catch JSONDecodeError from the simplejson external library and the json standard library module, which Requests < 2.27 does not handle.
  • frontend:
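
The two gateway fixes listed above follow a common pattern for code that must work across many Requests versions: catch the RequestException base class rather than subclasses that older releases may lack, and catch the JSONDecodeError classes of both the json standard module and simplejson, since Requests < 2.27 raises whichever one it uses internally. The sketch below illustrates that pattern; the AgentInfoError class and the retrieve_agent_info() helper are hypothetical, and requests and simplejson are treated as optional imports so the example runs without them.

```python
import json

# Optional dependencies: the example still runs if they are not installed.
try:
    import requests

    REQUEST_ERRORS = (requests.exceptions.RequestException,)
except ImportError:
    REQUEST_ERRORS = ()  # an empty tuple in an except clause matches nothing

try:
    import simplejson

    JSON_DECODE_ERRORS = (json.JSONDecodeError, simplejson.JSONDecodeError)
except ImportError:
    JSON_DECODE_ERRORS = (json.JSONDecodeError,)


class AgentInfoError(Exception):
    """Hypothetical error raised when agent information cannot be retrieved."""


def retrieve_agent_info(fetch_json):
    """Call fetch_json(), e.g. lambda: session.get(url).json(), and normalize errors."""
    try:
        return fetch_json()
    except JSON_DECODE_ERRORS as err:
        # Checked first: on Requests >= 2.27, requests.exceptions.JSONDecodeError
        # subclasses one of these and is also a RequestException.
        raise AgentInfoError(f"unable to parse agent response: {err}") from err
    except REQUEST_ERRORS as err:
        # Base class of Requests exceptions, available in every release.
        raise AgentInfoError(f"unable to reach agent: {err}") from err
```

Ordering the JSON clause first keeps invalid agent responses reported as parsing errors rather than connection errors on recent Requests releases.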

Removed

  • Support for Slurm 23.02 and Slurm REST API v0.0.39.
  • conf:
    • Remove unused required field from default selected job fields on slurmrestd /slurm/*/jobs endpoint.
    • Remove unused state_reason field from default selected job fields on slurmrestd /slurm/*/job/<id> endpoint.
  • docs: Remove mention of Fedora 39 support.