
sjpb (Collaborator) commented Oct 9, 2025

  1. For nodes with NVIDIA GPUs that are not configured for MIG or vGPU, and an image built including the cuda role, GRES can now be configured automatically by setting:

    # environments/site/inventory/group_vars/all/openhpc.yml:
    openhpc_gres_autodetect: nvml

    Note that:

    • Setting GresTypes in openhpc_config_extra is no longer required in any case, whether or not autodetection is used.
    • For nvml autodetection only, the conf option in gres entries in openhpc_nodegroups is no longer required.
  2. Enables nvml autodetection for the .caas environment, so Azimuth Slurm clusters only need to use an appropriately built image for NVIDIA GPUs to be configured automatically.
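A rough before/after sketch of what the change means for site configuration follows. It is illustrative only: the nodegroup name gpu, the GPU model and count, and the device paths are hypothetical, and the exact keys accepted under openhpc_nodegroups (and whether the gres list can be dropped entirely with nvml) should be checked against stackhpc/ansible-role-openhpc#202. The two YAML documents are separated by --- because they are alternatives, not one file.

```yaml
# Before (sketch): GRES configured by hand for a hypothetical "gpu" nodegroup
openhpc_config_extra:
  GresTypes: gpu              # previously had to be set explicitly
openhpc_nodegroups:
  - name: gpu
    gres:
      - conf: gpu:A100:2      # <name>:<type>:<count>; model and count are illustrative
        file: /dev/nvidia[0-1]  # illustrative device paths
---
# After (sketch): nvml autodetection, so neither GresTypes nor a gres conf entry is set
openhpc_gres_autodetect: nvml
openhpc_nodegroups:
  - name: gpu                 # assumption: GPU devices are discovered via nvml at runtime
```

The design choice is to let Slurm discover GPU type and count from the node itself, so the inventory no longer has to be kept in sync with the hardware.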

For full details see stackhpc/ansible-role-openhpc#202.

sjpb (Collaborator, Author) commented Oct 10, 2025

CI failed because this needs to be rebased on top of #818.

sjpb marked this pull request as ready for review October 10, 2025 11:15
sjpb requested a review from a team as a code owner October 10, 2025 11:15
sjpb changed the title from "wip - bump openhpc role for testing" to "Support automatic GRES configuration for NVIDIA GPUs" Oct 10, 2025
sjpb marked this pull request as draft October 10, 2025 13:51