Google Cloud Platform instances require a source image or source image family to boot from. SchedMD provides public images for Slurm instances that contain an HPC-ready software stack. Otherwise, custom images can be created and used instead.
slurm-gcp generally supports images built on these OS families:
Project | Image Family | Arch |
---|---|---|
cloud-hpc-image-public | hpc-centos-7 | x86_64 |
cloud-hpc-image-public | hpc-rocky-linux-8 | x86_64 |
debian-cloud | debian-11 | x86_64 |
ubuntu-os-cloud | ubuntu-2004-lts | x86_64 |
ubuntu-os-cloud | ubuntu-2204-lts-arm64 | ARM64 |
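To resolve one of these families to the concrete image it currently points at (for example, when pinning a Packer `source_image`), a standard gcloud lookup can be used; the family and project below are taken from the table above:

```sh
# Show the latest image behind a supported OS family.
gcloud compute images describe-from-family hpc-rocky-linux-8 \
  --project=cloud-hpc-image-public
```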
These image families have the following software installed:

- Slurm
  - 24.05.3
- lmod
- openmpi
  - v4.1.x
- cuda
  - Limited to x86_64 only
  - Latest CUDA and NVIDIA drivers
- lustre
  - Only supports x86_64
  - Client version 2.12-2.15, depending on the package available for the image OS.
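As a rough sanity check (not an exhaustive inventory; the CUDA and Lustre checks apply only to the x86_64 images), the installed stack can be inspected on an instance booted from one of these images:

```sh
sinfo -V             # Slurm version
module --version     # Lmod
mpirun --version     # OpenMPI (may require `module load openmpi` first)
nvidia-smi           # NVIDIA driver / CUDA (x86_64 images only)
lfs --version        # Lustre client (x86_64 images only)
```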
SchedMD releases public images on Google Cloud Platform that are minimal viable images for deploying Slurm clusters through all deployment methods and configurations.
NOTE: SchedMD generates images using the same process as documented in custom images, but without any additional software, using only clean minimal base images (e.g. ubuntu-os-cloud/ubuntu-2004-lts) as the source image.
Docker images are also released for the TPU nodes.
Project | Image Family | Arch | Status |
---|---|---|---|
schedmd-slurm-public | slurm-gcp-6-8-debian-11 | x86_64 | Supported |
schedmd-slurm-public | slurm-gcp-6-8-hpc-rocky-linux-8 | x86_64 | Supported |
schedmd-slurm-public | slurm-gcp-6-8-ubuntu-2004-lts | x86_64 | Supported |
schedmd-slurm-public | slurm-gcp-6-8-ubuntu-2204-lts-arm64 | ARM64 | Supported |
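These published images can also be listed directly from the public project; the family filter below is illustrative:

```sh
# List SchedMD's published Slurm image families.
gcloud compute images list \
  --project=schedmd-slurm-public \
  --no-standard-images \
  --filter="family ~ slurm-gcp-6-8"
```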
Project | Image Family | Status |
---|---|---|
schedmd-slurm-public | tpu:slurm-gcp-6-8-tf-2.12.1 | Supported |
schedmd-slurm-public | tpu:slurm-gcp-6-8-tf-2.13.0 | Supported |
schedmd-slurm-public | tpu:slurm-gcp-6-8-tf-2.13.1 | Supported |
schedmd-slurm-public | tpu:slurm-gcp-6-8-tf-2.14.0 | Supported |
schedmd-slurm-public | tpu:slurm-gcp-6-8-tf-2.14.1 | Supported |
schedmd-slurm-public | tpu:slurm-gcp-6-8-tf-2.15.0 | Supported |
To create slurm_cluster-compliant images yourself, build a custom Slurm image. Packer and Ansible are used to orchestrate custom image creation.
Custom images can be built from a supported private or public image (e.g. hpc-centos-7, centos-7). Additionally, Ansible roles or scripts can be added to the provisioning process to install custom software and configure the custom Slurm image.
Install software dependencies and build images from configuration.
See the slurm-gcp packer project for details.
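As a minimal sketch of the build workflow (the repository path and variables file name are assumptions; consult the packer project README for the exact layout of your release):

```sh
# Clone the repository and enter the Packer project directory (path assumed).
git clone https://github.com/GoogleCloudPlatform/slurm-gcp.git
cd slurm-gcp/packer

# Copy and edit a variables file (file name assumed), then build.
cp example.pkrvars.hcl my.pkrvars.hcl
packer init .
packer validate -var-file=my.pkrvars.hcl .
packer build -var-file=my.pkrvars.hcl .
```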
Before you build your images with Packer, you can modify how the build will happen. Custom packages and other image configurations can be added by a few methods. All of the methods below may be used together in any combination, if desired.
- The `scripts` role runs all scripts globbed from `scripts.d`. This method is intended for simple configuration scripts (see the example script after this list).
- Image configuration can be extended by specifying extra custom playbooks using the input variable `extra_ansible_provisioners`. These playbooks will be applied after Slurm installation is complete. For example, the following configuration will run a playbook without any dependencies on extra Ansible Galaxy roles:

  ```hcl
  extra_ansible_provisioners = [
    {
      playbook_file   = "/home/username/playbooks/custom.yaml"
      galaxy_file     = null
      extra_arguments = ["-vv"]
      user            = null
    },
  ]
  ```
- The Slurm image can be built on top of an existing image. Configure the pkrvars file with `source_image` or `source_image_family` pointing to your image. This is intended for more complex configurations, such as those driven by workflows or pipelines.
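For the `scripts.d` method above, a minimal example script might look like the following; the file name and package are illustrative assumptions, and any executable script placed in the directory is globbed by the role:

```sh
#!/usr/bin/env bash
# scripts.d/10-install-htop.sh (hypothetical example)
# Installs an extra package into the image during provisioning.
set -e

if command -v dnf >/dev/null 2>&1; then
  dnf install -y htop
else
  apt-get update && apt-get install -y htop
fi
```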
Recently published images in project schedmd-slurm-public support Shielded VMs, but not with GPUs or a mounted Lustre filesystem. Both of these features require kernel modules, which must be signed to be compatible with SecureBoot.
If you need GPUs, our published image family based on ubuntu-os-cloud/ubuntu-2004-lts has signed NVIDIA drivers installed and therefore supports GPUs with SecureBoot and Shielded VMs.
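As an illustrative sketch (instance name, zone, machine type, and accelerator type are placeholders, and GPU quota and availability vary by zone), that image family can be booted as a Shielded VM with Secure Boot enabled and a GPU attached:

```sh
gcloud compute instances create secureboot-gpu-test \
  --zone=us-central1-a \
  --machine-type=n1-standard-4 \
  --image-family=slurm-gcp-6-8-ubuntu-2004-lts \
  --image-project=schedmd-slurm-public \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --maintenance-policy=TERMINATE \
  --shielded-secure-boot \
  --shielded-vtpm \
  --shielded-integrity-monitoring
```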
If you need Lustre or GPUs on a different OS, it is possible to do this manually with a custom image. Doing this requires:

- generating a private/public key pair with openssl
- signing the needed kernel modules
- including the public key in the UEFI authorized keys db of the image, via the `gcloud compute images create` option `--signature-database-file`
  - The default Microsoft keys should be included as well, because this option overwrites the default key database.
  - Unfortunately, it appears that Packer does not support this image creation option at this time, so the image creation step must be manual.
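A rough sketch of these steps is shown below, assuming a Debian/Ubuntu kernel headers layout and that steps 1-2 run on the instance whose disk will become the image; key names, the module path, the source disk, and the Microsoft certificate file names are placeholders:

```sh
# 1. Generate a private/public key pair for module signing.
openssl req -new -x509 -newkey rsa:2048 -nodes -days 3650 \
  -subj "/CN=Custom module signing key/" \
  -keyout MOK.priv -outform DER -out MOK.der

# 2. Sign the needed kernel modules (sign-file location varies by distro).
/usr/src/linux-headers-$(uname -r)/scripts/sign-file sha256 \
  MOK.priv MOK.der /lib/modules/$(uname -r)/updates/dkms/nvidia.ko

# 3. Create the image with the public key in the UEFI authorized keys db.
#    Include the default Microsoft certificates (obtained separately),
#    because this overwrites the default key database.
gcloud compute images create my-secureboot-image \
  --source-disk=my-provisioned-disk \
  --source-disk-zone=us-central1-a \
  --guest-os-features=UEFI_COMPATIBLE \
  --signature-database-file=MOK.der,MicWinProPCA2011.crt,MicCorUEFCA2011.crt
```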
More details on this process are beyond the scope of this documentation. See link and/or contact Google for more information.