The LLNL and ORNL CORAL systems Lassen, Sierra, and Summit are pre-exascale supercomputers built by IBM. They run a specialized software stack that requires additional components to integrate properly with Flux. These components are provided as Lmod modules on all three systems.
To set up your environment to use these modules on the LLNL systems Lassen and Sierra, run:
module use /usr/tce/modulefiles/Core # if not already in use
module use /usr/global/tools/flux/blueos_3_ppc64le_ib/modulefiles
If you are using the ORNL system Summit, run:
module use /sw/summit/modulefiles/ums/gen007flux/linux-rhel8-ppc64le/Core
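After adding either module path, you can confirm that the Flux modules are now visible; exact module names and versions vary by system:
module avail flux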
You can load the latest installation of Flux managed by the Flux team on the LLNL and ORNL CORAL machines using:
module load flux
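To verify which installation the module placed on your PATH, you can query the loaded flux directly, for example:
which flux
flux version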
Note
If you are using an installation of Flux that is not provided by the Flux team and that is configured without --enable-pmix-bootstrap (e.g., a Spack-installed Flux), launching it on CORAL systems requires a shim layer to provide PMI on top of the PMIx interface provided by the CORAL system launcher jsrun. To load this module alongside your side-installed Flux, run module load pmi-shim.
We also suggest that you launch Flux using jsrun with the following arguments:
jsrun -a 1 -c ALL_CPUS -g ALL_GPUS -n ${NUM_NODES} --bind=none --smpiargs="-disable_gpu_hooks" flux start
The ${NUM_NODES} variable is the number of nodes that you want to launch the Flux instance across. The remaining arguments ensure that all on-node resources are available to Flux for scheduling.
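For example, a minimal sketch of starting a two-node Flux instance and running a command under it as the initial program (the node count, task count, and hostname command are stand-ins for your own values):
jsrun -a 1 -c ALL_CPUS -g ALL_GPUS -n 2 --bind=none --smpiargs="-disable_gpu_hooks" flux start flux mini run -n 8 hostname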
Note
If you are using the pmi-shim module mentioned above, you will need to set PMIX_MCA_gds="^ds12,ds21" in your environment before calling jsrun. The PMIX_MCA_gds environment variable works around a bug in OpenPMIx that causes a hang when using the PMI compatibility shim.
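Putting the steps for a side-installed Flux together, the launch sequence looks roughly like this (a sketch only; ${NUM_NODES} is still the number of nodes you want):
module load pmi-shim
export PMIX_MCA_gds="^ds12,ds21"
jsrun -a 1 -c ALL_CPUS -g ALL_GPUS -n ${NUM_NODES} --bind=none --smpiargs="-disable_gpu_hooks" flux start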
If you want to run MPI applications compiled with Spectrum MPI under Flux, one additional step is required: when you run a Spectrum MPI binary under Flux, you must enable Flux's Spectrum MPI plugin. From the CLI, this looks like:
flux mini run -o mpi=spectrum my_mpi_binary
From the Python API, this looks like:
#!/usr/bin/env python3
import os

import flux
from flux import job

fh = flux.Flux()
jobspec = job.JobspecV1.from_command(['my_mpi_binary'])
jobspec.environment = dict(os.environ)
jobspec.setattr_shell_option('mpi', 'spectrum')
jobid = job.submit(fh, jobspec)
print(jobid)
On all systems, Flux relies on hwloc to auto-detect the on-node resources available for scheduling. The hwloc that Flux is linked against must be configured with --enable-cuda for Flux to be able to detect NVIDIA GPUs. The LLNL and ORNL CORAL flux modules automatically load an hwloc configured against a system-provided CUDA.
On any of these systems, you can check whether the hwloc that Flux is linked against is CUDA-enabled by running:
$ flux start flux resource list
STATE NNODES NCORES NGPUS
free 1 40 4
allocated 0 0 0
down 0 0 0
If the number of free GPUs is 0, then the hwloc that Flux is linked against is not CUDA-enabled.
In addition, refer to the flux-mini(1) manual page for the shell options that run or submit an MPI job with a specific CPU/GPU set and affinity. For example, to run a job with 4 MPI processes, each bound to 10 CPU cores and 1 GPU, on a single compute node:
flux mini run -N 1 -n 4 -c 10 -g 1 -o mpi=spectrum -o cpu-affinity=per-task -o gpu-affinity=per-task my_mpi_binary
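If you would rather not block in the foreground, the same options work with flux mini submit, after which you can attach to the job's output; the job ID below stands in for the ID printed by the submit command:
flux mini submit -N 1 -n 4 -c 10 -g 1 -o mpi=spectrum -o cpu-affinity=per-task -o gpu-affinity=per-task my_mpi_binary
flux job attach <jobid>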