This workflow shows how to automatically build containers, deploy them to clusters, and verify the results using continuous integration. It was developed as a showcase for the SURESOFT workflow, which addresses reproducibility on HPC platforms. The project includes a sample application that simulates 2D Laplace heat transfer in a plate.
The workflow is grouped into four stages in the Continuous Integration pipeline using GitLab CI (see image below).
- build
  - builds containers with different MPI implementations (see Singularity Images) using Singularity
    - Rocky Linux with MPICH using the hybrid model. Definition file: rockylinux9-mpich.def
    - Rocky Linux with MPICH using the bind model. Definition file: rockylinux9-mpich-bind.def
    - Rocky Linux with OpenMPI using the hybrid model. Definition file: rockylinux9-openmpi.def
- simulation
  - runs the image with the MPI bind model on the cluster using hpc-rocket
    - deploys the container to the cluster via SSH
    - executes the container (e.g. via Slurm)
    - returns a defined set of files as the result
- test
  - runs a regression test with fieldcompare to compare the results of the simulation stage with reference data
- benchmark
  - dynamically generates additional CI jobs to benchmark the performance of the different MPI approaches
The `.def` files in the Containers directory define Singularity images using different MPI implementations and binding approaches. The Singularity files are based on Rocky Linux 9 as the targeted remote system uses CentOS Linux 7.
All `.def` files are separated into two stages, a `build` and a `runtime` stage.
The `build` stage is used to compile the application, while the `runtime` stage only contains the dependencies necessary to run it.
This reduces the size of the final image.
`rockylinux9-mpich.def` and `rockylinux9-openmpi.def` use the hybrid model, where MPI is installed on the host machine as well as inside the container.
When running, the MPI on the host machine will communicate with the MPI instance inside the container.
In practice this leads to a small performance overhead in comparison to a native solution.
`rockylinux9-mpich-bind.def` uses the bind model, where no MPI instance is installed in the container.
Instead, the MPI installation of the host machine is mounted into the container.
This results in performance on par with a native solution.
However, the portability of the container is reduced, since the application must be compiled with the same MPI version that is used on the host machine.
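
The difference between the two models shows up in how the container is launched. The following is a minimal sketch, not the command used by this project: the function name `mpi_launch_command`, the image names, and all paths are hypothetical. It only assumes the standard `mpirun -n` launcher and the `--bind` option of `singularity exec`.

```python
from typing import List, Optional


def mpi_launch_command(
    image: str,
    app: str,
    ranks: int,
    host_mpi_dir: Optional[str] = None,
) -> List[str]:
    """Build the argument list for launching a containerized MPI application.

    host_mpi_dir=None sketches the hybrid model: the host's mpirun starts
    one container per rank and the MPI library inside the image is used.
    With host_mpi_dir set, the bind model is sketched: the host MPI
    installation is bind-mounted into the container at the same path.
    """
    command = ["mpirun", "-n", str(ranks), "singularity", "exec"]
    if host_mpi_dir is not None:
        # Mount the host MPI directory into the container (bind model).
        command += ["--bind", f"{host_mpi_dir}:{host_mpi_dir}"]
    command += [image, app]
    return command


if __name__ == "__main__":
    # All image names and paths below are hypothetical.
    print(" ".join(mpi_launch_command("hybrid.sif", "/app/laplace", 4)))
    print(" ".join(mpi_launch_command("bind.sif", "/app/laplace", 4, "/opt/mpich")))
```

With the bind model, the library search path inside the container usually also has to point at the mounted directory, which is one reason the container becomes tied to the MPI installation of a specific host.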
The first CI job, which builds the container, requires a GitLab Runner with a privileged Docker executor. This is necessary because the job uses a Docker image to build the Singularity container. A privileged runner is not needed if the container already exists.
HPC Rocket is a command-line tool that sends Slurm commands to a remote machine and monitors the job progress. It was primarily written to launch Slurm jobs from a CI pipeline.
- defines files to copy to the cluster
- defines result files to copy back to GitLab
- defines the Slurm job file to submit
- Slurm settings
- executes the Singularity image
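
The last two items in the list above, the Slurm settings and the execution of the Singularity image, typically end up in one batch script. As a rough illustration, the sketch below renders such a script from a few parameters. The function `render_slurm_job` and all values passed to it are hypothetical; the `#SBATCH` options shown are standard Slurm directives, but the job file used in this project may look different.

```python
def render_slurm_job(
    job_name: str,
    nodes: int,
    ntasks: int,
    time_limit: str,
    image: str,
    app: str,
) -> str:
    """Return the text of a Slurm batch script that runs an app in a container."""
    lines = [
        "#!/bin/bash",
        # Slurm settings: resources and limits requested for the job.
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={time_limit}",
        "",
        # Execute the application inside the Singularity image, one rank per task.
        f"mpirun -n {ntasks} singularity exec {image} {app}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Job name, image, and application path are hypothetical.
    print(render_slurm_job("laplace", 1, 4, "00:10:00", "image.sif", "/app/laplace"))
```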
fieldcompare is a Python package with command-line interface (CLI) that can be used to compare
datasets for (fuzzy) equality. It was designed mainly to serve as a tool to realize regression tests
for research software, and in particular research software that deals with numerical simulations.
In regression tests, the output of a software is compared to reference data that was produced by
the same software at an earlier time, in order to detect if changes to the code cause unexpected
changes to the behavior of the software.
We use fieldcompare to compare the temperature field of the 2D Laplace simulation with a predefined reference dataset.
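
To illustrate what fuzzy equality means in such a regression test, here is a minimal sketch that compares two temperature fields value by value with a relative and an absolute tolerance. It does not use the fieldcompare API. The function `compare_fields`, its tolerances, and the sample values are assumptions made for this example, built only on Python's standard `math.isclose`.

```python
import math
from typing import List, Sequence, Tuple


def compare_fields(
    result: Sequence[float],
    reference: Sequence[float],
    rel_tol: float = 1e-9,
    abs_tol: float = 0.0,
) -> List[Tuple[int, float, float]]:
    """Return (index, value, expected) for every entry outside the tolerances."""
    if len(result) != len(reference):
        raise ValueError("fields differ in length")
    mismatches = []
    for index, (value, expected) in enumerate(zip(result, reference)):
        # Fuzzy comparison: small floating-point deviations are accepted.
        if not math.isclose(value, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((index, value, expected))
    return mismatches


if __name__ == "__main__":
    # Hypothetical temperature values, not taken from the simulation.
    reference = [20.0, 35.5, 50.0]
    result = [20.0, 35.5 + 1e-12, 51.0]
    for index, value, expected in compare_fields(result, reference):
        print(f"entry {index}: got {value}, expected {expected}")
```

A regression test would fail the CI job whenever the returned list is non-empty.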
- matplot
- dynamic CI pipeline
- jinja templates
  - slurmjob
  - rocket files
  - CI jobs
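
The idea of generating CI jobs from templates can be sketched as follows. To stay self-contained, this example uses Python's standard-library `string.Template` as a stand-in for Jinja. The job names, the `rocket-<name>.yml` file names, and the `hpc-rocket launch` invocation in the script line are assumptions for illustration and are not taken from this project's templates.

```python
from string import Template
from typing import Iterable

# Template for one benchmark job; $name and $rocket_file are filled in per approach.
JOB_TEMPLATE = Template(
    "benchmark-$name:\n"
    "  stage: benchmark\n"
    "  script:\n"
    # Assumed invocation; adjust to the actual hpc-rocket command in use.
    "    - hpc-rocket launch $rocket_file\n"
)


def render_pipeline(approaches: Iterable[str]) -> str:
    """Render the YAML text of a child pipeline with one job per MPI approach."""
    parts = ["stages:\n  - benchmark\n"]
    for name in approaches:
        parts.append(
            JOB_TEMPLATE.substitute(name=name, rocket_file=f"rocket-{name}.yml")
        )
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical approach names.
    print(render_pipeline(["mpich", "mpich-bind", "openmpi"]))
```

In GitLab CI, generated text like this is typically written to a file, stored as a job artifact, and started as a child pipeline by a later trigger job.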