This is a replication package containing code and experimental results related to the paper titled: MaBoSS for HPC environments: Implementations of the continuous time Boolean model simulator for large CPU clusters and GPU accelerators.
The artifact comprises the following directories:
benchmark
-- benchmarking scriptsdata
-- the (bnd
,cfg
) model pairs on which the benchmarks were runplots
-- scripts for generating results plotspresented-results
-- plots (including some that were not included in the paper), CSV data files with measurements and R script that generated the plots from the datarepos
-- all the implementations of cross-correlationMaBoSS.GPU
-- GPU code (located undersrc
)MaBoSS
-- CPU and MPI code (located underengine/src
)
repo/MaBoSS.GPU/src
directory contains the source files to the CUDA kernels. Notably, the runtime compilation optimization can be viewed in kernel_compiler.h/cpp
and generator.h/cpp
source files, the simulation part in jit_kernels/simulation.cu
and the stats computation part in jit_kernels/final_states.cu
and jit_kernels/window_average_small.cu
.
MPI implementation of CPU MaBoSS code can be found by searching for the MPI_COMPAT
keyword in repo/MaBoSS/engine/src
directory.
Hardware requirements:
- A CUDA-compatible GPU
- A (homogeneous) CPU cluster
Software requirements:
- CUDA toolkit 12.2 or later and appropriate driver
- an MPI implementation, such as
mpich
cmake 3.18
or laterflex
andbison
librariesR
software for plotting the graphs (see details below)
Let us present a few commands for your convenience that will allow you to set up the environment quickly:
Installing all dependencies (except CUDA and GPU driver) on Debian/Ubuntu:
sudo apt-get update && apt-get install -y g++ cmake r-base flex bison mpich
Installing all dependencies (except CUDA and GPU driver) on RHEL-like distribution:
sudo dnf install -y cmake gcc-c++ R flex bison mpich
R packages necessary for generating the plots:
R -e "install.packages(c('ggplot2', 'cowplot', 'sitools', 'viridis', 'dplyr'), repos='https://cloud.r-project.org')"
Our experiments are designed to provide a comprehensive analysis of the aforementioned algorithms running various combinations of parameters computing different sizes of input instances. Therefore, the overall duration of running the experiments is quite long (around 2 to 3 days on our GPU cluster and MareNostrum SC).
To provide a swift way to check the reproducibility of our experiments, we prepared a special script that runs only a subset of the benchmarks.
Kick the tires:
Just to see whether the code is working, run the following from the root directory:
./kick-the-tires.sh
The script should take about 10 minutes to finish. The script runs a subset of GPU experiments.
After the script runs, it will generate results in a CSV format in the results
directory. It should contain 2 CSV files for CPU and GPU benchmark respectively. Each CSV file contains self-documenting headers. Finally, the plotting script is executed generating a single plot in the plots
directory.
More details on how the CSV results rows are processed into plots can be found in the plots/plots-fast.R
script.
The generated plot file will be named real.pdf
and it shows the comparison of CPU and GPU runtime on a small cellcycle model.
Complete set of measurements:
To run the complete benchmark, execute
./run-all.sh
The measured data and plots were stored in the presented-results
directory. The directory also contains plots.R
script, which was used to plot the data. It can be executed by Rscript plots.R
if you wish to re-generate the plots from the data yourself. Here we describe each figure in the presented-results/plots
directory:
Plot | Description | Figure number in paper |
---|---|---|
real.pdf |
GPU benchmark on real-world dataset | Figure 1 |
sizek_mpi.pdf |
MPI benchmark on real-world dataset | Figure 4 |
synth_mpi_speedup.pdf |
MPI strong-scaling speedup on synthetic dataset | Figure 5 |
synth_mpi.pdf |
MPI wall time on synthetic dataset | not included |
nodes.pdf |
GPU benchmark on synthetic dataset | Figure 2 |
nodes-compilation-big-NVIDIA RTX 3070 Laptop GPU.pdf |
Laptop GPU runtime compilation benchmark | not included |
nodes-compilation-big-NVIDIA Tesla A100 GPU.pdf |
Datacenter-grade GPU runtime compilation benchmark | Figure 3 |