Using the bwCluster
Even medium-scale dynamical models can require large amounts of computational power for fitting and for the calculation of profile likelihoods. Computing an appropriate number of multi-start fits on a single high-end PC can take days or even weeks. The state of Baden-Württemberg provides such resources through the bwHPC concept by granting us access to the bwForCluster MLS&WISO Production.
This article is specific to the high-performance computing infrastructure of the state of Baden-Württemberg and to the Kreutz and Timmer groups. However, the functions mentioned here can probably be adapted to an arbitrary third-party cluster with moderate effort.
The two login nodes can be accessed via
ssh fr_[RZ-ID]@bwfor.cluster.uni-mannheim.de
ssh fr_[RZ-ID]@bwforcluster.bwservices.uni-heidelberg.de
once the cluster entitlement has been granted by the Rechenzentrum (RZ). To obtain the entitlement, check the bwHPC Wiki under "Become Coworker of an RV". The Kreutz and Timmer groups have already registered a Rechenvorhaben (RV).
Useful Slurm commands on the login nodes:
- Overview of jobs: `squeue`
- Cancel a job: `scancel [jobId]`
- Cancel all jobs: `scancel --user=fr_[yourRzId]`
- View cluster usage: `sinfo_t_idle`
The D2D cluster suite is a collection of functions that allows convenient use of the bwCluster from a local machine, without having to copy files or ssh to the cluster manually. It contains functions for file upload, job submission, review of job status, job cancellation, and file download. An example workflow is described below.
To use the D2D cluster suite, some one-time preparations on the cluster are necessary: clone the D2D repository from GitHub to some directory on the cluster, e.g. `~/d2d`, and create a folder in which all files generated during the computational tasks are stored (the cluster working directory, e.g. `~/d2d_work`). Next, some configuration options have to be set. This can be done interactively by calling `arUserConfigBwCluster` without an argument. It will ask for
- your ssh username
- the ssh server (`bwfor.cluster.uni-mannheim.de`)
- the MATLAB version to use on the cluster (`R2019b`)
- the path to D2D on the cluster (`~/d2d/arFramework3`)
- the working directory (`~/d2d_work`)
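In MATLAB, the one-time configuration then looks like the following minimal sketch; the answers shown as comments are the example values from this article:

```matlab
% Run once on the local machine; the function queries the settings
% interactively (example answers from this article shown as comments):
arUserConfigBwCluster
%   ssh username:        fr_[RZ-ID]
%   ssh server:          bwfor.cluster.uni-mannheim.de
%   MATLAB version:      R2019b
%   path to D2D:         ~/d2d/arFramework3
%   working directory:   ~/d2d_work
```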
For the available cluster functions, see the section below. In this example, a multi-start optimization with 20 runs will be conducted. Uploading all necessary files and submitting the cluster jobs are possible via the function
arSendJobBwCluster(name, functionString, [cjId], [uploadFiles])
- `name` specifies a subfolder in the cluster working directory to which all files are uploaded
- `functionString` is a string that contains the actual function call to be executed on the cluster
- `cjId` is an identifier for the computing job that will be submitted; it is stored in `ar.config.cluster` as well as in a backup file
- `uploadFiles` is a boolean indicating whether `arSendJobBwCluster` should only submit the computing job or also upload the files
For this example, a suitable function call can look like this:
arSendJobBwCluster('myProj', 'arFitLhsBwCluster(20,2)', 'myProj_r01', true)
To submit another computing job for the same workspace but, e.g., with different parameter settings, one can simply make the desired changes and then submit another job with a new `cjId` and without uploading the files again: `arSendJobBwCluster('myProj', 'arFitLhsBwCluster(20,2)', 'myProj_r02', false)`.
The results can be downloaded to the local results folder by calling `arDownloadResultsBwCluster('myProj_r01', 'myProj_r01_clusterResults')`. Before initiating the download, this function checks whether all computations have finished by calling `arJobStatusBwCluster('myProj_r01')` internally, and it creates a new folder named `'myProj_r01_clusterResults'` inside the results directory.
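Putting the pieces together, a complete submit-check-download cycle might look like the following sketch; the function names are taken from this article, while `'myWorkspace'` is a hypothetical workspace name:

```matlab
% End-to-end sketch of the D2D cluster suite workflow:
arLoad('myWorkspace');                        % load the local D2D workspace
arSendJobBwCluster('myProj', ...              % upload files and submit the job
    'arFitLhsBwCluster(20,2)', 'myProj_r01', true);
arJobStatusBwCluster('myProj_r01');           % check whether all jobs finished
arDownloadResultsBwCluster('myProj_r01', ...  % download once everything is done
    'myProj_r01_clusterResults');
```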
The D2D workflow on the cluster is similar to that on the knechte/ruprechte. First, upload (or clone) D2D and your D2D workspace to the cluster, e.g. by using the `scp` command. Then log in to one of the login nodes, load the MATLAB module via `module load math/matlab/R2019b`, and type `matlab` to start MATLAB. Now add D2D to the MATLAB path via `addpath`.
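For the example paths used above, the session on the cluster could then start like this (a sketch; `'myWorkspace'` is a hypothetical workspace name):

```matlab
% Inside MATLAB on the cluster login node:
addpath(fullfile(getenv('HOME'), 'd2d', 'arFramework3'))  % example D2D path from above
arInit                                                    % initialize the D2D framework
arLoad('myWorkspace')                                     % load the uploaded workspace
```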
To perform multi-start fitting on the bwFor cluster, use the function `arFitLhsBwCluster`. The usage is similar to calling `arFitLHS` on a local machine. First, load your workspace on the cluster, then run `arFitLhsBwCluster`. You can specify the queue and walltime for the cluster jobs in the arguments of that function; type `help arFitLhsBwCluster` for further information. `arFitLhsBwCluster` will create a number of MATLAB instances that each perform a small batch of the total number of multi-start fitting runs in parallel. Because each of those instances writes its results into a different directory, a results collection function must be run after all jobs have finished:
```matlab
addpath('myD2Dpath')
arLoad('myWorkspace')
% start jobs on the standard queue with 1 hour walltime and 10 fits per job
collectfun = arFitLhsBwCluster(1000, 10, 'standard', '01:00:00');
% wait for all jobs to be finished (!), then run the results collection
collectfun();
% results will be stored in 'myWorkspace'
```
To achieve optimal performance, choose the number of jobs per node as the number of cores divided by the number of conditions of your model. This can be achieved by adding a local copy of `arClusterConfig` to your workspace, in which you can change the value of `conf.n_inNode`.
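For instance, with illustrative numbers not taken from this article: on a node with 32 cores and a model with 8 conditions, the rule above gives the following setting in your local copy of `arClusterConfig`:

```matlab
% In the local copy of arClusterConfig (only the relevant line shown):
conf.n_inNode = 32 / 8;   % [# cores per node] / [# model conditions] = 4
```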
Profile likelihood calculation can also be accelerated by using the cluster. The function `pleBwCluster` shares the logic of `arFitLhsBwCluster` and can compute the left and right branch of every profile in parallel. It can thus speed up the profile likelihood calculation by a factor of up to `2 * [# of fitted parameters]` compared to using `ple`.
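Based on the stated analogy to `arFitLhsBwCluster`, a session might look like the following sketch; the exact argument list and return value of `pleBwCluster` are assumptions here, so check `help pleBwCluster` for the actual interface:

```matlab
arLoad('myWorkspace')        % workspace with a finished fit
collectfun = pleBwCluster;   % submit one job per profile branch (assumed call pattern)
% wait for all jobs to finish, then collect the profiles:
collectfun();
```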