Running on Myriad cluster

Asif Tamuri edited this page Dec 19, 2024 · 10 revisions

The commands below should be executed on the login node of the cluster.

Install git-lfs

  • Download the latest version for Linux AMD64 (v3.6.0 at the time of writing)
    • wget https://github.com/git-lfs/git-lfs/releases/download/v3.6.0/git-lfs-linux-amd64-v3.6.0.tar.gz
  • Extract
    • tar xvf git-lfs-linux-amd64-v3.6.0.tar.gz
  • Go into directory
    • cd git-lfs-3.6.0
  • Make the directory for local binary installation
    • mkdir -p ~/.local/bin/
  • This directory needs to be in your PATH (it should be already, but just in case)
    • export PATH="$HOME/.local/bin:$PATH"
  • Run the installation script
    • ./install.sh --local
  • Check it worked
    • git-lfs --version

(Adapted from git-lfs-install gist)
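If you want to avoid adding ~/.local/bin to your PATH more than once (for example when the export line is copied into a startup file), a guarded version can be sketched as below. The function name path_contains is made up for this example and is not part of git-lfs.

```shell
# Hypothetical helper (not part of git-lfs): succeed if directory $1 is listed in $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prepend ~/.local/bin only if it is missing, so repeated runs do not duplicate it.
if ! path_contains "$HOME/.local/bin"; then
    export PATH="$HOME/.local/bin:$PATH"
fi
```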

Create directories

  • mkdir ~/thanzi
  • mkdir -p ~/Scratch/thanzi/TLOmodel-outputs

Download TLOmodel

  • cd ~/thanzi
  • git clone https://github.com/UCL/TLOmodel.git
  • cd TLOmodel
  • Check resource files have been downloaded properly (i.e. git-lfs worked)
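One possible way to do this check: when git-lfs has not run, large files are left as small text "pointer" files whose first line starts with the git-lfs spec URL. The sketch below looks for such files. The function names are made up for this example, and it assumes the resource files live in the resources directory of the repository.

```shell
# Hypothetical helper: succeed if file $1 is still a git-lfs pointer rather than real content.
# Pointer files are small text files whose first line starts with the LFS spec URL.
is_lfs_pointer() {
    head -c 200 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Hypothetical helper: print every file under directory $1 that is still a pointer.
list_lfs_pointers() {
    find "$1" -type f | while IFS= read -r f; do
        if is_lfs_pointer "$f"; then
            echo "$f"
        fi
    done
}
```

Running list_lfs_pointers ~/thanzi/TLOmodel/resources should print nothing if the files were downloaded properly. If it lists files, running git lfs pull inside the TLOmodel directory should fetch the real content.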

Create Python environment and install dependencies

  • Load the Python module for the cluster
    • module load python3/3.11
  • Outside of the TLOmodel source code directory, create the virtual environment
    • cd ~/thanzi
    • python -m venv venv-tlo
  • Activate the virtual environment
    • source ~/thanzi/venv-tlo/bin/activate
  • Install the TLOmodel requirements
    • cd ~/thanzi/TLOmodel
    • pip install -r requirements/dev.txt
    • pip install -e .
  • Check it worked (it will be slow the first time)
    • tlo

Make a job submission script

Create the following file, submit-scenario.sh, in ~/thanzi. You have to customise parts of it (the lines whose comments are marked ***).

#!/bin/bash -l

############ JOB CONFIG

# *** Request the shortest wall-clock time that is reasonable for your job, up to 72 hours. This requests 24 hours
#$ -l h_rt=24:0:0

# Request 16GB of memory
#$ -l mem=16G

# *** Personal job name identifier
#$ -N testing_scenario

# *** Put in your username below
#$ -wd /home/<your UCL id>/Scratch/thanzi/TLOmodel-outputs

# *** Setup the job array: 1-(no. of draws * no. of runs) e.g. if 3 draws, 3 runs: 1-9
#$ -t 1-9

############ END OF JOB CONFIG

# *** Specify number of draws & runs (draws * runs must equal the upper bound of the job array above)
numberOfDraws=3
numberOfRuns=3

# Map the task number to a "draw run" pair (both zero-based), e.g. task 1 -> "0 0"
taskNumber=$SGE_TASK_ID
thisRun=$(awk -v n="$taskNumber" "BEGIN { for (i=0; i<$numberOfDraws; i++) for (j=0; j<$numberOfRuns; j++) if (++count==n) print i, j }")

# Make the output directory: <job name>_<job id>/<draw>/<run>
thisRunPath=$(echo $thisRun | tr ' ' '/')
outputDir="$HOME/Scratch/thanzi/TLOmodel-outputs/${JOB_NAME}_${JOB_ID}/${thisRunPath}"
mkdir -p "$outputDir"

# Load and activate python environment
module load python3/3.11
source ~/thanzi/venv-tlo/bin/activate

cd ~/thanzi/TLOmodel

# *** Run the specified scenario
tlo scenario-run --draw $thisRun --output-dir $outputDir src/scripts/dev/scenarios/playing_22.py
tlo parse-log $outputDir
gzip $outputDir/*.log
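The awk line in the script turns the array task number into a draw and a run index. As a rough illustration of the same mapping using plain shell arithmetic (the function names here are made up for this example and are not used by the script):

```shell
# Hypothetical helper: map a 1-based array task ID ($1) to zero-based "draw run" indices,
# given the number of runs per draw ($2). Runs vary fastest, matching the awk loop order.
task_to_draw_run() {
    echo "$(( ($1 - 1) / $2 )) $(( ($1 - 1) % $2 ))"
}

# Hypothetical helper: upper bound for the "#$ -t 1-N" line, i.e. draws ($1) * runs ($2).
array_size() {
    echo "$(( $1 * $2 ))"
}

task_to_draw_run 1 3   # prints "0 0"
task_to_draw_run 4 3   # prints "1 0"
task_to_draw_run 9 3   # prints "2 2"
array_size 3 3         # prints "9"
```

With 3 draws and 3 runs, tasks 1 to 9 cover every draw/run pair exactly once. If the job array is smaller than draws * runs, the later pairs are never run.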

Submit the job

cd ~/thanzi
qsub submit-scenario.sh

The qsub command will print the job id, e.g.:

$ qsub submit-scenario.sh
Your job-array 62125.1-9:1 ("testing_scenario") has been submitted

You can check the status of your jobs:

$ qstat
job-ID  prior   name             user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------------
  62125 0.00000 testing_scenario ucbtaut      qw    12/19/2024 17:07:40                                    1 1-9:1

Eventually, the tasks will start running:

$ qstat
job-ID  prior   name             user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------------
  62125 3.07768 testing_scenario ucbtaut      r     12/19/2024 17:14:06 Bran@node-b00a-007                 1 1
  62125 3.07768 testing_scenario ucbtaut      r     12/19/2024 17:14:06 Bran@node-b00a-007                 1 2
  62125 3.07768 testing_scenario ucbtaut      r     12/19/2024 17:14:08 Bran@node-b00a-007                 1 3
  62125 3.07768 testing_scenario ucbtaut      r     12/19/2024 17:14:08 Bran@node-b00a-007                 1 4
  62125 3.07768 testing_scenario ucbtaut      r     12/19/2024 17:14:08 Bran@node-b00a-013.myriad.ucl.     1 5
...snip...
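For large job arrays, a quick summary can be easier to read than the full listing. The sketch below counts qstat rows per state. The function name is made up for this example, and it assumes two header rows and the state in the fifth column, as in the listings above. Note that waiting tasks of an array are shown on a single row, so the qw count is a number of rows, not of tasks.

```shell
# Hypothetical helper: read qstat output on stdin and print the number of rows per state.
# Assumes two header rows and the state (e.g. qw, r) in the fifth column.
count_qstat_states() {
    awk 'NR > 2 && NF >= 5 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}
```

It would be used as qstat | count_qstat_states.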

Get the results

The results will be placed in ~/Scratch/thanzi/TLOmodel-outputs/testing_scenario_62125, where the number at the end is the job id.

  • Go to the outputs directory. The steps below zip everything up for download; set the job id first so that the commands work.
    • cd ~/Scratch/thanzi/TLOmodel-outputs
  • Set the job id as a variable
    • JOBID=62125
  • Move the stdout and stderr files into the directory
    • mv testing_scenario.?${JOBID}* testing_scenario_${JOBID}
  • Zip everything up
    • zip -r download.zip testing_scenario_${JOBID}
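Before zipping, it may be worth checking that every run finished. The submission script gzips the log as its last step, so a run directory without a .log.gz file probably did not complete. A minimal sketch, with function names made up for this example and assuming the <draw>/<run> directory layout created by the script:

```shell
# Hypothetical helper: count draw/run directories (two levels below job directory $1).
count_run_dirs() {
    find "$1" -mindepth 2 -maxdepth 2 -type d | wc -l | tr -d ' '
}

# Hypothetical helper: print run directories under $1 that contain no gzipped log.
list_incomplete_runs() {
    find "$1" -mindepth 2 -maxdepth 2 -type d | while IFS= read -r d; do
        if ! ls "$d"/*.log.gz >/dev/null 2>&1; then
            echo "$d"
        fi
    done
}
```

For example, count_run_dirs testing_scenario_${JOBID} should equal draws * runs, and list_incomplete_runs testing_scenario_${JOBID} should print nothing.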

You can move data from the cluster to your local machine using the methods described in the Myriad help.
