tomhosking/mckenzie

McKenzie

A lightweight job tracker for the Slurm scheduler

Setup

Create a file config.sh in the root of the project, like this:

#!/bin/bash

export MCKENZIE_ENDPOINT=localhost:5002

Then run python src/app.py, and go to http://localhost:5002/ to monitor your jobs.

Example usage

Set up the hook within your job script:

MCKENZIE_HOOK=/path/to/mckenzie/scripts/hook.sh

Enable McKenzie error trapping by sourcing the trap script:

source /path/to/mckenzie/scripts/error_trap.sh
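
The contents of the trap script are not shown in this README. As a rough illustration of the general idea only, a Bash ERR trap that reports a failure through the hook could look like the sketch below. The function name mckenzie_on_error and the 'error' status are assumptions made for this example, and the echo fallback is a dry-run stand-in that prints the hook call instead of sending it.

```shell
#!/bin/bash
# Hypothetical sketch; this is NOT the contents of scripts/error_trap.sh.

# Dry-run stand-in (assumption): if MCKENZIE_HOOK is unset, print the call.
MCKENZIE_HOOK="${MCKENZIE_HOOK:-echo hook.sh}"

# Report a failure to McKenzie. The status name 'error' is an assumption;
# this README only documents 'warmup', 'running' and 'complete'.
mckenzie_on_error() {
    ${MCKENZIE_HOOK} -s error
}

# Call the handler whenever a top-level command exits with a non-zero status.
trap mckenzie_on_error ERR
```

Note that Bash does not propagate an ERR trap into functions or subshells unless set -E is enabled, so a trap like this only covers commands run at the top level of the job script.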

Create a new job:

${MCKENZIE_HOOK} -a 1 -i $jobId -p $slurmPartition -n $jobName
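
The variables in the command above are not defined elsewhere in this README. One plausible way to fill them in, assuming the script is submitted with sbatch, is to read the environment variables that Slurm sets for a batch job. The fallback values and the echo stand-in for the hook are placeholders for this sketch, so that it also runs outside Slurm.

```shell
#!/bin/bash
# Sketch under assumptions: job details are taken from Slurm's environment.

# Slurm exports these inside a batch job; the defaults after ':-' are
# placeholders used only when the script runs outside Slurm.
jobId="${SLURM_JOB_ID:-0}"
slurmPartition="${SLURM_JOB_PARTITION:-local}"
jobName="${SLURM_JOB_NAME:-test-job}"

# Dry-run stand-in (assumption): if MCKENZIE_HOOK is unset, print the call.
MCKENZIE_HOOK="${MCKENZIE_HOOK:-echo hook.sh}"

# Register the job, quoting each value so names with spaces survive intact.
${MCKENZIE_HOOK} -a 1 -i "$jobId" -p "$slurmPartition" -n "$jobName"
```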

Set the status to 'warmup' and send the job config file to McKenzie (here the config file path is the job script's first argument, $1):

${MCKENZIE_HOOK} -s warmup -c $1

Set the status to 'running' before starting the job, then 'complete' once it's done:

${MCKENZIE_HOOK} -s running
# ...do the job
${MCKENZIE_HOOK} -s complete
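
Written as above, the 'complete' update is sent even if the job command fails, unless an error trap stops the script first. A small wrapper can make the ordering explicit. The sketch below is one possible approach rather than part of McKenzie: run_tracked is a hypothetical helper name, and the echo fallback is a dry-run stand-in for the hook.

```shell
#!/bin/bash
# Sketch of a wrapper around the status updates; run_tracked is a
# hypothetical helper, not something shipped with McKenzie.

# Dry-run stand-in (assumption): if MCKENZIE_HOOK is unset, print the call.
MCKENZIE_HOOK="${MCKENZIE_HOOK:-echo hook.sh}"

# Mark the job as running, execute the given command, and mark the job as
# complete only if that command exits with status 0.
run_tracked() {
    ${MCKENZIE_HOOK} -s running
    "$@" || return $?
    ${MCKENZIE_HOOK} -s complete
}

# Usage: pass the job command and its arguments to the wrapper.
run_tracked echo "...do the job"
```

If the wrapped command fails, run_tracked returns that command's exit status without sending 'complete', leaving any failure reporting to the error trap.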

Upload metrics and output files:

${MCKENZIE_HOOK} -r $resultsfile -o $outputfile

To update McKenzie from within a running job, look at the example code in ./lib/mckenzie.py.

Todo

  • Capture cancels for jobs that aren't running yet
  • Capture output files
  • Easy access to Slurm logs
  • Pivot tables for hparams
  • Custom list orders/filters
  • Detail view
