Slurm Workload Manager is a popular HPC cluster job scheduler found in
many of the top 500 supercomputers. The slurmR R package provides an R
wrapper to it that matches the parallel package’s syntax; that is, just
as parallel provides parLapply, clusterMap, parSapply, etc., slurmR
provides Slurm_lapply, Slurm_Map, Slurm_sapply, etc.
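To make the correspondence concrete, here is a minimal sketch that runs the same computation with parallel::parLapply on a local cluster and then builds the analogous Slurm_lapply job. The two-worker local cluster and the njobs value are illustrative assumptions. The slurmR branch only runs if the package is installed, and with plan = "none" it creates the job object without submitting anything.

```r
library(parallel)

set.seed(881)
x <- replicate(100, runif(50), simplify = FALSE)

# parallel: create a cluster object, then call parLapply(cl, X, FUN)
cl <- makeCluster(2L)                  # two local workers (illustrative choice)
ans_parallel <- parLapply(cl, x, mean)
stopCluster(cl)

# slurmR: same X and FUN arguments, but no cluster object is needed
if (requireNamespace("slurmR", quietly = TRUE)) {
  job <- slurmR::Slurm_lapply(x, mean, njobs = 2L, plan = "none")
  slurmR::Slurm_clean(job)             # remove the auxiliary files again
}
```

The only structural difference is that parallel needs an explicit cluster object, while slurmR takes the number of array jobs as an argument.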
While there are other alternatives such as future.batchtools,
batchtools, clustermq, and rslurm, this R package has the
following goals:
- It is dependency-free, which means that it works out of the box.
- It emphasizes being similar to the workflow in the R package parallel.
- It provides a general framework for creating your own personalized wrappers without using template files.
- It is specialized on Slurm, meaning more flexibility (no need to modify template files) and debugging tools (e.g., job resubmission).
- It provides a backend for the parallel package, offering an out-of-the-box method for creating Socket cluster objects for multi-node operations. (See the examples below on how to use it with other R packages.)
Check out the VS section to compare slurmR with other
R packages. Wondering who is using Slurm? Check out the list at the end
of this document.
From your HPC command line, you can install the development version from GitHub with:
$ git clone https://github.com/USCbiostats/slurmR.git
$ R CMD INSTALL slurmR/
The second line assumes you have R available on your system (usually
loaded via module R or some other command). Alternatively, use the devtools
package from within R:
# install.packages("devtools")
devtools::install_github("USCbiostats/slurmR")
To cite slurmR in publications use:
Vega Yon et al., (2019). slurmR: A lightweight wrapper for HPC with
Slurm. Journal of Open Source Software, 4(39), 1493,
https://doi.org/10.21105/joss.01493
And the actual R package:
Vega Yon G, Marjoram P (2022). _slurmR: A Lightweight Wrapper for
'Slurm'_. R package version 0.5-2,
<https://github.com/USCbiostats/slurmR>.
To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.
For testing purposes, slurmR is available in Dockerhub.
The rcmdcheck and interactive images are built on top of
xenonmiddleware/slurm.
Once you download the files contained in the slurmR repository, you can
go to the docker folder and use the Makefile included there to start a
Unix session with slurmR and Slurm included.
To test slurmR using Docker, check the README.md file located at
https://github.com/USCbiostats/slurmR/tree/master/docker.
library(slurmR)
# Loading required package: parallel
# slurmR default option for `tmp_path` (used to store auxiliar files) set to:
# /home/george/Documents/development/slurmR
# You can change this and checkout other slurmR options using: ?opts_slurmR, or you could just type "opts_slurmR" on the terminal.
# Suppose that we have 100 vectors of length 50 ~ Unif(0,1)
set.seed(881)
x <- replicate(100, runif(50), simplify = FALSE)
We can use the function Slurm_lapply to distribute computations:
ans <- Slurm_lapply(x, mean, plan = "none")
# Warning in normalizePath(file.path(tmp_path, job_name)):
# path[1]="/home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18":
# No such file or directory
# Warning: [submit = FALSE] The job hasn't been submitted yet. Use sbatch() to submit the job, or you can submit it via command line using the following:
# sbatch --job-name=slurmr-job-113bd5bca5b18 /home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/01-bash.sh
Slurm_clean(ans) # Cleaning after you
Notice the plan = "none" option; this tells Slurm_lapply to only
create the job object but do nothing with it, i.e., skip submission. To
get more info, we can turn the verbose mode on:
opts_slurmR$verbose_on()
ans <- Slurm_lapply(x, mean, plan = "none")
# Warning in normalizePath(file.path(tmp_path, job_name)):
# path[1]="/home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18":
# No such file or directory
# --------------------------------------------------------------------------------
# [VERBOSE MODE ON] The R script that will be used is located at: /home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/00-rscript.r and has the following contents:
# --------------------------------------------------------------------------------
# .libPaths(c("/home/george/R/x86_64-pc-linux-gnu-library/4.2", "/usr/local/lib/R/site-library", "/usr/lib/R/site-library", "/usr/lib/R/library"))
# message("[slurmR info] Loading variables and functions... ", appendLF = FALSE)
# Slurm_env <- function (x = "SLURM_ARRAY_TASK_ID")
# {
# y <- Sys.getenv(x)
# if ((x == "SLURM_ARRAY_TASK_ID") && y == "") {
# return(1)
# }
# y
# }
# ARRAY_ID <- as.integer(Slurm_env("SLURM_ARRAY_TASK_ID"))
#
# # The -snames- function creates the write names for I/O of files as a
# # function of the ARRAY_ID
# snames <- function (type, array_id = NULL, tmp_path = NULL, job_name = NULL)
# {
# if (length(array_id) && length(array_id) > 1)
# return(sapply(array_id, snames, type = type, tmp_path = tmp_path,
# job_name = job_name))
# type <- switch(type, r = "00-rscript.r", sh = "01-bash.sh",
# out = "02-output-%A-%a.out", rds = if (missing(array_id)) "03-answer-%03i.rds" else sprintf("03-answer-%03i.rds",
# array_id), job = "job.rds", stop("Invalid type, the only valid types are `r`, `sh`, `out`, and `rds`.",
# call. = FALSE))
# sprintf("%s/%s/%s", tmp_path, job_name, type)
# }
# TMP_PATH <- "/home/george/Documents/development/slurmR"
# JOB_NAME <- "slurmr-job-113bd5bca5b18"
#
# # The -tcq- function is a wrapper of tryCatch that on error tries to recover
# # the message and saves the outcome so that slurmR can return OK.
# tcq <- function (...)
# {
# ans <- tryCatch(..., error = function(e) e)
# if (inherits(ans, "error")) {
# ARRAY_ID. <- get("ARRAY_ID", envir = .GlobalEnv)
# msg <- paste0("[slurmR info] An error has ocurred while evualting the expression:\n[slurmR info] ",
# paste(deparse(match.call()[[2]]), collapse = "\n[slurmR info] "),
# "\n[slurmR info] in ", "ARRAY_ID # ", ARRAY_ID.,
# "\n[slurmR info] The error will be saved and quit R.\n")
# message(msg, immediate. = TRUE, call. = FALSE)
# ans <- list(res = ans, array_id = ARRAY_ID., job_name = get("JOB_NAME",
# envir = .GlobalEnv), slurmr_msg = structure(msg,
# class = "slurm_info"))
# saveRDS(list(ans), snames("rds", tmp_path = get("TMP_PATH",
# envir = .GlobalEnv), job_name = get("JOB_NAME", envir = .GlobalEnv),
# array_id = ARRAY_ID.))
# message("[slurmR info] job-status: failed.\n")
# q(save = "no")
# }
# invisible(ans)
# }
# message("done loading variables and functions.")
# tcq({
# INDICES <- readRDS("/home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/INDICES.rds")
# })
# tcq({
# X <- readRDS(sprintf("/home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/X_%04d.rds", ARRAY_ID))
# })
# tcq({
# FUN <- readRDS("/home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/FUN.rds")
# })
# tcq({
# mc.cores <- readRDS("/home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/mc.cores.rds")
# })
# tcq({
# seeds <- readRDS("/home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/seeds.rds")
# })
# set.seed(seeds[ARRAY_ID], kind = NULL, normal.kind = NULL)
# tcq({
# ans <- parallel::mclapply(
# X = X,
# FUN = FUN,
# mc.cores = mc.cores
# )
# })
# saveRDS(ans, sprintf("/home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/03-answer-%03i.rds", ARRAY_ID), compress = TRUE)
# message("[slurmR info] job-status: OK.\n")
# --------------------------------------------------------------------------------
# The bash file that will be used is located at: /home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/01-bash.sh and has the following contents:
# --------------------------------------------------------------------------------
# #!/bin/sh
# #SBATCH --job-name=slurmr-job-113bd5bca5b18
# #SBATCH --output=/home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/02-output-%A-%a.out
# #SBATCH --array=1-2
# #SBATCH --job-name=slurmr-job-113bd5bca5b18
# #SBATCH --cpus-per-task=1
# #SBATCH --ntasks=1
# /usr/lib/R/bin/Rscript /home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/00-rscript.r
# --------------------------------------------------------------------------------
# EOF
# --------------------------------------------------------------------------------
# Warning: [submit = FALSE] The job hasn't been submitted yet. Use sbatch() to submit the job, or you can submit it via command line using the following:
# sbatch --job-name=slurmr-job-113bd5bca5b18 /home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/01-bash.sh
Slurm_clean(ans) # Cleaning after you
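The generated R script shown in the verbose output follows a simple pattern: each array task reads its own chunk of the input (X_%04d.rds), applies FUN, and saves its answer (03-answer-%03i.rds). The sketch below mimics that pattern locally, without Slurm, so the data flow is easier to follow. It is an illustration under assumptions, not slurmR's internal code: the toy-job directory, the run_task helper, and the use of parallel::splitIndices to form the chunks are all hypothetical choices made here.

```r
library(parallel)

set.seed(881)
x <- replicate(100, runif(50), simplify = FALSE)

njobs   <- 2L                               # mirrors "--array=1-2" above
job_dir <- file.path(tempdir(), "toy-job")  # hypothetical stand-in for tmp_path/job_name
dir.create(job_dir, showWarnings = FALSE)

# 1. Split the input into one chunk per array task and save each chunk
chunks <- splitIndices(length(x), njobs)
for (i in seq_len(njobs))
  saveRDS(x[chunks[[i]]], file.path(job_dir, sprintf("X_%04d.rds", i)))

# 2. What a single array task does: read its chunk, apply FUN, save the answer
run_task <- function(array_id, FUN) {
  X   <- readRDS(file.path(job_dir, sprintf("X_%04d.rds", array_id)))
  ans <- lapply(X, FUN)
  saveRDS(ans, file.path(job_dir, sprintf("03-answer-%03i.rds", array_id)))
}
for (i in seq_len(njobs)) run_task(i, mean) # Slurm would run these as separate tasks

# 3. Collect the per-task answers into a single list
res <- unlist(
  lapply(seq_len(njobs), function(i)
    readRDS(file.path(job_dir, sprintf("03-answer-%03i.rds", i)))),
  recursive = FALSE
)

unlink(job_dir, recursive = TRUE)           # clean up
```

Because every task only touches files indexed by its own array id, a single task can be rerun without redoing the others.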
The following example was extracted from the package’s manual.
# Submitting a simple job
job <- Slurm_EvalQ(slurmR::WhoAmI(), njobs = 20, plan = "submit")
# Checking the status of the job (we can simply print)
job
status(job) # or use the state function
sacct(job) # or get more info with the sacct wrapper.
# Suppose some of the jobs are taking too long to complete (say 1, 2, and 15 through 20);
# we can stop the job and resubmit those array tasks as follows:
scancel(job)
# Resubmitting only those array tasks
sbatch(job, array = "1,2,15-20") # A new jobid will be assigned
# Once it's done, we can collect all the results at once
res <- Slurm_collect(job)
# And clean up if we don't need to use it again
Slurm_clean(res)
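When the ids of the tasks to resubmit come from code rather than from inspection, the array string can be built programmatically. The helper below is a hypothetical convenience (array_spec is not part of slurmR); it collapses a vector of task ids into the comma-and-range form used in the array argument above.

```r
# Hypothetical helper (not part of slurmR): collapse task ids into a
# Slurm-style array specification such as "1,2,15-20".
array_spec <- function(ids) {
  ids <- sort(unique(as.integer(ids)))
  if (!length(ids)) return("")
  # Start a new run whenever the gap to the previous id is not exactly 1
  run <- cumsum(c(1L, diff(ids) != 1L))
  paste(
    vapply(split(ids, run), function(r) {
      # Runs of three or more ids become "first-last"; shorter runs are listed
      if (length(r) > 2L) sprintf("%d-%d", r[1L], r[length(r)])
      else paste(r, collapse = ",")
    }, character(1)),
    collapse = ","
  )
}

slow <- c(1, 2, 15:20)
array_spec(slow)
# [1] "1,2,15-20"

# With a job object at hand, this could then be used as:
# sbatch(job, array = array_spec(slow))
```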
Take a look at the vignette here.
The function makeSlurmCluster creates a PSOCK cluster within a Slurm
HPC network, meaning that users can go beyond a single-node cluster
object and take advantage of Slurm to create a multi-node cluster
object. This feature allows using slurmR with other R packages that
support working with SOCKcluster class objects. Here are some examples:
With the future package
library(future)
library(slurmR)
cl <- makeSlurmCluster(50)
# It only takes using a cluster plan!
plan(cluster, cl)
# ...your fancy futuristic code...
# Slurm Clusters are stopped in the same way any cluster object is
stopCluster(cl)
With the doParallel package
library(doParallel)
library(slurmR)
cl <- makeSlurmCluster(50)
registerDoParallel(cl)
m <- matrix(rnorm(9), 3, 3)
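# Illustrative loop body (assumed here): scale each row by its mean
foreach(i=1:nrow(m), .combine=rbind) %dopar% (m[i, ] / mean(m[i, ]))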
stopCluster(cl)
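Since the object returned by makeSlurmCluster is used like any other cluster object, one way to keep code portable is to write the workload against a generic cluster and pass in the constructor. The sketch below uses a local PSOCK cluster as a stand-in so that it runs without Slurm; the run_on and row_mean helpers are hypothetical names introduced for illustration.

```r
library(parallel)

# Worker function defined at top level so that it serializes cleanly
row_mean <- function(i, m) mean(m[i, ])

# Hypothetical helper: run the workload on whatever cluster the
# constructor returns, and always stop the cluster afterwards
run_on <- function(make_cluster, m) {
  cl <- make_cluster()
  on.exit(stopCluster(cl))
  parSapply(cl, seq_len(nrow(m)), row_mean, m = m)
}

m <- matrix(1:12, nrow = 3)

# Local stand-in for testing on a laptop
out <- run_on(function() makePSOCKcluster(2L), m)

# On a Slurm system, the same call would only swap the constructor:
# out <- run_on(function() slurmR::makeSlurmCluster(2L), m)
```

This keeps the Slurm-specific part to a single line, which makes it easy to test the workload locally before moving to the cluster.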
The slurmR package has a couple of convenient functions designed to
save the user time. First, the function sourceSlurm() allows
skipping the explicit creation of a bash script file to be used together
with sbatch by putting all the required Slurm options in the first
lines of an R script, for example:
#!/bin/sh
#SBATCH --account=lc_ggv
#SBATCH --partition=scavenge
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=4G
#SBATCH --job-name=Waiting
Sys.sleep(10)
message("done.")
This is an R script whose first line coincides with that of a shell
script for Slurm: #!/bin/sh. The lines that follow start with
#SBATCH, explicitly specifying options for sbatch, and the remaining
lines are just R code.
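To see why such a file is valid for both R and sbatch, note that the #SBATCH lines are ordinary comments from R's point of view. The sketch below extracts them from the script text with base R. It is only an illustration of the file layout; read_sbatch_header is a hypothetical helper and does not reflect how sourceSlurm is implemented.

```r
# The example script from above, as a character vector (one element per line)
script <- c(
  "#!/bin/sh",
  "#SBATCH --account=lc_ggv",
  "#SBATCH --partition=scavenge",
  "#SBATCH --time=01:00:00",
  "#SBATCH --mem-per-cpu=4G",
  "#SBATCH --job-name=Waiting",
  "Sys.sleep(10)",
  "message(\"done.\")"
)

# Hypothetical helper (not slurmR's implementation): return the
# "#SBATCH --key=value" lines as a named list of values
read_sbatch_header <- function(lines) {
  hdr  <- grep("^#SBATCH[[:space:]]+--", lines, value = TRUE)
  opts <- sub("^#SBATCH[[:space:]]+--", "", hdr)
  keys <- sub("=.*$", "", opts)       # text before the first "="
  vals <- sub("^[^=]*=?", "", opts)   # text after the first "="
  stats::setNames(as.list(vals), keys)
}

opts <- read_sbatch_header(script)
opts$time
# [1] "01:00:00"
```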
The previous R script is included in the package (type
system.file("example.R", package="slurmR")).
Imagine that the R script is named example.R; you can then use the
sourceSlurm function to submit it to Slurm as follows:
slurmR::sourceSlurm("example.R")
This will create the corresponding bash file required to be used with
sbatch, and submit it to Slurm.
Another nice tool is slurmr_cmd(). This function creates a
simple bash script that we can use as a command-line tool to submit this
type of R script. Moreover, it can add the command to your session’s
aliases as follows:
library(slurmR)
slurmr_cmd("~", add_alias = TRUE)
Once that’s done, you can submit R scripts with “Slurm-like headers” (as shown previously) as follows:
$ slurmr example.R
Since version 0.4-3, slurmR includes the option preamble. This
provides a way for the user to specify commands/modules that need to be
executed before running the Rscript. Here is an example using
module load:
# Turning the verbose mode off
opts_slurmR$verbose_off()
# Setting the preamble can be done globally
opts_slurmR$set_preamble("module load gcc/6.0")
# Or on the fly
ans <- Slurm_lapply(1:10, mean, plan = "none", preamble = "module load pandoc")
# Printing out the bashfile
cat(readLines(ans$bashfile), sep = "\n")
# #!/bin/sh
# #SBATCH --job-name=slurmr-job-113bd5bca5b18
# #SBATCH --output=/home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/02-output-%A-%a.out
# #SBATCH --array=1-2
# #SBATCH --job-name=slurmr-job-113bd5bca5b18
# #SBATCH --cpus-per-task=1
# #SBATCH --ntasks=1
# module load gcc/6.0
# module load pandoc
# /usr/lib/R/bin/Rscript /home/george/Documents/development/slurmR/slurmr-job-113bd5bca5b18/00-rscript.r
Slurm_clean(ans) # Cleaning after you
There are several ways to enhance R for HPC. Depending on your goals, restrictions, and preferences, you can use any of the following from this manually curated list:
Package | Rerun (1) | *apply (2) | makeCluster (3) | Slurm options |
---|---|---|---|---|
slurmR | yes | yes | yes | on the fly |
drake | yes | - | - | by template |
rslurm | - | yes | - | on the fly |
future.batchtools | - | yes | yes | by template |
batchtools | yes | yes | - | by template |
clustermq | - | - | - | by template |
1. After errors, a part or the entire job can be resubmitted.
2. Functionality similar to the apply family in base R, e.g., lapply, sapply, mapply or similar.
3. Creating a cluster object using either MPI or Socket connection.
The packages slurmR and rslurm work only on Slurm. The drake package is focused on workflows.
We welcome contributions to slurmR. Whether it is reporting a bug,
starting a discussion by asking a question, or proposing/requesting a
new feature, please create a new issue here so that we can talk
about it.
Please note that this project is released with a Contributor Code of Conduct (see the CODE_OF_CONDUCT.md file included in this project). By participating in this project, you agree to abide by its terms.
Here is a manually curated list of institutions using Slurm:
Institution | Country | Link |
---|---|---|
University of Utah’s CHPC | US | link |
USC Center for Advanced Research Computing | US | link |
Princeton Research Computing | US | link |
Harvard FAS | US | link |
Harvard HMS research computing | US | link |
UC San Diego WM Keck Lab for Integrated Biology | US | link |
Stanford Sherlock | US | link |
Stanford SCG Informatics Cluster | US | link |
UC Berkeley Open Computing Facility | US | link |
University of Utah CHPC | US | link |
The University of Kansas Center for Research Computing | US | link |
University of Cambridge | UK | link |
Indiana University | US | link |
Caltech HPC Center | US | link |
Institute for Advanced Study | US | link |
UT Southwestern Medical Center BioHPC | US | link |
Vanderbilt University ACCRE | US | link |
University of Virginia Research Computing | US | link |
Center for Advanced Computing | CA | link |
SciNet | CA | link |
NLHPC | CL | link |
Kultrun | CL | link |
Matbio | CL | link |
TIG MIT | US | link |
MIT Supercloud | US | supercloud.mit.edu/ |
Oxford’s ARC | UK | link |
This project is supported by the National Cancer Institute, Grant #1P01CA196596.
Computation for the work described in this paper was supported by the University of Southern California’s Center for High-Performance Computing (hpcc.usc.edu).