
Workflow not being executed in parallel on batch computing nodes (SLURM cluster grouped execution) #2339

Open
gipert opened this issue Jul 1, 2023 · 5 comments


gipert commented Jul 1, 2023

I'm trying to write a profile to run this workflow on NERSC's Perlmutter supercomputer. Batch computing nodes have 128 CPU cores (×2 hyperthreads) and 512 GB of memory. Submission is managed through SLURM, and the maximum wall time is 12 h.

My workflow is mostly composed of a large number of ~1 h long, single-threaded jobs. I would like to instruct Snakemake to pack them efficiently and submit a much smaller number of jobs to SLURM. Jobs running on a node should make use of all available resources and run in parallel.

This is what I've written so far:

configfile: config.json
keep-going: true
quiet: rules

# profit from Perlmutter's scratch area: https://docs.nersc.gov/filesystems/perlmutter-scratch
# NOTE: should actually set this through the command line, since there is a
# scratch directory for each user and variable expansion does not work here:
#   $ snakemake --shadow-prefix "$PSCRATCH" [...]
# shadow-prefix: "$PSCRATCH"

# NERSC uses the SLURM job scheduler
# - https://snakemake.readthedocs.io/en/stable/executing/cluster.html#executing-on-slurm-clusters
slurm: true

# maximum number of cores requested from the cluster or cloud scheduler
cores: 256
# maximum number of cores used locally, on the interactive node
local-cores: 256
# maximum number of jobs that can exist in the SLURM queue at a time
jobs: 50

# reasonable defaults that do not stress the scheduler
max-jobs-per-second: 20
max-status-checks-per-second: 20

# (LEGEND) NERSC-specific settings
# - https://snakemake.readthedocs.io/en/stable/executing/cluster.html#advanced-resource-specifications
# - https://docs.nersc.gov/jobs
default-resources:
  - slurm_account="m2676"
  - constraint="cpu"
  - runtime=120
  - mem_mb=500
  - slurm_extra="--qos regular --licenses scratch,cfs"

# number of threads used by each rule
set-threads:
  - tier_ver=1
  - tier_raw=1

# memory and runtime requirements for each single rule
# - https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
# - https://docs.nersc.gov/jobs/#available-memory-for-applications-on-compute-nodes
set-resources:
  - tier_ver:mem_mb=500
  - tier_ver:runtime=120
  - tier_raw:mem_mb=500
  - tier_raw:runtime=120

# we define groups in order to let Snakemake pack rule instances into the same
# SLURM job. Relevant docs:
# - https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#snakefiles-grouping
# - https://snakemake.readthedocs.io/en/stable/executing/grouping.html#job-grouping
groups:
  - tier_ver=sims
  - tier_raw=sims

# disconnected parts of the workflow can run in parallel (at most 256 of them)
# in a group
group-components:
    - sims=256
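
For reference, the profile-level groups/group-components settings above should be equivalent to tagging the rules with a group directive directly in the Snakefile. A minimal sketch with an invented rule body (the real inputs, outputs and shell command of tier_raw are not reproduced here):

# hypothetical Snakefile excerpt; only the threads and group directives matter
rule tier_raw:
    input:
        "generated/tier/ver/{simid}/{simid}_{n}.root",
    output:
        "generated/tier/raw/{simid}/{simid}_{n}.root",
    threads: 1
    group:
        "sims"
    shell:
        "process-tier-raw {input} {output}"

Combined with group-components sims=256, Snakemake should then pack up to 256 of these single-threaded instances into a single SLURM submission.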

And this is the relevant part of Snakemake's output:

> snakemake --profile workflow/profiles/nersc-batch --verbose
sbatch call: sbatch --job-name 02d1132e-27d6-4d5c-aed4-3a88e1d30e93 -o .snakemake/slurm_logs/group_sims/%j.log --export=ALL -A m2676 -t 120 -C cpu --mem 20000 --cpus-per-task=40 --qos regular --licenses scratch,cfs -D /global/cfs/cdirs/m2676/users/pertoldi/legend-prodenv/sims/benchmark-1 --wrap='/global/cfs/cdirs/m2676/users/pertoldi/legend-prodenv/tools/snakemake-mambaforge3/envs/snakemake/bin/python3.11 -m snakemake --snakefile '"'"'/global/cfs/cdirs/m2676/users/pertoldi/legend-prodenv/sims/benchmark-1/workflow/Snakefile'"'"' --target-jobs [ELIDED] --allowed-rules [tier_raw ... ELIDED ... tier_raw] --local-groupid '"'"'eaff24b4-a825-52a7-9d08-aac77f1f7b10'"'"' --cores '"'"'all'"'"' --attempt 1 --resources '"'"'mem_mb=20000'"'"' '"'"'disk_mib=38160'"'"' '"'"'disk_mb=40000'"'"' '"'"'mem_mib=19080'"'"' --wait-for-files-file '"'"'/global/cfs/cdirs/m2676/users/pertoldi/legend-prodenv/sims/benchmark-1/.snakemake/tmp.gg2t0bhd/snakejob_sims_eaff24b4-a825-52a7-9d08-aac77f1f7b10.waitforfilesfile.txt'"'"' --force --keep-target-files --keep-remote --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --rerun-triggers '"'"'input'"'"' '"'"'code'"'"' '"'"'mtime'"'"' '"'"'params'"'"' '"'"'software-env'"'"' --skip-script-cleanup  --shadow-prefix '"'"'/pscratch/sd/p/pertoldi'"'"' --conda-frontend '"'"'mamba'"'"' --wrapper-prefix '"'"'https://github.com/snakemake/snakemake-wrappers/raw/'"'"' --configfiles '"'"'/global/cfs/cdirs/m2676/users/pertoldi/legend-prodenv/sims/benchmark-1/config.json'"'"' --latency-wait 5 --scheduler '"'"'greedy'"'"' --scheduler-solver-path '"'"'/global/cfs/cdirs/m2676/users/pertoldi/legend-prodenv/tools/snakemake-mambaforge3/envs/snakemake/bin'"'"' --set-resources '"'"'tier_ver:mem_mb=500'"'"' '"'"'tier_ver:runtime=120'"'"' '"'"'tier_raw:mem_mb=500'"'"' '"'"'tier_raw:runtime=120'"'"' --default-resources '"'"'mem_mb=500'"'"' '"'"'disk_mb=max(2*input.size_mb, 1000)'"'"' '"'"'tmpdir=system_tmpdir'"'"' '"'"'slurm_account="m2676"'"'"' '"'"'constraint="cpu"'"'"' '"'"'runtime=120'"'"' '"'"'slurm_extra="--qos regular --licenses scratch,cfs"'"'"'  --slurm-jobstep --jobs 1 --mode 2'
Job eaff24b4-a825-52a7-9d08-aac77f1f7b10 has been submitted with SLURM jobid 10939730 (log: .snakemake/slurm_logs/group_sims/10939730.log).

And this is the content of that log file:

> cat .snakemake/slurm_logs/group_sims/10939730.log
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 256
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=20000, disk_mib=38160, disk_mb=40000, mem_mib=19080
Select jobs to execute...
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=500, mem_mib=477, disk_mb=1000, disk_mib=954
Select jobs to execute...

[Sat Jul  1 09:28:58 2023]
Job 0: Producing output file for job 'raw.l200a-wls-reflector-Rn222-to-Po214.0'
Reason: Missing output files: /global/cfs/cdirs/m2676/users/pertoldi/legend-prodenv/sims/benchmark-1/generated/tier/raw/l200a-fibers-Rn222-to-Po214/l200a-fibers-Rn222-to-Po214_0000.root

Changing to shadow directory: /pscratch/sd/p/pertoldi/shadow/tmpb727ncaf
Write-protecting output file /global/cfs/cdirs/m2676/users/pertoldi/legend-prodenv/sims/benchmark-1/generated/tier/raw/l200a-fibers-Rn222-to-Po214/l200a-fibers-Rn222-to-Po214_0000.root.
[Sat Jul  1 09:32:15 2023]
Finished job 0.
1 of 1 steps (100%) done
Write-protecting output file /global/cfs/cdirs/m2676/users/pertoldi/legend-prodenv/sims/benchmark-1/generated/tier/raw/l200a-fibers-Rn222-to-Po214/l200a-fibers-Rn222-to-Po214_0000.root.
[Sat Jul  1 09:32:16 2023]
Finished job 23.
1 of 40 steps (2%) done
Select jobs to execute...
srun: Job 10939730 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Step created for StepId=10939730.1
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=500, mem_mib=477, disk_mb=1000, disk_mib=954
Select jobs to execute...

[Sat Jul  1 09:33:15 2023]
Job 0: Producing output file for job 'raw.l200a-wls-reflector-Rn222-to-Po214.0'
[...]

As you can see, jobs are executed serially on the node even though they are independent of each other.

What's wrong with my profile?


gipert commented Jul 2, 2023

Update: removing the --slurm-jobstep flag from the end of the Snakemake command executed on the batch node seems to fix the issue. That option takes care of prepending the right srun call:

call = f"srun -n1 --cpu-bind=q {self.format_job_exec(job)}"

but why does this produce a serial execution?
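
For context, the serial execution looks consistent with how SLURM handles job steps: unless a step is restricted with srun --exact (or allowed to share CPUs with --overlap), it may be handed the whole allocation, so the next srun has to wait, which would match the "Requested nodes are busy" message in the log above. A minimal sketch, independent of Snakemake, of packing many one-core steps into a single allocation (./my-task is a placeholder executable, and --exact assumes a reasonably recent SLURM):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=256
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00

# launch up to 256 one-core steps concurrently; --exact restricts each step
# to the resources it requests instead of the whole allocation
for i in $(seq 0 255); do
    srun --ntasks=1 --cpus-per-task=1 --exact ./my-task "$i" &
done
wait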


gipert commented Jul 2, 2023

Seems like I'm experiencing the same issue reported here: #2060

cmeesters (Member) commented

Sorry for looking into this issue so late. Since Snakemake v8, the executor code for SLURM has its own repo.

Does the issue persist for you after updating?
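
For anyone updating: in Snakemake >= 8 the SLURM support comes from the separate snakemake-executor-plugin-slurm package, and the profile switches from slurm: true to an executor key. A rough sketch of the changed bits, keeping the list form already used in the profile above (not tested on this workflow):

pip install snakemake-executor-plugin-slurm

# in the profile, replace "slurm: true" with:
executor: slurm
jobs: 50
default-resources:
  - slurm_account="m2676"
  - runtime=120
  - mem_mb=500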


gipert commented May 6, 2024

I need to check again. Is this snakemake/snakemake-executor-plugin-slurm#29 resolved?


pachi commented Oct 18, 2024

I had a similar case: v7.32.3 showed the same problem, while v8.23.2 works as expected.
