Fix srun scripts location #3068

Open
wants to merge 3 commits into
base: master

svandenhaute

Fixes #3067 for the specific case of SrunLauncher. In that case the block submission script is guaranteed to expose a shared directory in which the launcher can place the intermediate helper files it creates: the directory holding the SLURM stderr/stdout files.

@benclifford
Collaborator

To test this, I ran it on Perlmutter. It fails with this *.stderr message:

bxc@perlmutter:login22:~/tmp/pr3068/parsl> cat .pytest/parsltest-current/runinfo/000/submit_scripts/parsl.PM_HTEX_multinode.block-0.1707743650.7868135.submit.stderr2 
dirname: missing operand
Try 'dirname --help' for more information.

That SLURM environment does not have the SLURM_JOB_STDOUT environment variable, at least not at the point where this script runs. Inside that environment, env reports the following (some values redacted):

SLURM_NODEID=0
SLURM_TASK_PID=1764758
SLURM_PRIO_PROCESS=0
SLURM_SUBMIT_DIR=/global/u1/b/bxc/tmp/pr3068/parsl
SLURM_JOB_LICENSES=u1:1
SLURM_PROCID=0
SLURM_JOB_GID=76945
SLURMD_NODENAME=nid005278
SLURM_JOB_END_TIME=1707744514
SLURM_TASKS_PER_NODE=1(x2)
SLURM_NNODES=2
SLURM_JOB_START_TIME=1707744214
SLURM_NTASKS_PER_NODE=1
SLURM_JOB_NODELIST=nid[005278,005458]
SLURM_CLUSTER_NAME=XXXXREMOVED
SLURM_NODELIST=nid[005278,005458]
SLURM_NTASKS=2
SLURM_JOB_CPUS_PER_NODE=256(x2)
SLURM_TOPOLOGY_ADDR=nid005278
SLURM_WORKING_CLUSTER=perlmutter:slurmctld_service.local:6817:9984:109
SLURM_JOB_NAME=parsl.PM_HTEX_multinode.block-0.1707743650.7868135
SLURM_JOBID=21640350
SLURM_NODE_ALIASES=(null)
SLURM_JOB_QOS=debug
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_CPUS_ON_NODE=256
SLURM_JOB_NUM_NODES=2
SLURM_JOB_UID=76945
SLURM_JOB_PARTITION=regular_milan_ss11
SLURM_SCRIPT_CONTEXT=prolog_task
SLURM_JOB_USER=XXXXREMOVED
SLURM_NPROCS=2
SLURM_SUBMIT_HOST=XXXXREMOVED
SLURM_JOB_ACCOUNT=XXXXREMOVED
SLURM_GTIDS=0
SLURM_JOB_ID=21640350
SLURM_LOCALID=0
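
For reference, the "missing operand" message is what dirname prints when it receives no argument at all, which is what happens when an unset variable is expanded without quotes. The following is a minimal sketch that reproduces this outside SLURM. It assumes the launcher script calls dirname on an unquoted $SLURM_JOB_STDOUT; the names dir_unquoted and dir_quoted are illustrative only.

```shell
#!/bin/bash
# Simulate an environment where the variable is absent.
unset SLURM_JOB_STDOUT

# Unquoted: the empty expansion vanishes, so dirname gets no operand and fails.
if ! dir_unquoted=$(dirname $SLURM_JOB_STDOUT 2>/dev/null); then
    echo "unquoted call failed (missing operand)"
fi

# Quoted: dirname receives an empty string and prints "." (the current directory).
dir_quoted=$(dirname "$SLURM_JOB_STDOUT")
echo "quoted call returned: $dir_quoted"
```

Quoting the variable would hide the error but silently fall back to the current directory, so an explicit check for the variable is probably clearer than relying on that.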

@svandenhaute
Author

Weird. So this seems to depend on the specific SLURM environment. I tested on EuroHPC’s Meluxina and it worked fine. Would it be an option to put the helper files in the stderr/stdout directory when it is available, and otherwise fall back to the current behavior, in which they are placed in the main working directory?
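
A rough sketch of that fallback, written as a bash function. The name helper_dir is hypothetical and not part of Parsl, and the sketch assumes SLURM_JOB_STDOUT holds the full path of the stdout file on clusters that set it.

```shell
#!/bin/bash
# Hypothetical helper: print the directory where launcher helper files should go.
helper_dir() {
    if [ -n "${SLURM_JOB_STDOUT:-}" ]; then
        # Variable is set and non-empty: use the directory of the stdout file.
        dirname "$SLURM_JOB_STDOUT"
    else
        # Variable missing (as observed on Perlmutter): keep the current
        # behavior and use the working directory.
        pwd
    fi
}

echo "helper files would go in: $(helper_dir)"
```

The launcher command could then prefix its helper file names with "$(helper_dir)/" instead of writing them relative to the working directory.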

Alternatively, we could modify SlurmProvider to submit jobs directly from the stdout/stderr directory and add a line to the job template that cd’s to the working directory. SLURM_SUBMIT_DIR is always available, I think.
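
To make the second idea concrete, here is a sketch that runs without SLURM. fake_sbatch is a hypothetical stand-in for sbatch: it only mimics the fact that SLURM_SUBMIT_DIR is the directory from which the job was submitted. The file names job.sh and cmd_helper.sh and the two temporary directories are placeholders, not what SlurmProvider actually generates.

```shell
#!/bin/bash
# Hypothetical stand-in for sbatch: run the script from log_dir with
# SLURM_SUBMIT_DIR set to the directory of invocation.
fake_sbatch() {
    local log_dir=$1 script=$2
    (cd "$log_dir" && SLURM_SUBMIT_DIR=$PWD bash "$script")
}

workdir=$(mktemp -d)   # placeholder for the user's working directory
log_dir=$(mktemp -d)   # placeholder for the stdout/stderr directory

# Job template fragment: remember the submit directory for helper files,
# then cd to the working directory before running anything else.
cat > "$log_dir/job.sh" <<EOF
helper_dir="\$SLURM_SUBMIT_DIR"
cd "$workdir" || exit 1
touch "\$helper_dir/cmd_helper.sh"
pwd -P
EOF

job_cwd=$(fake_sbatch "$log_dir" "$log_dir/job.sh")
echo "job ran in: $job_cwd"
ls "$log_dir"
```

With this layout the helper file lands next to the job script in the log directory, while the job itself still runs in the working directory.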

Successfully merging this pull request may close these issues.

Some launchers create intermediate .sh files outside of parsl's log directories