This repository has been archived by the owner on Feb 10, 2021. It is now read-only.

Fix handling of Job ID #56

Merged: 3 commits, Mar 9, 2018

Conversation

jakirkham
Member

Fixes #32
Closes #33
Closes #46

Improves the handling of the Job ID across different types of DRMAA clusters. It creates a DRMAA session and queries drmsInfo on that session. This identifies the specific cluster, which we then check against the supported types (i.e. SLURM, LSF, and SGE). Based on this check, we set the correct environment variable names for the Job ID and Task ID for each scheduler. In addition, we set what we call JOB_PARAM for each cluster, which is the string the scheduler replaces with the Job ID when a job is submitted. This should address some issues encountered by SLURM and LSF users.
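A minimal sketch of the check described above, to make the idea concrete. It is not the code from this PR: the names `SchedulerVars`, `scheduler_vars`, and `scheduler_vars_from_session` are hypothetical, and the substring matching on the drmsInfo string is an assumption (real strings vary by DRMAA library). The environment variable names and placeholders are the ones each scheduler documents for its own jobs.

```python
from typing import NamedTuple


class SchedulerVars(NamedTuple):
    """Per-scheduler names (hypothetical container, not from the PR)."""
    job_param: str    # placeholder the scheduler replaces with the Job ID
    job_id_env: str   # environment variable holding the Job ID
    task_id_env: str  # environment variable holding the array Task ID


# Checked in order; "GE" is last because it is the shortest marker.
_SCHEDULERS = (
    ("SLURM", SchedulerVars("%j", "SLURM_JOB_ID", "SLURM_ARRAY_TASK_ID")),
    ("LSF", SchedulerVars("%J", "LSB_JOBID", "LSB_JOBINDEX")),
    ("GE", SchedulerVars("$JOB_ID", "JOB_ID", "SGE_TASK_ID")),
)


def scheduler_vars(drms_info: str) -> SchedulerVars:
    """Map a drmsInfo string to the names used by that scheduler.

    Assumption: the string contains "SLURM", "LSF", or "GE" somewhere.
    """
    info = drms_info.upper()
    for marker, names in _SCHEDULERS:
        if marker in info:
            return names
    raise ValueError("Unsupported DRMAA cluster: %r" % drms_info)


def scheduler_vars_from_session() -> SchedulerVars:
    """Query a live DRMAA session (requires drmaa-python and a cluster)."""
    import drmaa  # imported lazily so the pure function works without it

    with drmaa.Session() as session:
        return scheduler_vars(session.drmsInfo)
```

On a machine with a working DRMAA library, `scheduler_vars_from_session()` would perform the session query; `scheduler_vars` can be exercised on its own with any string.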

@jakirkham
Member Author

Have tested this on a containerized GE install that I have access to and on our LSF cluster. I don't have access to SLURM to test against, so I would appreciate help from SLURM users in testing this.

cc @jcftang @ShigekiKarita

@mrocklin
Member

Cool. I'm glad to see this.

@jcftang

jcftang commented Feb 27, 2018

This looks good to me. However, I won't have a chance to test it until sometime next week.

@jakirkham jakirkham force-pushed the fix_job_id_hdling branch 4 times, most recently from b77a575 to 8b4ab27 Compare February 27, 2018 19:52
In preparation for separating out the Job and Task IDs, wrap the worker
out/err template path line and the out/err path lines in the job
template.
@jakirkham
Member Author

FWIW happy to do a fresh release of dask-drmaa after you have tested. Just a little incentive. ;)

@jcftang

jcftang commented Mar 8, 2018

From the testing that I have done, this appears to be generating the right bits and pieces (drmaa is crashing on my system for other reasons). I think this change looks good to me.

@jakirkham
Member Author

Thanks @jcftang.

@jakirkham jakirkham merged commit 9651340 into dask:master Mar 9, 2018
@jakirkham jakirkham deleted the fix_job_id_hdling branch March 9, 2018 15:08
@jakirkham
Member Author

Released in 0.1.1 on PyPI and conda-forge.
