No block slurm #2965

Draft: wants to merge 3 commits into base: main

@OWissett OWissett commented Oct 4, 2024

Motivation

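I use Hydra for ML and need to run my training on our HPC cluster. The cluster does not allow long-running jobs on the login node, so the Hydra submitit launcher has to be non-blocking to be usable there: it should exit once the run has been scheduled on Slurm instead of waiting for it to finish.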

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

I have run the unit tests for the plugin and added one more that specifically covers the changes made.

I have not extensively tested this change with the sweeper, as I do not expect to use the two together. However, I have added sentinel values for the job returns so that the sweeper does not raise an error. The documentation should note that this feature is not designed to work with the sweeper. I do not think it needs to be: sweeps generally do not involve long-running jobs and should be performed on smaller datasets, with training sessions that last less than a few hours.
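To illustrate the idea of returning sentinels instead of waiting, here is a minimal sketch. It is not the code in this PR: it uses the standard library's `concurrent.futures` as a stand-in for submitit, and the names `NotWaited`, `launch`, `block` and `fake_task` are hypothetical, chosen only for this example.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class NotWaited:
    """Hypothetical sentinel returned in place of a real job result."""

    job_id: str


def launch(
    task: Callable[[Any], Any],
    overrides: Sequence[Any],
    executor: Executor,
    block: bool = True,
) -> List[Any]:
    """Submit one job per override; wait for results only if `block` is True."""
    futures = [executor.submit(task, o) for o in overrides]
    if not block:
        # Non-blocking mode: return immediately with placeholders so that a
        # caller expecting one return value per job still receives a list
        # of the right length.
        return [NotWaited(job_id=str(i)) for i in range(len(futures))]
    # Blocking mode: wait for every job and collect its return value.
    return [f.result() for f in futures]


def fake_task(x: int) -> int:
    """Stand-in for a training run (assumption for this example)."""
    return x * 2


with ThreadPoolExecutor(max_workers=2) as pool:
    blocking_results = launch(fake_task, [1, 2], pool, block=True)
    sentinel_results = launch(fake_task, [1, 2], pool, block=False)
```

In blocking mode the caller gets the real return values. In non-blocking mode it gets one placeholder per submitted job, which is why any consumer that inspects the return values (such as a sweeper) cannot do anything meaningful with them.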

Related Issues and PRs

This potentially fixes #2479.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 4, 2024

OWissett commented Oct 4, 2024

I am currently testing this on our workstation and cluster.

@OWissett OWissett marked this pull request as draft October 4, 2024 20:33
…auncher/submitit_launcher.py

Co-authored-by: vilhub <vilavil@gmail.com>
@PhilipVinc

+1, this would be very useful!

Development

Successfully merging this pull request may close these issues.

Add an option to exit after the submitit launcher has scheduled the run on slurm
4 participants