slurm_scheduler: inherit cwd instead of image + skip mem request via cfg #372

Closed
wants to merge 1 commit into from

d4l3k (Member) commented Jan 22, 2022

This makes the slurm_scheduler inherit the local cwd instead of setting chdir to the image, which better matches the behavior of local_cwd and the other schedulers. It also adds a nomem=true cfg option so you can avoid requesting memory on pcluster nodes.
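
For illustration only, here is a minimal sketch of the behavior described above. It is not the code from this PR: `build_sbatch_args` and its parameters are hypothetical names, and the only things taken from the description are the `nomem` cfg key and the decision not to set chdir. The `--mem`, `--cpus-per-task` and `--chdir` flags are standard sbatch options.

```python
from typing import Dict, List, Optional


def build_sbatch_args(
    cpu: int,
    mem_mb: Optional[int],
    cfg: Dict[str, object],
) -> List[str]:
    """Hypothetical helper: build the sbatch flags for one replica."""
    args = [f"--cpus-per-task={cpu}"]

    # Skip the memory request entirely when nomem is set in the cfg,
    # e.g. for clusters that reject or ignore --mem.
    if mem_mb is not None and not cfg.get("nomem", False):
        args.append(f"--mem={mem_mb}")

    # Deliberately no --chdir flag: without it, sbatch runs the job in
    # the directory it was submitted from, i.e. the local cwd.
    return args


if __name__ == "__main__":
    print(build_sbatch_args(cpu=4, mem_mb=8192, cfg={"nomem": True}))
    print(build_sbatch_args(cpu=4, mem_mb=8192, cfg={}))
```

With `nomem` set, the memory flag is dropped and the rest of the request is unchanged; with an empty cfg, the memory request is passed through as before.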

Fixes #371
Fixes #359

Test plan:

pytest torchx/schedulers/test/slurm_scheduler_test.py
scripts/slurmint.sh

CI

facebook-github-bot added the CLA Signed label Jan 22, 2022
facebook-github-bot (Contributor)

@d4l3k has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

codecov bot commented Jan 22, 2022

Codecov Report

Merging #372 (1fc7974) into main (a53732d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #372   +/-   ##
=======================================
  Coverage   94.65%   94.66%           
=======================================
  Files          61       61           
  Lines        3202     3208    +6     
=======================================
+ Hits         3031     3037    +6     
  Misses        171      171           

Impacted Files                          Coverage Δ
torchx/schedulers/slurm_scheduler.py    96.94% <100.00%> (+0.14%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a53732d...1fc7974.

facebook-github-bot (Contributor)

@d4l3k has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot (Contributor)

@d4l3k has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Successfully merging this pull request may close these issues:

slurm: environment improvements
[Slurm scheduler] Add better support for specifying resources in slurm