How to disable automatic SLURM detection / signal handling? #5225
Hey @williamFalcon. Any idea on this one?
@jbohnslav A quick hack could be to delete the SLURM environment variables at the beginning of the script, so that Lightning will not detect it as SLURM:

    import os

    del os.environ["SLURM_NTASKS"]
    del os.environ["SLURM_JOB_NAME"]

Another way could be to grab the original signal handler before trainer init and restore it in the on_fit_start hook:

    import signal

    # before training init
    original_handler = signal.getsignal(signal.SIGTERM)

    # in on_fit_start hook
    signal.signal(signal.SIGTERM, original_handler)

(A while back I had a feature PR #3632 that added a configurable way to register signals.)
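The two workarounds in this comment could be wrapped up roughly as follows. This is only a sketch using the standard library: `hide_slurm_env` and `SigtermHandlerKeeper` are hypothetical names made up for illustration, not part of Lightning, and the list of variables is just the two mentioned above (whether Lightning checks others depends on the installed version).

```python
import os
import signal

# Variables named in the comment above; treat this list as an assumption,
# since the set Lightning inspects may differ between versions.
SLURM_VARS = ("SLURM_NTASKS", "SLURM_JOB_NAME")


def hide_slurm_env(names=SLURM_VARS):
    """Remove the given variables from this process and return what was removed."""
    removed = {}
    for name in names:
        # pop() with a default avoids the KeyError that `del` raises
        # when the variable is not set (e.g. when running off-cluster).
        value = os.environ.pop(name, None)
        if value is not None:
            removed[name] = value
    return removed


class SigtermHandlerKeeper:
    """Hypothetical helper: remember the SIGTERM handler now, restore it later."""

    def __init__(self):
        # Create this object before the Trainer is constructed.
        self.original = signal.getsignal(signal.SIGTERM)

    def restore(self):
        # Call once fitting has started, e.g. from an on_fit_start hook.
        # getsignal() returns None for handlers not installed from Python;
        # those cannot be re-registered, so they are left alone.
        if self.original is not None:
            signal.signal(signal.SIGTERM, self.original)
```

In a training script, one would call `hide_slurm_env()` at the very top, or alternatively create a `SigtermHandlerKeeper` before building the trainer and call its `restore()` from `on_fit_start`. Note that `signal.signal` has to be called from the main thread.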
I just had a similar issue of wanting to deactivate SLURM detection (I use another library to deal with that, and PyTorch Lightning is only one small component of my code). Given that wanting to deactivate SLURM detection seems to be a recurring use case (see also #6204 #6389), I think there should really just be a flag to the trainer (e.g.
❓ Questions and Help
What is your question?
I'm running single-GPU jobs on a SLURM cluster. PyTorch Lightning uses environment variables to detect that I'm on SLURM, and automatically intercepts SIGTERM signals. However, when I'm debugging, I don't want the SIGTERM to be bypassed; I need to know where the signal is originating.
I can't seem to tell PyTorch Lightning not to use the SLURM handler, because it's automatically detected using environment variables. Is there any way to not use PL's default SLURM connector / SIGTERM bypass function?