My jobs execute successfully and finish before the specified timeout (5 hours). However, the SLURM job appears to keep running even though the process has exited. I checked trainer.log, and it seems that submitit is ignoring the SIGTERM signal:
[2023-03-09 10:55:49,583][submitit][INFO] - Job completed successfully
[2023-03-09 10:55:49,585][submitit][WARNING] - Bypassing signal SIGTERM
[2023-03-09 10:55:49,585][submitit][WARNING] - Bypassing signal SIGTERM
[2023-03-09 10:55:49,585][submitit][WARNING] - Bypassing signal SIGTERM
[2023-03-09 10:55:49,585][submitit][WARNING] - Bypassing signal SIGTERM
[2023-03-09 10:55:49,585][submitit][WARNING] - Bypassing signal SIGTERM
[2023-03-09 10:55:49,585][submitit][WARNING] - Bypassing signal SIGTERM
[2023-03-09 10:55:49,585][submitit][WARNING] - Bypassing signal SIGTERM
[2023-03-09 10:55:49,586][submitit][WARNING] - Bypassing signal SIGTERM
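To illustrate what I understand "bypassing" a signal to mean, here is a minimal Python sketch. It is only an assumption about the general mechanism, not submitit's actual code: the handler name `bypass`, the logger name, and the `received` list are all hypothetical. A handler installed for SIGTERM logs a warning and returns, so the process is not terminated by the signal.

```python
import logging
import signal

logger = logging.getLogger("sigterm_demo")  # hypothetical logger name

received = []  # signal numbers seen by the handler, kept for inspection


def bypass(signum, frame):
    # Hypothetical handler: record and log the signal, then return
    # without exiting, so the process keeps running.
    received.append(signum)
    logger.warning("Bypassing signal %s", signal.Signals(signum).name)


# Replace the default SIGTERM action (terminate the process) with the handler.
signal.signal(signal.SIGTERM, bypass)

# Send SIGTERM to this process; with the handler installed it survives.
signal.raise_signal(signal.SIGTERM)

print("still running after SIGTERM; signals seen:", received)
```

If something like this is happening after the job function returns, a process that keeps swallowing SIGTERM would only go away once it exits on its own or receives a signal it cannot handle, which might explain why the allocation stays alive.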
I'm not sure if this is a bug. Is there a way for the SLURM jobs to be killed before the timeout, once the job has completed successfully? This would free up a lot of resources for other jobs in the queue.
System Information:
Linux cedar1.cedar.computecanada.ca 3.10.0-1160.80.1.el7.x86_64 #1 SMP Tue Nov 8 15:48:59 UTC 2022 x86_64 GNU/Linux
Hi,
I am using the Hydra submitit plugin to schedule sweep jobs on the Compute Canada cluster. I use the following config to schedule the sweeps: