I have a group of workers running that I set up to stay running until I cancel them. We need to temporarily shut down our cluster for a scheduled power outage, and I'd like to pause the workers or the cloud database queue once they finish their next workflow, so that no jobs are stopped in the middle. Is there a nice way of doing this?
Replies: 1 comment 1 reply
For now, you can pause a cluster but can't pause a worker.
If you submit a worker, it's going to run for as long as you told it to, unless you kill the worker manually. So running
simmate engine start-worker
isn't the best idea. It's always better to do something like
simmate engine start-worker --nitems-max 10 --close-on-empty-queue
And to make it so you don't have to keep resubmitting shorter-lived workers, you should use the
simmate engine start-cluster --type slurm --continuous
command, which will maintain a certain number of short-lived workers submitting/running for you. Then whenever you need to pause things, you can kill the start-cluster
process and the short-lived workers wil…
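
To make the idea concrete, here is a toy Python sketch of why short-lived workers pause cleanly. It is not Simmate's code: `run_worker`, `maintain_workers`, and `keep_submitting` are hypothetical names invented for illustration, and the job queue is a plain in-memory `deque` rather than the cloud database. It only assumes that a worker checks its limits between jobs, never in the middle of one.

```python
from collections import deque


def run_worker(queue, nitems_max):
    """Hypothetical short-lived worker (toy model, not Simmate's implementation).

    Runs whole jobs only, and exits after nitems_max jobs or as soon as the
    queue is empty, loosely mirroring --nitems-max and --close-on-empty-queue.
    """
    completed = []
    while len(completed) < nitems_max and queue:
        job = queue.popleft()
        # The job runs to completion before the loop condition is checked again.
        completed.append(job())
    return completed


def maintain_workers(queue, nitems_max, keep_submitting):
    """Hypothetical stand-in for a continuous cluster process.

    Keeps launching short-lived workers until keep_submitting() returns False
    (the analogue of killing the cluster process) or the queue is empty.
    """
    finished = []
    while queue and keep_submitting():
        finished.extend(run_worker(queue, nitems_max))
    return finished


# Seven dummy jobs; each returns a string when it finishes.
queue = deque((lambda i=i: f"job-{i} done") for i in range(7))

# Allow two rounds of workers, then "pause" by refusing to submit more.
signals = iter([True, True, False])
finished = maintain_workers(queue, nitems_max=3, keep_submitting=lambda: next(signals))

print(len(finished), "jobs finished,", len(queue), "still queued")
```

In this sketch, stopping the submitting loop never interrupts a job: workers that already started finish their batch, and whatever is left simply stays in the queue until workers are submitted again.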