[WIP] Flag workers as non-active and pause in case of graceful shutdown #3564
@fjetter checking in here. I notice that this was WIP/Draft. Did you want review here?
It's in WIP because I haven't been able to clean up the code yet and some tests are still failing. I wouldn't recommend diving too deep into the code changes, but there are a few specific things I'd like to get some feedback on. I noticed some irregularities between the nanny and a worker in terms of the states (and handlers) for closed / closing / closing-gracefully / retire / etc. Is there a general concept of how things should usually work? For instance, if there is a nanny, am I supposed to close the nanny AND the worker, or should I expect that I just close the nanny and the nanny will then close the worker? Especially w.r.t. graceful downscaling this seems to be quite entangled.
My guess is that there isn't a principled approach here, but that there should be.
I think that keeping track of which workers are appropriate targets for work is generally a good idea. There are lots of things that we might want to check here, including if they're retiring, as you have now, but also if they're paused due to being out of memory, and probably other issues. I haven't yet had enough time to think about this well enough to have a good answer here. Probably it would be useful for someone to try to gather requirements on all the times when we might want to exclude workers from some activity, and the criteria by which we would want to exclude them. Having a larger set of situations might help us to make a decision like this with more clarity. cc'ing @crusaderky because I think that he's interested in this problem.
This is still WIP and currently fails due to improper treatment of the pause/worker-busy logic, but I would like to address this in a separate PR since it turns out to be nontrivial.
This PR is supposed to improve the treatment of worker retirement, which I believe is still a valuable addition even if the replication/retirement logic is changed in the mid/long term.
There are a few assumptions in here:
Combining the two leaves us with the fact that the `Scheduler.retire_workers` coroutine may not finish before the worker completes its shutdown, i.e. the coroutine may never hit the `Scheduler.remove_worker(address=w, safe=True)` call, which removes the worker and all of its remaining tasks safely and transitions the tasks into the Ready state without increasing the `suspicious` counter.
Even worse, due to the scheduler-wide lock around the replication, the scheduler cannot retire many workers sequentially if the first one blocks, so the above issue may cascade.
This change introduces the concept of a retirement notification where the first step of the retirement is to remove the worker from an "active" list and notify the worker to prepare for shutdown. This has two benefits
The reason the worker is not removed from the `workers` dict directly is that we still want to accept submission of it and still want to be able to properly close it after it has replicated its data, i.e. we still know it but we don't want to include it for new assignments.
Closes #3526
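For illustration, the "known but not active" distinction described above could be sketched roughly as follows. This is a toy model under stated assumptions, not the code in this PR and not the `distributed.Scheduler` API: the class `ToyScheduler`, its `active` set, and the methods `notify_retirement` and `assign` are hypothetical names introduced only for this example.

```python
class ToyScheduler:
    """Toy model of the idea; all names here are hypothetical."""

    def __init__(self):
        # address -> set of task keys currently held by that worker
        self.workers = {}
        # addresses that may receive new task assignments
        self.active = set()

    def add_worker(self, address):
        self.workers[address] = set()
        self.active.add(address)

    def notify_retirement(self, address):
        # First step of retirement: stop assigning new work to the worker,
        # but keep it in `workers` so it stays known and can be closed later.
        self.active.discard(address)

    def assign(self, key):
        if not self.active:
            raise RuntimeError("no active workers available")
        # Pick the least-loaded active worker; ties are broken by address.
        address = min(self.active, key=lambda a: (len(self.workers[a]), a))
        self.workers[address].add(key)
        return address

    def remove_worker(self, address):
        # Final step: forget the worker and return the keys it still held,
        # so the caller can reschedule them.
        self.active.discard(address)
        return self.workers.pop(address)


scheduler = ToyScheduler()
scheduler.add_worker("tcp://a:1")
scheduler.add_worker("tcp://b:1")
scheduler.assign("x")
scheduler.notify_retirement("tcp://a:1")
# From here on, new keys only go to tcp://b:1, while tcp://a:1 is still known.
scheduler.assign("y")
```

In this sketch, a retiring worker keeps its entry in `workers` until `remove_worker` is called, so its remaining keys can still be handed back for rescheduling.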