Worker lifecycle hooks #3300
The best current work on this is here: #2844. How do the cloud providers signal that it's time for the process to wrap up? Is it with SIGINT? If so, then we might be able to just clean up that PR and be done.
Or rather, the cleanest way to do this with that PR would be to send a SIGINT signal. We could also expose the same functionality through an HTTP route or similar.
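A minimal sketch of the SIGINT route, assuming the worker process is free to install its own Python signal handler. `graceful_close`, `shutdown_requested` and `install_sigint_hook` are hypothetical names for illustration, not part of dask's API; the real cleanup (stop taking tasks, hand off data, exit) would replace the body of `graceful_close`.

```python
import signal
import threading

# Hypothetical flag a worker's main loop could check before picking up new tasks.
shutdown_requested = threading.Event()


def graceful_close() -> None:
    """Placeholder for the real cleanup; here it only records that shutdown was requested."""
    shutdown_requested.set()


def _on_sigint(signum, frame) -> None:
    # Python runs this in the main thread when SIGINT arrives; keep it short and non-blocking.
    graceful_close()


def install_sigint_hook():
    """Route SIGINT to graceful_close and return the previous handler so it can be restored."""
    return signal.signal(signal.SIGINT, _on_sigint)
```

With this in place, an external watcher only needs to deliver SIGINT to the worker, for example with `os.kill(pid, signal.SIGINT)`.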
It varies between cloud providers. On AWS, for example, there is a magic URL that you need to poll, which will tell you when the instance will be terminated (if it is scheduled for termination). The most notice you will get is two minutes. This is why I think we should enable some way of adding hooks, because that polling logic doesn't really belong in here. Maybe it should be part of … Either way, if the process gets notified of imminent shutdown, sending a SIGINT sounds reasonable. I'm not sure what the right way is to implement this.
There is a … We would then create a WorkerPlugin or preload script that periodically checks that address (once every five seconds?) and then calls …
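A sketch of the check such a WorkerPlugin or preload script might run on each poll, using the metadata URL quoted later in the thread. It assumes the instance metadata service accepts plain GET requests (IMDSv1; instances that enforce IMDSv2 require fetching a session token first) and that the endpoint answers 404 while no termination is scheduled. `fetch_url` and `termination_notice` are hypothetical helpers; `fetch` is injectable so the logic can be exercised without an EC2 instance.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

# URL quoted in the thread; assumes IMDSv1-style unauthenticated GETs are allowed.
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def fetch_url(url: str, timeout: float = 1.0) -> Optional[str]:
    """Return the response body, or None when the endpoint answers 404 (no notice yet)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def termination_notice(fetch: Callable[[str], Optional[str]] = fetch_url) -> Optional[str]:
    """Return the scheduled termination time as a string, or None if nothing is scheduled."""
    body = fetch(TERMINATION_URL)
    # Treat an empty body the same as "no notice".
    return body or None
```

The returned string is passed through unparsed; a real hook might also validate that it is a timestamp before acting on it.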
Ah, great, OK! I think … We would need to schedule this as a repeating async task, as I assume … Then in …
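One way to schedule the check as a repeating async task, sketched with plain `asyncio`. It assumes the check is a blocking function that returns `None` until a notice appears, so it is run through `asyncio.to_thread` (Python 3.9+) to keep the event loop responsive. `watch_for_termination` and `on_notice` are hypothetical names; the five-second default interval mirrors the suggestion above.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def watch_for_termination(
    check: Callable[[], Optional[str]],
    on_notice: Callable[[str], Awaitable[None]],
    interval: float = 5.0,
) -> str:
    """Call `check` every `interval` seconds; on the first notice, await `on_notice` once and stop."""
    while True:
        # `check` may block (e.g. an HTTP GET), so run it in a worker thread.
        notice = await asyncio.to_thread(check)
        if notice is not None:
            await on_notice(notice)
            return notice
        await asyncio.sleep(interval)
```

A plugin could start this with `asyncio.create_task(...)` during setup and cancel the task during teardown, so the polling never outlives the worker.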
For the AWS ECS/EC2 case, this post should be helpful. For Fargate Spot pricing, I am not sure what the callbacks are yet.
Thanks @sodre. I was just planning on polling http://169.254.169.254/latest/meta-data/spot/termination-time. I think this should still work within Fargate. Thanks @mrocklin, will do! I think I have enough information to move over to …
Sometimes workers get killed, memory is lost, and tasks need to be run again. Many cloud providers offer a cheaper compute option (such as spot or preemptible instances) whose machines can be reclaimed at any time in exchange for the discount, and when using these services this happens regularly.
Most of these services give a warning before the machines are pulled. It would be nice to take advantage of this warning to stop workers from executing new tasks and ask them to move their in-memory data to other workers.
In Kubernetes a node can be cordoned (do not accept new work) and drained (move existing work to another node) via API calls. This is the kind of functionality that would be useful here also.
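To make the cordon/drain idea concrete, here is a toy model of the bookkeeping only, not dask's scheduler: `WorkerState`, `cordon` and `drain` are hypothetical names, and keys are handed out round-robin to whichever peers still accept work. A real implementation would also have to transfer the data over the network and update the scheduler's record of which worker holds each key.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkerState:
    """Hypothetical per-worker bookkeeping: whether it takes new tasks and which keys it holds."""
    name: str
    accepting: bool = True
    data: Dict[str, object] = field(default_factory=dict)


def cordon(worker: WorkerState) -> None:
    """Stop scheduling new tasks on the worker (analogous to `kubectl cordon`)."""
    worker.accepting = False


def drain(worker: WorkerState, peers: List[WorkerState]) -> Dict[str, str]:
    """Move every key the worker holds to accepting peers, round-robin; return key -> new owner."""
    targets = [p for p in peers if p.accepting and p is not worker]
    if not targets:
        raise RuntimeError("no accepting workers to receive data")
    moved: Dict[str, str] = {}
    for i, key in enumerate(sorted(worker.data)):
        target = targets[i % len(targets)]
        target.data[key] = worker.data[key]
        moved[key] = target.name
    worker.data.clear()
    return moved
```

Calling `cordon` before `drain` mirrors `kubectl cordon` followed by `kubectl drain`: no new results land on the worker while its existing data is being moved away.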
A couple of questions: