Add idle cluster cleanup #667
Comments
This would be a massive quality-of-life improvement and cost saver. We have hundreds of ephemeral, operator-controlled Dask clusters being spun up every day. The processes that create them can fail abruptly and miss cleanup hooks, especially if initiated on spot instances. There isn't currently a trivial way for us to automate cleanup for these.
We are also very interested in this feature. Update: I noticed there is already an open POC, #672, and it uses the not-yet-documented Scheduler HTTP API.
That is super exciting! I'd love to chat sometime about your experience with it.
Yeah that's the motivation behind this issue.
The POC #672 just needs a little love to push it over the line. Given the interest in this issue I'll definitely bump it up my priority list. If either of you have feedback on the design of that PR I'd love for you to comment there.
Would be happy to!
Awesome. Perhaps the Dask Slack is a good place to start this chat? Would you mind signing up and pinging me over there?
Inspired by dask/dask-gateway#687 I think we should add a self cleanup for idle clusters here.
I expect the implementation would involve having some kind of configurable idle timeout in the `DaskCluster` resource and, if this is set, having the controller poll the scheduler via a timer to find out if it is idle. If it is idle for longer than the timeout, the controller would delete the `DaskCluster` resource.
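To make the proposal concrete, here is a minimal sketch of the timeout check and polling loop described above. It is not the code from #672 and does not use any real operator or scheduler API: `should_delete`, `poll_until_idle`, and the injected `get_idle_since` and `delete_cluster` callables are all hypothetical names, and the sketch assumes the scheduler can report the timestamp at which it became idle (or `None` while busy).

```python
import time
from typing import Callable, Optional


def should_delete(idle_since: Optional[float], idle_timeout: float, now: float) -> bool:
    """Return True if the cluster has been idle for longer than idle_timeout seconds.

    idle_since is the (assumed) timestamp at which the scheduler became idle,
    or None if it is currently busy. A timeout <= 0 disables cleanup.
    """
    if idle_timeout <= 0 or idle_since is None:
        return False
    return (now - idle_since) > idle_timeout


def poll_until_idle(
    get_idle_since: Callable[[], Optional[float]],  # hypothetical: asks the scheduler
    delete_cluster: Callable[[], None],             # hypothetical: deletes the resource
    idle_timeout: float,
    poll_interval: float = 5.0,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> bool:
    """Poll on a timer and delete the cluster once it exceeds the idle timeout.

    Returns True if the cluster was deleted, False if max_polls was reached first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if should_delete(get_idle_since(), idle_timeout, clock()):
            delete_cluster()
            return True
        polls += 1
        sleep(poll_interval)
    return False
```

In a real controller the loop would presumably be replaced by whatever timer mechanism the operator framework provides, with `get_idle_since` backed by a call to the scheduler and `delete_cluster` by a Kubernetes API call; those parts are deliberately left as injected callables here.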