Add synced dict between cluster and scheduler to store cluster info #5033
We might want to add `scheduler_sync_interval` to the Dask config options, but for my tests I just passed it through the inits. I don't have a strong preference and don't know whether this is something people would actually want to adjust. Making it configurable in some way is quite useful for tests, though.
Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>
Linux and macOS tests appear to be flaky and unrelated. However, the Windows tests are valid failures. Looking into it 👀.
Hrm, all Windows tests are still failing. The sync doesn't seem to be working. I'll investigate this next week.
After pulling in the latest changes CI seems to be failing inconsistently again, but it's not clear whether it is related to this change. Running again.
Thanks @jacobtomlinson!
Thanks for the review @jrbourbeau. I've addressed all of your feedback. I note that CI is still failing just on Windows with timeouts in some other test. I'm going to get a working Windows environment set up locally today so I can try and reproduce and debug this.
Thank you for your service 🙇♂️
I didn't manage this yesterday, got pulled onto something else. But it looks like pulling in latest changes from
Given that things seem ok here now I intend to merge this tomorrow unless there are further comments.
Closes #5031
Closes #4607
Related to #4263
This PR adds a `cluster_info` attribute to all `Cluster` objects, which is a dictionary that is synced to the scheduler periodically. Any info already on the scheduler during `_start` is merged into the dict in `Cluster`, and then that dict is synced back to the scheduler every second.

By default this dict just contains the cluster name and the class type. This will be useful in #5012 for advertising cluster manager provenance for use in `dask-ctl`.

This is a good place to persist state at various points in the cluster lifecycle, particularly when wanting to disconnect/reconnect from a running cluster. In cluster managers like `KubeCluster` this would be a good place to store all the config for the cluster, such as the worker CPU/memory settings for creating new worker pods. This will then be loaded back in when using `KubeCluster.from_name()` instead of having to try and serialise everything into some platform-specific metadata store as I attempted in dask/dask-kubernetes#318.

I've also updated the `cluster.name` attribute with a property and setter to ensure that it also lives within the `cluster.cluster_info` dict and is synced back and forth.
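
For anyone skimming, here is a rough, self-contained sketch of the merge-then-sync idea described above. It is not the code in this PR: `ToyScheduler`, `ToyCluster`, the `INFO_KEY` name and the manual `sync()` call are all hypothetical stand-ins. The real implementation talks to the scheduler over a comm and syncs on a timer. The sketch also assumes that values already on the scheduler win during the merge.

```python
import copy


class ToyScheduler:
    """Hypothetical in-memory stand-in for the scheduler's metadata store."""

    def __init__(self):
        self.metadata = {}

    def get_metadata(self, key, default=None):
        # Return a copy so callers cannot mutate scheduler state by accident.
        return copy.deepcopy(self.metadata.get(key, default))

    def set_metadata(self, key, value):
        self.metadata[key] = copy.deepcopy(value)


class ToyCluster:
    """Hypothetical cluster manager holding a dict mirrored on the scheduler."""

    INFO_KEY = "cluster-manager-info"  # assumed key name, for illustration only

    def __init__(self, scheduler, name=None):
        self.scheduler = scheduler
        # By default the dict only holds the cluster name and the class type.
        self.cluster_info = {
            "name": name or "toy-cluster",
            "type": f"{type(self).__module__}.{type(self).__qualname__}",
        }

    @property
    def name(self):
        # The name lives inside cluster_info so it is synced with everything else.
        return self.cluster_info["name"]

    @name.setter
    def name(self, value):
        self.cluster_info["name"] = value

    def start(self):
        # Merge anything already on the scheduler into the local dict
        # (assumption: existing scheduler values take precedence), then push back.
        existing = self.scheduler.get_metadata(self.INFO_KEY, default={})
        self.cluster_info.update(existing)
        self.sync()

    def sync(self):
        # In the PR this happens periodically; here it is called by hand.
        self.scheduler.set_metadata(self.INFO_KEY, self.cluster_info)


if __name__ == "__main__":
    scheduler = ToyScheduler()

    first = ToyCluster(scheduler, name="first")
    first.start()
    first.cluster_info["worker_memory"] = "4GiB"  # extra state to persist
    first.sync()

    # A second cluster object attaching to the same scheduler recovers the state.
    second = ToyCluster(scheduler)
    second.start()
    print(second.name, second.cluster_info["worker_memory"])
```

The second object picks up both the name and the extra worker setting, which is the reconnect use case that `from_name()`-style constructors would rely on.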