Expose scheduler idle via RPC and HTTP API #7642
Merged
@jacobtomlinson just curious, where is this RPC used? I don't see a corresponding client method, or use on workers. Are you planning to call it in dask-k8s?
Yeah using it in dask-kubernetes, but only as a fallback if the HTTP API is not enabled.
Good point, I actually missed this. I think "best practice" should be to register this in dask-k8s with a scheduler extension. If I just had a look at this code base, it would look like dead code and I might remove it.
We rarely do these refactorings, but nothing here tells us what is actually "public" and what isn't, so there is always a risk.
The fallback of the fallback is to register a scheduler extension so that we can support older versions of distributed. My worry is that scheduler extensions still depend on scheduler internals like `Scheduler.idle_since`, which could also change.
There are tests in dask-kubernetes that cover this, so in terms of risk there is some mitigation. But it would be really nice to try and move towards a world where we have well-defined public APIs in terms of the scheduler object (for extensions), the RPC and the HTTP API. Otherwise it is always going to be risky developing against the scheduler.
We could also expose this via the client and treat that as the public API instead of prodding the RPC directly.
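To make the fallback order described above concrete (HTTP API first, then the RPC, then a scheduler extension), here is a minimal sketch of the pattern. It is not the dask-kubernetes implementation: `StrategyUnavailable`, `query_idle_since` and the three `via_*` functions are hypothetical names, the strategies return canned values instead of contacting a scheduler, and it assumes that `None` means "not idle".

```python
from typing import Callable, Iterable, Optional


class StrategyUnavailable(Exception):
    """Raised by a strategy that cannot be used against this scheduler."""


def query_idle_since(
    strategies: Iterable[Callable[[], Optional[float]]]
) -> Optional[float]:
    """Return the result of the first strategy that is available.

    A result of None is treated as a valid answer (assumed to mean
    "not idle") and does not trigger a fallback.
    """
    for strategy in strategies:
        try:
            return strategy()
        except StrategyUnavailable:
            continue  # try the next, less preferred, mechanism
    raise StrategyUnavailable("no mechanism could query the scheduler")


# Hypothetical stand-ins for the three mechanisms; real versions would
# talk to the scheduler instead of returning canned values.
def via_http_api() -> Optional[float]:
    raise StrategyUnavailable("HTTP API is not enabled")


def via_rpc() -> Optional[float]:
    raise StrategyUnavailable("scheduler has no such RPC handler")


def via_extension() -> Optional[float]:
    return 1700000000.0  # canned timestamp, for illustration only


idle_since = query_idle_since([via_http_api, via_rpc, via_extension])
```

Only "this mechanism is unavailable" falls through to the next strategy; any other error propagates, so a real failure is not silently masked by a fallback.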
This is a much bigger thing than I want it to be, but yes, I would love to see nothing of the actual Scheduler object be public. It is just too big, there are too many small things, it's changing too quickly, ...
That's typically something I feel very comfortable with since it is very explicit that this is intended for external usage.
Just to be very clear: it's fine keeping it like this. "Fixing" this API leakage is very hard. I'm open to any suggestions, but handing users (both in extensions and especially in plugins) the entire object isn't maintainable, particularly not as long as there is no proper discipline around "using underscores".
I totally sympathise with that, although scheduler plugins, extensions and `run_on_scheduler` are all well-established patterns, so backing out of those at this point will be hard.
Most, if not all, of the cluster managers open an RPC to the scheduler and invoke methods directly. I think part of the challenge here is that some cluster managers live in distributed and some live in `dask/dask{kubernetes,jobqueue,yarn,gateway,cloudprovider,etc}`.
For a long time the `dask-foo` projects were considered part of core Dask and it wasn't unusual to make a PR to both distributed and the cluster manager you were working on which were coupled. However this is a maintenance challenge because tests for code in `distributed` run in other repos.
Perhaps a good step forward would be for `dask-foo` projects to only interact with the scheduler via the `Client` and the HTTP API. These are API surfaces that we can consider to be public.
The question is what do we do with the base `Cluster` and `SpecCluster` classes in `distributed`? These both use the RPC directly and are intended to be subclassed by third-party libraries. Maybe these classes should be updated to stop using the RPC and to use a `Client` instead?
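As a rough illustration of going through the `Client` rather than opening an RPC, a cluster manager could pass a small function to `Client.run_on_scheduler`, which supplies the scheduler via the `dask_scheduler` keyword argument. The helper name `get_idle_since` and the stand-in scheduler object are hypothetical. Note that this only changes the transport: the function still reads the internal `idle_since` attribute, so the coupling discussed in this thread remains.

```python
from types import SimpleNamespace
from typing import Optional


def get_idle_since(dask_scheduler=None) -> Optional[float]:
    """Read the scheduler's idle timestamp.

    Intended to be passed to ``Client.run_on_scheduler``, which fills in
    the ``dask_scheduler`` keyword argument with the scheduler object.
    """
    # getattr with a default: the attribute is a scheduler internal and
    # may be missing or renamed in other versions of distributed.
    return getattr(dask_scheduler, "idle_since", None)


# With a live cluster this would be called as (not executed here):
#     client.run_on_scheduler(get_idle_since)

# Stand-in object so the sketch runs without a cluster.
fake_scheduler = SimpleNamespace(idle_since=12.5)
result = get_idle_since(dask_scheduler=fake_scheduler)
```

Reading the attribute defensively lets the caller treat "attribute missing" the same as "no idle information", which is one way to soften the dependency on scheduler internals.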