Implementing an Ersatz of ClusterManager to fix jobqueue issues linked to upstream deploy.Cluster limitations #170
I think that having dask-jobqueue break entirely from the classes in dask-distributed would be an interesting approach. Of the three projects (jobqueue, kubernetes, yarn), jobqueue seems to be the fastest moving right now. Rather than constrain your development by the slowness of the group, it might be useful to see what you come up with and how you restructure things.
@mrocklin I'm currently working on the ClusterManager implementation here. I'm wondering whether this wouldn't be cleaner if I just removed the dependency on distributed.deploy from dask-jobqueue and duplicated the needed parts here. Currently I'm somewhere in between...
But this is probably what you had in mind...
I see no problem with breaking from the …
Closing this, as the SpecCluster implementation (#306) covers it.
So we have a target: dask/distributed#2235.
We have some issues related to it:

- `worker_key` function in `scale` outside of adaptive (#152): grouped workers and `scale_down`
- and perhaps less related: #103
I propose, as discussed in some of the issues or PRs mentioned above, to try to fix those issues directly in dask-jobqueue. This will involve duplicating some of the logic of the distributed.deploy.Cluster object here, but it will also give some interesting insights into how to refactor things for dask/distributed#2235.
I propose to do this gradually: slowly provide PRs that fix issues one by one, and also analyse and underline existing code pieces from the current deploy.Cluster mechanism that should be modified for dask/distributed#2235.
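To make the proposal concrete, here is a minimal sketch of what a stand-alone ClusterManager might look like: a base class that replicates the generic scaling logic dask-jobqueue needs from distributed.deploy.Cluster, without importing it. All names here (`ClusterManager`, `MockJobQueueCluster`, the `scale_up`/`scale_down` hooks) are illustrative assumptions, not the actual dask-jobqueue or distributed API.

```python
class ClusterManager:
    """Hypothetical stand-alone base class duplicating the generic
    scaling logic of distributed.deploy.Cluster (illustrative only)."""

    def __init__(self):
        # Map of worker name -> worker spec, managed by subclasses.
        self.workers = {}

    def scale(self, n):
        """Grow or shrink to n workers using the subclass hooks."""
        current = len(self.workers)
        if n > current:
            self.scale_up(n)
        elif n < current:
            # Pick workers to retire; a real implementation would group
            # them by job (the grouped-worker problem mentioned in #152).
            to_close = list(self.workers)[n:]
            self.scale_down(to_close)

    def scale_up(self, n):
        raise NotImplementedError

    def scale_down(self, workers):
        raise NotImplementedError


class MockJobQueueCluster(ClusterManager):
    """Toy subclass standing in for a batch-scheduler backend."""

    def scale_up(self, n):
        for i in range(len(self.workers), n):
            self.workers[f"worker-{i}"] = {"job": i}

    def scale_down(self, workers):
        for w in workers:
            self.workers.pop(w, None)
```

The point of the split is that the base class owns the policy (when to add or remove workers) while the subclass owns the mechanism (how jobs are submitted or cancelled), which is where the upstream refactoring for dask/distributed#2235 could head.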