Support Multiple Worker Types #384
Comments
Thanks for raising this @ljstrnadiii. Excited to see users asking for this. Scaling multiple worker groups is not well supported by any of the helper tools in Dask today. We intend to address this as part of #256. The only workaround I can suggest today is to install the Dask Helm Chart and then manually create a second deployment of workers with GPUs and handle the scaling yourself.
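Concretely, "handle the scaling yourself" could be as simple as patching the replica count of that second worker Deployment. A minimal sketch with the Kubernetes Python client — the Deployment name and namespace are hypothetical, not anything dask-kubernetes creates for you:

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Manually scale a hand-made GPU worker Deployment (name/namespace hypothetical)
apps.patch_namespaced_deployment_scale(
    name="dask-gpu-workers",
    namespace="default",
    body={"spec": {"replicas": 3}},
)
```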
This should be closed, yeah?
Really stoked to see progress on this!!! (Referencing dask-kubernetes/dask_kubernetes/helm.py, line 244 at commit 812e403.)
If this is working, which I hope to give a try this week, then this issue should be completely closed, I think. I am not sure exactly where the annotation effort is at, though.
Yeah, I'm going to close this out now. We have a blog post in the pipeline demoing this functionality: dask/dask-blog#130
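For anyone landing here later, the scaling flow being demoed is presumably along these lines — a sketch only, with the release name, group name, and the `worker_group` keyword taken from this thread rather than confirmed against the docs:

```python
from dask_kubernetes import HelmCluster

# Attach to an existing Dask Helm Chart release (release name is hypothetical)
cluster = HelmCluster(release_name="mydask")

# Scale the default worker group
cluster.scale(4)

# Scale a named additional worker group, assuming a "gpu-workers" group
# was defined in the chart values when the release was installed
cluster.scale(2, worker_group="gpu-workers")
```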
As a machine learning engineer, I would like to specify the resources for specific operations, just like this example in the dask tutorial.
I am pretty new to dask, kubernetes, and helm, but I have had success deploying with helm on GKE. I think I have seen enough examples of using nodeSelector for GPU node pools, GKE's daemonset for driver installs, etc., that I could get them up and running easily. However, machines with accelerators are often limited, or we need to scale CPU and GPU workers independently.
Specifically, I wish to basically run the example in the link:
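Since the link did not survive here, the example is presumably along these lines — a minimal sketch of Dask's worker resources feature, where the scheduler address and the `process`/`train` functions are stand-ins:

```python
from dask.distributed import Client

def process(x):
    # Stand-in for a CPU-bound preprocessing step
    return x * 2

def train(x):
    # Stand-in for a step that needs an accelerator
    return x + 1

# Address is hypothetical; workers must advertise matching resources at
# startup, e.g.: dask-worker tcp://scheduler:8786 --resources "GPU=1"
client = Client("tcp://scheduler:8786")

processed = client.map(process, range(10))  # may run on any worker
trained = [
    client.submit(train, p, resources={"GPU": 1})  # only on GPU-tagged workers
    for p in processed
]
```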
From what I gather, this is not possible as-is with a Kubernetes-based deployment using `HelmCluster` or `KubeCluster`. I am also noticing that `.scale` is not designed for multiple worker types, which makes me think this might be a heavier lift than I think, and that this issue might actually belong in dask-distributed. (See dask-kubernetes/dask_kubernetes/helm.py, line 258 at commit ce9a2d2.)
I am seeing the blocker from 2018 here: dask/distributed#2118
I have been able to manually (with `kubectl`) add a pod with the `worker-spec.yaml` example here, and the scheduler shows it as available, whereas `cluster.scale()` has no effect on it. But I honestly have no clue how to build a deployment to control that pod spec, even if this hack worked.

Note: it looks like Ray is doing this here.
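A quick way to confirm the scheduler really sees a hand-added pod is something like the following sketch (the scheduler address is hypothetical, and whether each worker entry reports resources depends on its startup flags):

```python
from dask.distributed import Client

client = Client("tcp://dask-scheduler:8786")  # address is hypothetical

# List every worker the scheduler knows about, including pods added by hand
for addr, info in client.scheduler_info()["workers"].items():
    print(addr, info.get("nthreads"), info.get("resources"))
```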
Anyways, a few questions:
- Is there a way to take the `worker-spec.yaml` example and deploy an auxiliary set of these workers? i.e. an intermediate step (without support by dask-distributed) to easily supplement an existing worker set with highmem or gpus, etc.? (See the sketch below for the kind of thing I mean.)
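To make the question concrete, the kind of hack being asked about might look like the following sketch: render `worker-spec.yaml` by hand and create pods pointed at an already-running scheduler with the Kubernetes Python client. The scheduler address, namespace, and resource tag are all assumptions, and this is not a supported dask-kubernetes API:

```python
import yaml
from kubernetes import client, config

config.load_kube_config()

# Load the worker pod template from the dask-kubernetes docs example
with open("worker-spec.yaml") as f:
    pod = yaml.safe_load(f)

# Point the workers at the existing scheduler and tag their resources
# (the address and the resource label are hypothetical)
pod["spec"]["containers"][0]["args"] = [
    "dask-worker", "tcp://dask-scheduler:8786", "--resources", "GPU=1",
]

# Create a couple of auxiliary GPU workers alongside the existing worker set
v1 = client.CoreV1Api()
for i in range(2):
    pod.setdefault("metadata", {})["name"] = f"aux-dask-worker-{i}"
    v1.create_namespaced_pod(namespace="default", body=pod)
```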