
Support Multiple Worker Types #384

Closed
ljstrnadiii opened this issue Jan 18, 2022 · 5 comments

@ljstrnadiii

As a machine learning engineer, I would like to specify the resources for specific operations, just like in this example from the Dask tutorial.

I am pretty new to Dask, Kubernetes, and Helm, but I have had success deploying with Helm on GKE, and I think I have seen enough examples of using nodeSelector for GPU node pools, a DaemonSet for GPU driver installs, etc. that I could get those up and running easily. However, the machines with accelerators are often limited, or we need to scale CPU and GPU workers independently.

Specifically, I want to run essentially the example from that link:

with dask.annotate(resources={'GPU': 1}):
    # run each `process` call only on workers that advertise a GPU resource
    processed = [client.submit(process, d) for d in data]
with dask.annotate(resources={'MEMORY': 70e9}):
    # run the aggregation on a worker that advertises at least 70 GB of memory
    final = client.submit(aggregate, processed)
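For context: these annotations only take effect if the workers themselves are started advertising the matching resources, which is exactly the per-worker-type configuration this issue is about. A minimal sketch for sanity-checking that, with the scheduler address as a placeholder and the check poking at the scheduler's internal worker state:

# Workers declare abstract resources at start-up, e.g. on the CLI:
#   dask-worker tcp://<scheduler>:8786 --resources "GPU=1"
from dask.distributed import Client

client = Client("tcp://<scheduler>:8786")  # placeholder address

# Ask the scheduler which resources each connected worker advertises.
advertised = client.run_on_scheduler(
    lambda dask_scheduler: {
        addr: dict(ws.resources) for addr, ws in dask_scheduler.workers.items()
    }
)
print(advertised)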

From what I gather, this is not possible as-is with a Kubernetes-based deployment using HelmCluster or KubeCluster. I am also noticing that .scale is not designed for multiple worker types, which makes me think this might be a heavier lift than I expect, and that this issue might actually belong in dask/distributed.

async def _scale(self, n_workers):

I am seeing the blocker from 2018 here: dask/distributed#2118

I have been able to manually add a pod (with kubectl) using the worker-spec.yaml example here, and the scheduler shows it as available, whereas cluster.scale() has no effect on it. However, I honestly have no clue how to build a Deployment to control that pod spec, even if this hack worked.

Note: it looks like Ray is doing this here.

Anyways, a few questions:

  1. Is it currently possible to support multiple worker types? (in case I missed something)
  2. Is there a deployment hack to take a worker-spec.yaml and deploy an auxiliary set of these workers, i.e. an intermediate step (without support from dask-distributed) to easily supplement an existing worker set with high-memory or GPU workers, etc.?
@ljstrnadiii (Author) commented Jan 18, 2022

Example of a CPU limitation for a given GPU node:
[screenshot: Screen Shot 2022-01-17 at 5.32.14 PM]

@jacobtomlinson (Member)

Thanks for raising this @ljstrnadiii. Excited to see users asking for this.

Scaling multiple worker groups is not well supported by any of the helper tools in Dask today, including dask-kubernetes. The annotations support is still relatively new.

We intend to address this as part of #256.

The only workaround I can suggest today is that you install the Dask Helm Chart and then manually create a second deployment of workers with GPUs and handle the scaling yourself.
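
For the "handle the scaling yourself" part, a rough sketch using the official kubernetes Python client; the Deployment name and namespace below are placeholders for a manually created worker Deployment whose pods run dask-worker against the Helm release's scheduler Service with --resources "GPU=1" and a GPU nodeSelector:

# Rough sketch only; "dask-gpu-workers" and "default" are placeholders for a
# manually created GPU worker Deployment and its namespace.
from kubernetes import client as k8s, config

config.load_kube_config()
apps = k8s.AppsV1Api()

# Scale the auxiliary GPU worker Deployment by patching its scale subresource.
apps.patch_namespaced_deployment_scale(
    name="dask-gpu-workers",
    namespace="default",
    body={"spec": {"replicas": 4}},  # e.g. bring up 4 GPU workers
)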

@ljstrnadiii (Author)

This should be closed, yeah?

@ljstrnadiii (Author)

Really stoked to see progress on this!!!

def scale(self, n_workers, worker_group=None):

If this is working, which I hope to try this week, then I think this issue can be closed completely. I am not sure exactly where the annotation effort stands, though.
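
A rough usage sketch of where this appears to be heading; the scale(n_workers, worker_group=...) signature is the one quoted above, while the import path, the KubeCluster arguments, and the add_worker_group helper are assumptions about the operator-based API rather than confirmed details:

from dask_kubernetes.operator import KubeCluster  # assumed import path

cluster = KubeCluster(name="mlcluster")            # creates the default worker group
cluster.add_worker_group(name="gpu", n_workers=0)  # assumed helper for a second worker type
cluster.scale(10)                                  # scale the default group
cluster.scale(2, worker_group="gpu")               # scale only the "gpu" group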

@jacobtomlinson (Member)

Yeah, I'm going to close this out now. We have a blog post in the pipeline to demo this functionality: dask/dask-blog#130
