Future: Support Multiple Dask Worker Types in DaskKubernetesEnvironment #1586
Comments
Great possible enhancement if the feature gets implemented in dask-kubernetes! It would make it really straightforward to implement the ability to set multiple worker pods.

In your comment on dask/dask-kubernetes#179 (comment) you mentioned using custom k8s YAML to manage Dask on Kubernetes, and I'm wondering if you'd benefit from using the Kubernetes Job Environment, which allows you to specify your own YAML configuration for your deployment. In that deployment you could specify a scheduler pod and multiple replicas of your different worker pods. This would mimic the auto-created cluster nature of dask-kubernetes, since it would be spun up on each new flow run. Note: I have also done something similar to what you tried, where I created a static Dask cluster on k8s and two differently spec'd worker pods that connected to it.
I want to bump this post to see if there are any viable solutions for this now. Having the ability to mix worker specs is a feature that would be extremely useful!
Linking to the relevant blocking Dask issue: dask/distributed#2118
I have a working implementation of multiple Dask worker types in my fork of Dask Cloud Provider here: https://github.com/joeschmid/dask-cloudprovider/tree/multiple-worker-types. I've been meaning to clean it up and submit a PR to the DCP project and am just way behind on that.
Original issue description
NOTE: This is a future issue, i.e. not something we require immediately, but we wanted to file it now to get it on your radar.
Current behavior
DaskKubernetesEnvironment provides a mechanism to specify k8s pod specs for the Dask workers and scheduler. Flows that deploy using DaskKubernetesEnvironment will be executed on a dynamically created Dask cluster, using the k8s pod specs provided. This is a really useful capability.
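For context, here is a minimal sketch of how a flow attaches this environment today, assuming the Prefect 0.x-era API (parameter names such as `worker_spec_file` and `scheduler_spec_file` may differ by version); the YAML file names are placeholders:

```python
from prefect import Flow, task
from prefect.environments import DaskKubernetesEnvironment

@task
def say_hello():
    print("hello from a Dask worker")

with Flow("dask-k8s-example") as flow:
    say_hello()

# Today there is a single worker template: every Dask worker pod in the
# ephemeral cluster is created from the same YAML spec.
flow.environment = DaskKubernetesEnvironment(
    min_workers=2,
    max_workers=10,
    scheduler_spec_file="scheduler_spec.yaml",  # placeholder file name
    worker_spec_file="worker_spec.yaml",        # placeholder file name
)
```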
Proposed behavior
We have some use cases that would benefit from different worker types. Certain tasks in our Flows would benefit from GPUs, other tasks might need particularly high memory Dask workers, etc.
Prefect's existing task tagging (mapping to Dask resources) makes it very easy to have certain tasks execute on certain worker types. However, most Dask cluster management approaches (especially those using dask-kubernetes) assume a single, uniform worker type / k8s pod spec.
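As a concrete illustration of that mapping, here is a sketch assuming the `dask-resource:KEY=NUM` tag convention recognized by Prefect's DaskExecutor; the resource name `GPU` is just an example:

```python
from prefect import task

# A task tagged with a Dask resource requirement. Prefect translates tags of
# the form "dask-resource:KEY=NUM" into Dask resource restrictions, so this
# task should only run on workers that advertise at least one GPU.
@task(tags=["dask-resource:GPU=1"])
def train_on_gpu(features):
    ...

# The matching workers must declare that resource when they start, e.g.:
#   dask-worker <scheduler-address> --resources "GPU=1"
```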
There is active work going on in dask-kubernetes to switch the KubeCluster implementation to be based on SpecCluster from Dask distributed. The PR for this work is close to being merged: dask/dask-kubernetes#162. While it may not support multiple worker types initially, it might in the near future.
I hacked together a prototype of KubeCluster (using the old, non-SpecCluster approach) that supports multiple worker types in this PR: dask/dask-kubernetes#179. There is some discussion on that PR regarding dask-kubernetes support for multiple worker types.
When dask-kubernetes supports multiple worker pod specs, it would be nice to be able to use DaskKubernetesEnvironment to specify these worker pod specs, potentially with associated worker counts.
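Purely as a strawman for discussion, the interface might look something like the sketch below; the `worker_specs` parameter and its shape are invented here and do not exist in Prefect or dask-kubernetes today:

```python
from prefect.environments import DaskKubernetesEnvironment

# Hypothetical API: one scheduler spec plus several worker pod specs, each
# with a desired count. None of the multi-spec parameters shown here exist yet.
environment = DaskKubernetesEnvironment(
    scheduler_spec_file="scheduler_spec.yaml",
    worker_specs=[
        {"spec_file": "worker_generic.yaml", "count": 20},  # cheap, general purpose
        {"spec_file": "worker_gpu.yaml", "count": 2},       # advertises GPU=1
        {"spec_file": "worker_highmem.yaml", "count": 2},   # advertises MEMORY=...
    ],
)
```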
Example
Imagine a basic data science Flow. In the feature engineering step, we have aspects that aren't well distributed today, i.e. they don't take advantage of Dask, but perform Pandas operations and require large amounts of memory. In the model training step, some models wouldn't benefit from GPUs, but some would.
While we could just specify a cluster with uniform worker types, all of which have GPUs and large RAM, that would be far more costly than using a mix of mostly generic, low-cost workers, plus a few GPU workers, plus a few high-memory workers.
We could also split this into separate Flows, but that seems a bit artificial just to get around this particular technical issue.
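To make the example concrete, here is a sketch of how such a flow could tag its tasks for different worker types via Dask resources; the task names and the MEMORY/GPU resource keys are illustrative, not from the original issue:

```python
from prefect import Flow, task

@task
def load_data():
    ...

# Pandas-heavy feature engineering: pin to high-memory workers.
@task(tags=["dask-resource:MEMORY=64e9"])
def engineer_features(df):
    ...

# GPU-bound model training: pin to GPU workers.
@task(tags=["dask-resource:GPU=1"])
def train_gpu_model(features):
    ...

# CPU-only model training: runs on generic workers.
@task
def train_cpu_model(features):
    ...

with Flow("mixed-worker-types") as flow:
    df = load_data()
    features = engineer_features(df)
    train_gpu_model(features)
    train_cpu_model(features)
```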