Future: Support Multiple Dask Worker Types in DaskKubernetesEnvironment #1586
Comments
Great possible enhancement if the feature gets implemented in dask-kubernetes! It would make it really straightforward to implement the ability to set multiple worker pods.

In your comment on dask/dask-kubernetes#179 (comment) you mentioned using custom k8s YAML to manage Dask on Kubernetes, and I'm wondering if you'd benefit from using the Kubernetes Job Environment, which allows you to specify your own YAML configuration for your deployment. In that deployment you could specify a scheduler pod and multiple replicas of your different worker pods. This would mimic the auto-created cluster nature of dask-kubernetes, since it would be spun up on each new flow run. Note: I have also done something similar to what you tried, where I created a static Dask cluster on k8s and two differently spec'd worker pods that connected to it.
I want to bump this post to see if there are any viable solutions for this now. Having the ability to mix worker specs is a feature that would be extremely useful!
Linking to the relevant blocking Dask issue: dask/distributed#2118
I have a working implementation of multiple Dask worker types in my fork of Dask Cloud Provider here: https://github.com/joeschmid/dask-cloudprovider/tree/multiple-worker-types. I've been meaning to clean it up and submit a PR to the DCP project and am just way behind on that.
Original issue description
NOTE: This is a future issue, i.e. not something we require immediately, but we wanted to file it now to get it on your radar.
Current behavior
DaskKubernetesEnvironment provides a mechanism to specify k8s pod specs for the Dask workers and scheduler. Flows that deploy using DaskKubernetesEnvironment will be executed on a dynamically created Dask cluster, using the k8s pod specs provided. This is a really useful capability.
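For context, here is a minimal sketch of how a flow attaches this environment today, assuming the Prefect 0.x-era API (parameter names such as `worker_spec_file` and `scheduler_spec_file` may differ by version); the YAML file names are placeholders:

```python
from prefect import Flow, task
from prefect.environments import DaskKubernetesEnvironment

@task
def say_hello():
    print("hello from a Dask worker")

with Flow("dask-k8s-example") as flow:
    say_hello()

# Today there is a single worker template: every Dask worker pod in the
# ephemeral cluster is created from the same YAML spec.
flow.environment = DaskKubernetesEnvironment(
    min_workers=2,
    max_workers=10,
    scheduler_spec_file="scheduler_spec.yaml",  # placeholder file name
    worker_spec_file="worker_spec.yaml",        # placeholder file name
)
```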
Proposed behavior
We have some use cases that would benefit from different worker types. Certain tasks in our Flows would benefit from GPUs, other tasks might need particularly high memory Dask workers, etc.
Prefect's existing task tagging (mapping to Dask resources) makes it very easy to have certain tasks execute on certain worker types. However, most Dask cluster management approaches (especially those using dask-kubernetes) assume a single, uniform worker type / k8s pod spec.
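As a concrete illustration of that mapping, here is a sketch assuming the `dask-resource:KEY=NUM` tag convention recognized by Prefect's DaskExecutor; the resource name `GPU` is just an example:

```python
from prefect import task

# A task tagged with a Dask resource requirement. Prefect translates tags of
# the form "dask-resource:KEY=NUM" into Dask resource restrictions, so this
# task should only run on workers that advertise at least one GPU.
@task(tags=["dask-resource:GPU=1"])
def train_on_gpu(features):
    ...

# The matching workers must declare that resource when they start, e.g.:
#   dask-worker <scheduler-address> --resources "GPU=1"
```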
There is active work going on in dask-kubernetes to switch the KubeCluster implementation to be based on SpecCluster from Dask distributed. The PR for this work is close to being merged: dask/dask-kubernetes#162. While it may not support multiple worker types initially, it might in the near future.
I hacked together a prototype of KubeCluster (using the old, non-SpecCluster approach) that supports multiple worker types in this PR: dask/dask-kubernetes#179. There is some discussion on that PR regarding dask-kubernetes support for multiple worker types.
When dask-kubernetes supports multiple worker pod specs, it would be nice to be able to use DaskKubernetesEnvironment to specify these worker pod specs, potentially with associated worker counts.
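Purely as a strawman for discussion, the interface might look something like the sketch below; the `worker_specs` parameter and its shape are invented here and do not exist in Prefect or dask-kubernetes today:

```python
from prefect.environments import DaskKubernetesEnvironment

# Hypothetical API: one scheduler spec plus several worker pod specs, each
# with a desired count. None of the multi-spec parameters shown here exist yet.
environment = DaskKubernetesEnvironment(
    scheduler_spec_file="scheduler_spec.yaml",
    worker_specs=[
        {"spec_file": "worker_generic.yaml", "count": 20},  # cheap, general purpose
        {"spec_file": "worker_gpu.yaml", "count": 2},       # advertises GPU=1
        {"spec_file": "worker_highmem.yaml", "count": 2},   # advertises MEMORY=...
    ],
)
```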
Example
Imagine a basic data science Flow. In the feature engineering step, we have aspects that aren't well distributed today, i.e. they don't take advantage of Dask, but perform Pandas operations and require large amounts of memory. In the model training step, some models wouldn't benefit from GPUs, but some would.
While we could just specify a cluster with uniform worker types, all of which have GPUs and large RAM, that would be far more costly than using a mix of mostly generic, low-cost workers, plus a few GPU workers, plus a few high-memory workers.
We could also split this into separate Flows, but that seems a bit artificial just to get around this particular technical issue.
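To make the example concrete, here is a sketch of how such a flow could tag its tasks for different worker types via Dask resources; the task names and the MEMORY/GPU resource keys are illustrative, not from the original issue:

```python
from prefect import Flow, task

@task
def load_data():
    ...

# Pandas-heavy feature engineering: pin to high-memory workers.
@task(tags=["dask-resource:MEMORY=64e9"])
def engineer_features(df):
    ...

# GPU-bound model training: pin to GPU workers.
@task(tags=["dask-resource:GPU=1"])
def train_gpu_model(features):
    ...

# CPU-only model training: runs on generic workers.
@task
def train_cpu_model(features):
    ...

with Flow("mixed-worker-types") as flow:
    df = load_data()
    features = engineer_features(df)
    train_gpu_model(features)
    train_cpu_model(features)
```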