Should we upstream cloud specific functionality that can be used by multiple subprojects? #4537

Closed
jacobtomlinson opened this issue Feb 23, 2021 · 3 comments

Comments

@jacobtomlinson
Member

jacobtomlinson commented Feb 23, 2021

In dask/dask-cloudprovider#251 we just merged a worker plugin for use on Azure preemptible nodes.

Preemptible nodes are cheaper than regular nodes, but Azure can withdraw them from service at times of high demand on their side. The node is given a warning ahead of being withdrawn via a metadata API accessible from the node. The contributed worker plugin polls the metadata service and, if it sees that the node is going to be terminated, gracefully shuts the worker down and hopefully transfers tasks and state to other workers before the node is removed.
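For readers who have not seen the plugin, the polling behaviour described above can be sketched roughly as follows. This is not the code merged in dask/dask-cloudprovider#251; it is a minimal illustration under stated assumptions. The payload shape (an `Events` list whose items carry an `EventType` of `"Preempt"`) is assumed to resemble Azure's scheduled-events document, and the names `is_preempt_scheduled`, `watch_for_preemption`, `fetch_events` and `on_preempt` are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


def is_preempt_scheduled(payload: Dict[str, Any]) -> bool:
    """Return True if the metadata payload announces a preemption.

    Assumption: the payload looks like {"Events": [{"EventType": "Preempt"}, ...]}.
    """
    events = payload.get("Events") or []
    return any(event.get("EventType") == "Preempt" for event in events)


async def watch_for_preemption(
    fetch_events: Callable[[], Awaitable[Dict[str, Any]]],
    on_preempt: Callable[[], Awaitable[None]],
    poll_interval: float = 1.0,
    max_polls: Optional[int] = None,
) -> bool:
    """Poll the metadata service until a preemption notice appears.

    fetch_events: hypothetical coroutine that queries the node's metadata API.
    on_preempt:   hypothetical coroutine that shuts the worker down gracefully.
    Returns True if a notice was seen, False if max_polls was exhausted first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        payload = await fetch_events()
        if is_preempt_scheduled(payload):
            # Hand over to the graceful-shutdown callback exactly once.
            await on_preempt()
            return True
        polls += 1
        await asyncio.sleep(poll_interval)
    return False
```

In a real worker plugin the loop would presumably be started from the plugin's setup hook, with `on_preempt` wrapping the worker's own graceful shutdown; both callbacks are injected here so the sketch stays free of any cloud or Dask dependency.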

This plugin is useful in any package which can deploy workers onto Azure preemptible nodes, dask-kubernetes for example.

Most cloud vendors have some variation on this functionality, but we only have an Azure plugin for now.

I initially asked for the plugin to be contributed to dask-cloudprovider to try and keep cloud-related things together, specifically to keep cloud vendor SDK dependencies contained to one package, as they are quite heavy (although this specific plugin uses no Azure SDK dependencies).

@TomAugspurger has suggested that, as this plugin is useful in more packages than dask-cloudprovider, it would feel odd to have to install both dask-kubernetes and dask-cloudprovider[azure] if you were an Azure Kubernetes user. He proposes that we upstream this plugin here.

While I agree with the logic of upstreaming code that is useful in multiple packages I have reservations about this.

  • Should distributed care about platform-specific functionality?
  • What happens if the GCP or AWS implementation of the preemptible plugin requires the SDK dependency?
  • There may be other cloud-specific utilities we want to create in the future beyond preemptible notices which also require dependencies.
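To make the SDK-dependency concern concrete, one common pattern is to import a vendor SDK lazily and point users at the matching extra when it is missing. The sketch below is only an illustration of that pattern, not something any Dask package is known to do this way; the helper name `require_optional` is hypothetical, and the extra name is taken from the `dask-cloudprovider[azure]` example in this thread.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an install hint.

    module_name: the SDK module a plugin needs (hypothetical example input).
    extra:       the packaging extra that would pull that SDK in.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this plugin; "
            f"install it with: pip install 'dask-cloudprovider[{extra}]'"
        ) from exc
```

A plugin written this way would only pay for the heavy SDK import when it is actually used, which keeps the base install light wherever the plugin ends up living.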

Another suggestion made by @Timost would be to have a cloud utilities package separate from dask-cloudprovider. But IMO this feels like unnecessary maintenance work.

Raising this here for further discussion.

@TomAugspurger
Member

Agreed that having a package separate from dask-cloudprovider or distributed would be overkill.

While the Azure one didn't add dependencies, I imagined that if it did it would have required an extra like distributed[azure]. But then there are issues with distributing conda packages.

So all together, dask-cloudprovider does seem like the best home for this. I'm happy to close this if others are OK with those utilities living in dask-cloudprovider.

@Timost
Contributor

Timost commented Feb 23, 2021

Sounds good to me!

@TomAugspurger
Member

No discussion here, so I think we're OK with putting cloud-specific functionality into dask-cloudprovider.
