Schedulers and workers do not know their cluster provenance #5031

Closed
jacobtomlinson opened this issue Jul 6, 2021 · 1 comment · Fixed by #5033

Comments

@jacobtomlinson
Member

jacobtomlinson commented Jul 6, 2021

When creating clusters via Cluster objects, the provenance of that creation is not communicated downstream to each component.

If I create a cluster with

from dask.distributed import LocalCluster
cluster = LocalCluster()

and then inspect the cluster.scheduler object, or create an RPC directly with the scheduler and have a poke around, there is no reference to the cluster object that created it, the cluster's name, or the fact that it was created by LocalCluster as opposed to KubeCluster.
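
A minimal sketch of that kind of poking around (assuming a default in-process LocalCluster, where cluster.scheduler is the Scheduler instance itself; the exact keys returned by identity() may vary between versions):

from dask.distributed import LocalCluster

cluster = LocalCluster()

# Ask the scheduler to describe itself. Nothing in the reply points back to
# the LocalCluster object that started it.
info = cluster.scheduler.identity()
print(info["type"])   # "Scheduler"
print(list(info))     # e.g. ['type', 'id', 'address', 'services', 'workers'] -- no cluster name or class

cluster.close()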

As part of the longer-term goal of making cluster objects reconstructible (via tools like dask-ctl), it would be really useful if information such as the cluster ID and class were passed on to the scheduler.

This could also be helpful as part of #5012, as the scheduler could broadcast this information using mDNS, which would aid discovery.
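
For illustration only (this is not part of the change proposed here), a rough sketch of what such an mDNS broadcast might look like, assuming the third-party python-zeroconf package and made-up service and property names:

import socket
from zeroconf import ServiceInfo, Zeroconf

# Hypothetical service type and values; in practice these would come from the
# scheduler's cluster-manager metadata once it exists.
info = ServiceInfo(
    "_dask-scheduler._tcp.local.",
    "b6f241f6._dask-scheduler._tcp.local.",
    addresses=[socket.inet_aton("127.0.0.1")],
    port=8786,
    properties={"type": "distributed.deploy.local.LocalCluster", "name": "b6f241f6"},
)

zc = Zeroconf()
zc.register_service(info)   # advertise the scheduler on the local network
# ... later, when the scheduler shuts down:
zc.unregister_service(info)
zc.close()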

This could also tie in with #4607, where this information would be placed into that store.

@jrbourbeau
Member

Perhaps we could set some metadata on schedulers that are created via Cluster objects. For example, we could record the cluster's name and type (e.g. LocalCluster vs. KubeCluster):

diff --git a/distributed/deploy/cluster.py b/distributed/deploy/cluster.py
index 37ddc31f..41e6301e 100644
--- a/distributed/deploy/cluster.py
+++ b/distributed/deploy/cluster.py
@@ -9,7 +9,7 @@ from inspect import isawaitable
 from tornado.ioloop import PeriodicCallback

 import dask.config
-from dask.utils import _deprecated, format_bytes, parse_timedelta
+from dask.utils import _deprecated, format_bytes, parse_timedelta, typename

 from ..core import Status
 from ..objects import SchedulerInfo
@@ -70,6 +70,10 @@ class Cluster:
         self._watch_worker_status_task = asyncio.ensure_future(
             self._watch_worker_status(comm)
         )
+        await self.scheduler_comm.set_metadata(
+            keys=["cluster-manager-info"],
+            value={"type": typename(type(self)), "name": self.name},
+        )
         self.status = Status.running

     async def _close(self):
In [1]: from distributed import LocalCluster

In [2]: cluster = LocalCluster()

In [3]: cluster.scheduler.get_metadata(keys=["cluster-manager-info"])
Out[3]: {'type': 'distributed.deploy.local.LocalCluster', 'name': 'b6f241f6'}
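
As a possible follow-up sketch, assuming the patch above is applied: an external tool (something dask-ctl shaped) could read the same metadata over the network with the existing Client.get_metadata call rather than touching the in-process scheduler object. The address below is just a placeholder:

from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")  # address obtained via whatever discovery mechanism

# Returns the proposed cluster-manager metadata, or None if the scheduler was
# started by an unpatched (or no) cluster manager.
info = client.get_metadata(["cluster-manager-info"], default=None)
print(info)  # e.g. {'type': 'distributed.deploy.local.LocalCluster', 'name': 'b6f241f6'}

client.close()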
