Make rootish heuristic sensitive to size of task group #8005
base: main
Conversation
Currently we require rootish tasks to belong to a task group that has no more than five dependencies.

Sometimes (as in a recent Xarray workload shown to me by @dcherian) this doesn't work because there are a few dependencies, but still far fewer than there are tasks in the task group. In his case there were 6000 tasks in the task group and 6 dependencies of that group, and so it was erroneously classified as non-rootish.

This PR proposes a change to the logic, where we accept tasks whose groups have fewer dependencies than 1% of the number of tasks in the group. So in Deepak's case, because there are fewer than 6000 * 1% == 60 dependencies, these tasks get classified as rootish.

Future note: we're computing this dynamically now; maybe we should be looking at data stored instead of number of tasks.
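For concreteness, here is a minimal sketch of how such a check could read. The function signature, the `total_nthreads * 2` size condition, and the restriction check are assumptions about the surrounding heuristic rather than code from this PR, and whether the 1% threshold replaces or merely supplements the old limit of five dependencies is left open here:

```python
def is_rootish(ts, total_nthreads: int) -> bool:
    """Sketch of the proposed heuristic: a task is root-ish if its group is
    large but depends on very few tasks relative to the group's own size."""
    tg = ts.group
    # Tasks pinned to particular workers/hosts/resources are never root-ish.
    if ts.resource_restrictions or ts.worker_restrictions or ts.host_restrictions:
        return False
    return (
        # the group is large relative to the cluster
        len(tg) > total_nthreads * 2
        # old rule: len(tg.dependencies) < 5
        # proposed rule: fewer dependencies than 1% of the group's task count
        and len(tg.dependencies) < len(tg) * 0.01
    )
```

With the numbers from the Xarray workload above, `len(tg) == 6000` and `len(tg.dependencies) == 6`, so `6 < 60` and the tasks are classified as root-ish.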
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

19 files +15, 19 suites +15, 11h 43m 56s ⏱️ +10h 37m 9s

For more details on these failures, see this check. Results for commit e3d393b. ± Comparison against base commit 9d9702e.

♻️ This comment has been updated with latest results.
To be clear, above when I say "data stored" I mean number of bytes. If a large task group depends on relatively little data then we should probably think of it as rootish, because running these tasks won't allow us to release lots of data.
Can confirm this fixes the issue in #7274 (comment). All open_dataset tasks are now detected as root-ish. Thanks Matt!
🎉
@fjetter I think I'm probably handing this off to you. I think that the most useful thing is that we've identified an issue with @dcherian's workload and the underlying problem (the magic number of five dependencies in the rootish heuristic). The specific solution here of changing "5 dependencies" to "1% of the tasks" is kinda arbitrary but maybe OK. I'm hopeful that at least identifying the problem is useful to you.
I'll have a look asap |
One of our rechunking stress tests is failing. This indicates two problems
Digging into both will take a bit of time |
The first problem I think I leave to you 🙂

Context on the second problem (maybe just for me, but maybe also for others): there are 3000 tasks in the rechunk-split group that depend on a group with only 6 tasks, and 6 is less than 1% of 3000 == 30, so the new heuristic classifies the group as root-ish. The amount of data in each group is the same. Some options:
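As a quick check of why the proposed 1% rule flags the rechunk-split group, here is the arithmetic as a small illustrative sketch (variable names are made up, not from the codebase):

```python
# Numbers from the failing rechunking stress test described above.
n_tasks = 3000        # tasks in the rechunk-split group
n_dependencies = 6    # tasks in the group it depends on

threshold = 0.01 * n_tasks           # 1% of 3000 == 30
print(n_dependencies < threshold)    # True -> the group is classified as root-ish
# Yet the dependency group holds the same amount of data as the rechunk-split
# group, so treating these tasks as root-ish (and ignoring where their inputs
# live) is presumably what trips up the stress test.
```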
So we could do something like this:

```python
class TaskGroup:
    ...

    @property
    def nbytes_expected(self) -> int:
        # Extrapolate the group's total size from the tasks finished so far.
        n_done = self.states["memory"] + self.states["released"] + self.states["forgotten"]
        if not n_done:
            return 0
        fraction_done = n_done / len(self)
        return int(self.nbytes_total / fraction_done)

    def is_rootish(self):
        # Root-ish if the group is expected to produce far more data than it consumes.
        return self.nbytes_expected > 100 * sum(
            dep.nbytes_expected for dep in self.dependencies
        )


class TaskState:
    ...

    def is_rootish(self):
        return self.group.is_rootish()
```

This might get a little expensive. If so we might want to maintain a counter on the TaskGroup for how many tasks we've kept up with nbytes.
Maybe something like this:

```diff
index 369fd0706..5a93799ef 100644
--- a/distributed/scheduler.py
+++ b/distributed/scheduler.py
@@ -1035,6 +1035,10 @@ class TaskGroup:
     #: The total number of bytes that this task group has produced
     nbytes_total: int
 
+    #: The total number of tasks this task group has finished
+    n_finished: int
+    n_total: int
+
     #: The total amount of time spent on all tasks in this TaskGroup
     duration: float
 
@@ -1082,6 +1086,8 @@ class TaskGroup:
         self.states = dict.fromkeys(ALL_TASK_STATES, 0)
         self.dependencies = set()
         self.nbytes_total = 0
+        self.n_finished = 0
+        self.n_total = 0
         self.duration = 0
         self.types = set()
         self.start = 0.0
@@ -1105,6 +1111,7 @@ class TaskGroup:
     def add(self, other: TaskState) -> None:
         self.states[other.state] += 1
+        self.n_total += 1
         other.group = self
 
     def __repr__(self) -> str:
@@ -1119,7 +1126,7 @@ class TaskGroup:
         )
 
     def __len__(self) -> int:
-        return sum(self.states.values())
+        return self.n_total
 
     def _to_dict_no_nest(self, *, exclude: Container[str] = ()) -> dict[str, Any]:
         """Dictionary representation for debugging purposes.
@@ -1149,6 +1156,21 @@ class TaskGroup:
             for state, count in self.states.items()
         )
 
+    @property
+    def nbytes_expected(self) -> int:
+        if not self.n_finished:
+            return 0
+
+        return self.nbytes_total * self.n_total / self.n_finished
+
+    def is_rootish(self):
+        if self.resource_restrictions or self.worker_restrictions or self.host_restrictions:
+            return False
+        if not self.nbytes_total:
+            return False
+        return self.nbytes_expected > 100 * sum(dep.nbytes_expected for dep in
+            self.dependencies)
+
 class TaskState:
     """A simple object holding information about a task.
@@ -1448,6 +1470,7 @@ class TaskState:
         if old_nbytes >= 0:
             diff -= old_nbytes
         self.group.nbytes_total += diff
+        self.group.n_finished += 1
         for ws in self.who_has:
             ws.nbytes += diff
         self.nbytes = nbytes
@@ -2279,7 +2302,7 @@ class SchedulerState:
         """
         ts = self.tasks[key]
 
-        if self.is_rootish(ts):
+        if ts.group.is_rootish():
             # NOTE: having two root-ish methods is temporary. When the feature flag is
```
Also, I was looking into rootishness and it looks like decide_worker has been sharded into lots of different possibilities with rootish and queued, which made me sad (but I can see how this would have made sense).
I'm a bit confused about the suggestion of measuring nbytes. We need this kind of information before we're scheduling anything; we either have to decide to schedule greedily or to queue up. We only ever get an nbytes measurement once a task has been scheduled. There are a couple of arguments over here: #7531. I'm generally also less excited about expanding root-ish logic because everything we've learned so far points to dask.order being the issue here; see dask/dask#9995 for a writeup with some links for further reading.

You're not alone. If you want to dig into this further I suggest you and I have a chat about the current state and where we want this to end up.
Ah, I failed to consider that we were assigning all of these at once. I'm curious, when does task queuing come into play? Only for rootish tasks, I take it?
Yes, if a task is classified according to |