Fix `decide_worker_rootish_queuing_disabled` assert #7065
Conversation
Is it possible to write a unit test that deterministically reproduces the problem? As of #7062, the failure rate is 1/162.
Unit Test Results: see the test report for an extended history of previous test failures (useful for diagnosing flaky tests). 15 files ±0, 15 suites ±0, 6h 15m 2s ⏱️ (- 7m 41s). For more details on these failures, see this check. Results for commit 5592da9; comparison against base commit 9c8ff86. ♻️ This comment has been updated with latest results.
Yes, I can write one. It'll be a bit complicated though.
@gjoseph92 what's the status on this?
@gjoseph92 any update on this?
This reverts commit f8b84c0.
The assertion probably did make sense. We shouldn't assign tasks to non-running workers.
force-pushed from f8b84c0 to 3d33954
I'm finding it pretty hard to come up with a test for this. I imagine this case is happening when `tg.last_worker` still refers to a worker that is no longer running. But I don't understand how this could actually happen. And this line (distributed/scheduler.py, lines 2114 to 2116 at 0983731) is intended to clear `last_worker` on the group's final task.
My guess is that there's some edge case that that logic isn't catching. Generally though, if this rootish task is going waiting->processing, and other tasks in the group are currently released or waiting, I'd expect those tasks to also be recommended waiting->processing in the same transitions cycle. Then the last task would clear out `tg.last_worker`. It seems there's a case where that doesn't happen, so a stale `last_worker` can outlive the worker it points to.

Bottom line is that I'm reluctant to spend more time trying to come up with a test for this edge case, especially when we're looking to move away from this code path anyway by turning queuing on by default, and by using a more sensible (and stateless) algorithm for co-assignment (#7141). Instead, I've switched this to just not re-use a previous worker if it's not running, which seems reasonable enough.

@fjetter @crusaderky how do you feel about merging this without a specific test?
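For illustration, here is a minimal sketch of the behavior described above (only reuse the previous worker if it is still running). The helper name is hypothetical and this is not the actual patch; it only assumes the `TaskGroup.last_worker` attribute quoted later in this thread and the scheduler's `Status` enum:

```python
from distributed.core import Status


def reuse_last_worker(tg):
    """Sketch: return the TaskGroup's remembered worker only if it can still
    accept tasks; otherwise drop the stale reference and let the caller pick
    a fresh worker instead of asserting."""
    lws = tg.last_worker
    if lws is not None and lws.status == Status.running:
        return lws
    tg.last_worker = None  # worker closed, retired, or paused since it was recorded
    return None
```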
This is correct.
I think this is what is happening:
Personally I am quite unhappy to have half-dead, poorly tested code paths.
I like the work-stealing theory. But more fundamentally, how could
The good news: I pushed a deterministic reproducer (work stealing is innocent).
# - TaskGroup(y) has more than 4 tasks (total_nthreads * 2)
# - TaskGroup(y) has less than 5 dependency groups
# - TaskGroup(y) has less than 5 dependency tasks
assert s.is_rootish(s.tasks["y-2"])
One thing to note: while `is_rootish(s.tasks["y-2"])` is true here, it was not true when we actually submitted `y-2`. At that point, there were only 2 known tasks in the `y` TaskGroup. The only reason the `y` tasks go down the rootish decide-worker code path is that `decide_worker` doesn't get to run until the xs complete.

I don't think it matters for this test, but this is just broadly an issue with using `client.submit` instead of dask delayed. There's a bit of logic in the scheduler that assumes graphs should be submitted in full up front, instead of incrementally with repeated `submit` or `map` calls (for example, dask.order, is_rootish, #7141).
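For context, here is a paraphrase of the root-ish criteria quoted in the test comments above. The function name and signature are made up for illustration; the real check lives on the scheduler:

```python
def looks_rootish(n_tasks_in_group, n_dependency_groups, n_dependency_tasks, total_nthreads):
    """Paraphrase of the criteria listed in the test comments above."""
    return (
        n_tasks_in_group > total_nthreads * 2    # "more than 4 tasks (total_nthreads * 2)"
        and n_dependency_groups < 5              # "less than 5 dependency groups"
        and n_dependency_tasks < 5               # "less than 5 dependency tasks"
    )
```

Because the first condition depends on how many tasks of the group the scheduler currently knows about, a task submitted early (when only 2 `y` tasks exist) is not root-ish, but can become root-ish by the time `decide_worker` actually runs, which is the timing quirk described above.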
# - TaskGroup(y) has less than 5 dependency tasks
assert s.is_rootish(s.tasks["y-2"])

await evx[0].set()
Okay, this is basically the issue with distributed/scheduler.py lines 2113 to 2116 in 0983731:

# Record `last_worker`, or clear it on the final task
tg.last_worker = (
    ws if tg.states["released"] + tg.states["waiting"] > 1 else None
)
The silly assumption this is making is effectively that all tasks in the TaskGroup have the same dependencies, so they're all runnable (and will all transition) at the same time. It's assuming that we're not going to have a subset of the group transition, then later transition another subset.
This is obviously not true in general; it just happens to pretty much always be the case in "common workloads" built with dask collections.
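As a concrete illustration of a group whose members become runnable at different times, here is a hedged sketch (hypothetical keys and cluster setup, not the actual test from this PR): every `y-*` task lands in the same TaskGroup by name, but each one depends on a different event-gated `x-*` task.

```python
from distributed import Client, Event


def wait_for(ev):
    ev.wait()
    return 1


def inc(x):
    return x + 1


if __name__ == "__main__":
    client = Client(n_workers=2, threads_per_worker=1)
    gates = [Event(f"gate-{i}") for i in range(3)]

    # Each y-i depends on a different x-i, yet all y-i share TaskGroup "y".
    xs = [client.submit(wait_for, ev, key=f"x-{i}") for i, ev in enumerate(gates)]
    ys = [client.submit(inc, x, key=f"y-{i}") for i, x in enumerate(xs)]

    gates[0].set()  # only y-0 can run now; y-1 and y-2 stay waiting, so
                    # `released + waiting > 1` holds and last_worker stays set
    print(ys[0].result())

    for g in gates[1:]:
        g.set()  # release the rest before shutting down
    client.close()
```

Releasing the gates one at a time makes the `y` group transition in subsets, which is exactly the pattern the `last_worker` bookkeeping does not anticipate.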
I've always really, really disliked the use of TaskGroups in the co-location logic in order to guess at things about graph structure. In this test, if you just change the names of some of the `y` tasks so they're in different groups, it'll probably pass. Task names should be opaque to dask and should not affect scheduling behavior. The combination of name-based heuristics with statefulness in the current implementation is even more problematic, and makes bugs like this possible.
To me, all this is just another argument for #7141. A static way of determining these groupings based purely on graph structure would eliminate the possibility of these kinds of bugs.
Furthermore, in the graph you've constructed, the `y` tasks shouldn't even be treated as one co-assignment group. There should be three groups. The fact that there's only one group is just an artifact of the naming scheme (which totally can happen in real usage).
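To make the naming point concrete, here is a toy approximation of how key prefixes map to groups (the real key-splitting logic in dask handles tuples, hashes, and more, so this is only illustrative):

```python
from collections import defaultdict


def toy_key_prefix(key: str) -> str:
    """Toy approximation: group by everything before the last '-'."""
    return key.rsplit("-", 1)[0]


keys = ["y-0", "y-1", "y-2", "z-0"]
groups = defaultdict(list)
for k in keys:
    groups[toy_key_prefix(k)].append(k)

print(dict(groups))  # {'y': ['y-0', 'y-1', 'y-2'], 'z': ['z-0']}
```

Renaming some of the `y-*` keys to a different prefix would put them in different groups and change co-assignment, even though the graph itself is unchanged.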
@crusaderky your new test is passing for me; that's expected, right? It seems like the
Yes, my new test fails on main and passes on your branch with no_queue.
This seems to be actually (intermittently) failing. Also maybe. Also all the normal websockets stuff and #7208.
Including #7212 in here to get CI more greenish, but let's merge that PR before this one.
Closes #7063
Passes `pre-commit run --all-files`