[Dask.order] Remove non-runnable leaf nodes from ordering #10697
Note: this also includes a mypy fix that is breaking distributed linting
Generally looks good to me, though I do not fully understand the change.
LGTM after a second pass.
About the change: this is similar to #10619, but for leaf nodes instead of root nodes. There is a more thorough argument over there. I might follow up with the same for root nodes later, but that is a little trickier to do well.
Co-authored-by: Hendrik Makait <hendrik@makait.com>
This is a follow-up to #10660 and addresses the still-failing test in dask/distributed#8255.
Most notably, this removes a special case in the `process_runnable` code path that caused this kind of graph to be executed too eagerly. The special code path was not hit in the specific unit test I added in `test_order.py`. I extended the test logic to cover this now.
This special branch was added to enable eager execution of some dangling code branches, as observed in `test_array_store_final_order`.
(The image shows the correct order)
The dangling branches I was referring to are the linear branches ending in P51 and P15. Without that special casing, those two branches would not have been executed, causing the root to be left in memory until the end.
What this visualization is not showing is what causes the problem. This graph is actually reducing to a single task
"store-12345": ["store-map-1", "store-map-2", ...]
that is effectively an alias. This alias throws off the critical path algorithm, so those thin, small branches are effectively ignored and deemed not valuable. Without that final reducer, the algorithm is forced to finish the connected graph first.
This weird alias is not even properly understood by the visualization code. It is rendered as the box in the lower right corner, but the visualization doesn't know what to do with it.
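To make the alias problem concrete, here is a rough sketch of the idea in plain Python. It is not the code in this PR and does not use `dask.order`; the helper names (`is_runnable`, `direct_dependencies`, `strip_non_runnable_leaves`) and the `load-*` keys are made up for illustration, and it assumes string keys and the legacy convention that a runnable task is a tuple starting with a callable.

```python
def is_runnable(task):
    # Assumption: a runnable task is a tuple whose first element is callable.
    return isinstance(task, tuple) and len(task) > 0 and callable(task[0])


def direct_dependencies(task, keys):
    """Collect the graph keys referenced anywhere inside a task."""
    found = set()
    stack = [task]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            stack.extend(item)
        elif isinstance(item, str) and item in keys:
            found.add(item)
    return found


def strip_non_runnable_leaves(graph):
    """Drop tasks that nothing depends on and that do no work themselves."""
    keys = set(graph)
    deps = {key: direct_dependencies(task, keys) for key, task in graph.items()}
    has_dependents = set().union(*deps.values())
    return {
        key: task
        for key, task in graph.items()
        if key in has_dependents or is_runnable(task)
    }


def load(i):  # hypothetical stand-in for reading a chunk
    return i


def store(chunk):  # hypothetical stand-in for writing a chunk
    return None


graph = {
    "load-1": (load, 1),
    "load-2": (load, 2),
    "store-map-1": (store, "load-1"),
    "store-map-2": (store, "load-2"),
    # The alias: a plain list of keys, so there is nothing to execute.
    "store-12345": ["store-map-1", "store-map-2"],
}

pruned = strip_non_runnable_leaves(graph)
print(sorted(pruned))
```

With the alias gone, each `store-map-*` task is a leaf of its own, so an ordering algorithm sees several independent branches instead of one graph funnelling into a single artificial reducer.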