[Dask.order] Remove non-runnable leaf nodes from ordering #10697

Merged: 7 commits merged into dask:main on Dec 14, 2023

Member

@fjetter fjetter commented Dec 13, 2023

This is a follow-up to #10660 and addresses the still-failing test in dask/distributed#8255.

Most notably, this removes a special case in the process_runnable code path that caused this kind of graph to be executed too eagerly. The special code path was not hit by the specific unit test I added in test_order.py, so I have extended the test logic to cover it.

This special branch was added to enable eager execution of some dangling code branches, as observed in test_array_store_final_order.

image

(The image shows the correct order)

The dangling branches I was referring to are the linear branches ending in P51 and P15. Without that special casing, those two branches would not have been executed, causing the root to be left in memory until the end.
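To make the memory argument concrete, here is a minimal sketch in plain Python. It is not dask's implementation: the toy graph, the key names (`root`, `main-*`, `dangling`) and the helper `release_step` are all assumptions made for illustration, and a result is simply assumed to be releasable once its last dependent has run.

```python
# Toy dependency map: key -> set of keys it depends on (hypothetical graph).
dependencies = {
    "root": set(),
    "main-1": {"root"},
    "main-2": {"main-1"},
    "main-3": {"main-2"},
    "dangling": {"root"},  # short linear branch hanging off the root
}


def release_step(key, order):
    """Return the step after which `key` can be dropped from memory.

    `order` is a list of keys in execution order. A result is assumed to
    be releasable once its last dependent has been executed.
    """
    position = {k: i for i, k in enumerate(order)}
    dependents = [k for k, deps in dependencies.items() if key in deps]
    return max(position[d] for d in dependents)


# Dangling branch executed eagerly vs. left until the very end.
eager = ["root", "dangling", "main-1", "main-2", "main-3"]
late = ["root", "main-1", "main-2", "main-3", "dangling"]

print(release_step("root", eager))  # 2
print(release_step("root", late))  # 4
```

In the second ordering the root stays alive until the last step, which is the situation the special casing was originally meant to avoid.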

What this visualization does not show is the cause of the problem. The graph actually reduces to a single task, "store-12345": ["store-map-1", "store-map-2", ...], that is effectively an alias. This alias throws off the critical path algorithm such that those thin, small branches are effectively ignored and deemed not valuable. Without that final reducer, the algorithm is forced to finish the connected graph first.
The visualization code does not even understand this alias properly: it renders it as the box in the lower right corner but does not know what to do with it.
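As a rough illustration of what removing such an alias from the graph could look like, here is a self-contained sketch. It is not the code in dask/order.py: `strip_alias_leaves` and `is_task` are hypothetical helpers, the example graph is made up, and the rule that only aliases fanning in from more than one key are dropped is an assumption of this sketch.

```python
def is_task(value):
    # Follows the dask graph convention: a task is a non-empty tuple whose
    # first element is callable. Anything else (e.g. a list of keys) is
    # not runnable by itself.
    return type(value) is tuple and bool(value) and callable(value[0])


def strip_alias_leaves(dsk, dependencies):
    """Return a copy of `dsk` without leaf keys that are mere aliases.

    Hypothetical helper: `dependencies` maps key -> set of keys it needs.
    """
    dsk = dict(dsk)
    dependencies = {k: set(v) for k, v in dependencies.items()}
    while True:
        has_dependents = set().union(*dependencies.values())
        aliases = [
            k
            for k in dsk
            if k not in has_dependents  # leaf: nothing depends on it
            and not is_task(dsk[k])  # not runnable, e.g. a list of keys
            and len(dependencies[k]) > 1  # assumption: fans in from several keys
        ]
        if not aliases:
            return dsk
        for k in aliases:
            del dsk[k]
            del dependencies[k]


def inc(x):
    return x + 1


dsk = {
    "a": 1,
    "store-map-1": (inc, "a"),
    "store-map-2": (inc, "a"),
    "store-12345": ["store-map-1", "store-map-2"],  # alias, not a task
}
dependencies = {
    "a": set(),
    "store-map-1": {"a"},
    "store-map-2": {"a"},
    "store-12345": {"store-map-1", "store-map-2"},
}

cleaned = strip_alias_leaves(dsk, dependencies)
print(sorted(cleaned))  # ['a', 'store-map-1', 'store-map-2']
```

With the alias gone, the two `store-map-*` keys become the leaves, so an ordering algorithm no longer sees a single final reducer tying all branches together.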

Member Author

fjetter commented Dec 13, 2023

Note: this also includes a fix for a mypy error that is breaking distributed linting.

Member

@hendrikmakait hendrikmakait left a comment


Generally looks good to me, though I do not fully understand the change.

Review threads:
dask/tests/test_order.py (outdated, resolved)
dask/tests/test_order.py (outdated, resolved)
dask/order.py (outdated, resolved)
dask/order.py (resolved)
Member

@hendrikmakait hendrikmakait left a comment


LGTM after a second pass.

Member Author

fjetter commented Dec 14, 2023

About the change: this is similar to #10619, but for leaf nodes instead of root nodes. There is a more thorough argument over there. I might follow up with the same for root nodes later, but that is a little trickier to do well.

fjetter and others added 2 commits December 14, 2023 10:08
Co-authored-by: Hendrik Makait <hendrik@makait.com>
@fjetter fjetter changed the title Remove non-runnable leaf nodes from ordering [Dask.order] Remove non-runnable leaf nodes from ordering Dec 14, 2023
@fjetter fjetter merged commit 1105f9b into dask:main Dec 14, 2023
25 of 27 checks passed
@fjetter fjetter deleted the normalize_non_runnable_leafs branch December 14, 2023 11:22