[RFC] Order assignment groups #10562

Closed
wants to merge 8 commits into from


@fjetter
Member

fjetter commented Oct 16, 2023

I teased in #10557 (comment) that this new rewrite would lend itself to the assignment groups idea that was developed in dask/distributed#7141.

The actual changes for the assignment groups are all in commit aaf3f1f, i.e. they are very minimal. I suspect this requires a bit more refinement before it can actually be used (and some logic on the scheduler side as well).

Still, a couple of pretty pictures to motivate this work.

What the coloring in the images below shows is what I called (co-)assignment groups over in dask/distributed#7141. Their purpose has nothing to do with ordering itself but rather with a related problem, which is task placement. When scheduling tasks just-in-time, as we do on the distributed scheduler, we cannot afford to do any look-ahead and have to assign tasks to a worker without knowing where a task's result will be needed. However, if one knows that a set of tasks will reduce to the same reducer later on, it is quite beneficial to schedule them on the same worker to reduce latencies and network transfer. It could even be used to schedule tasks ahead of time (which is what we often call speculative task assignment).

So a shared group here would be a suggestion to the scheduler to co-schedule the tasks in it.
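To make the idea concrete, here is a minimal sketch of what "tasks that reduce to the same reducer" could look like on a toy graph. It is not the algorithm from this PR or from dask/distributed#7141; the helper `coassignment_groups` and the plain `{key: dependencies}` mapping are assumptions made purely for illustration. It simply groups root tasks whose results are consumed by the same set of direct dependents.

```python
from collections import defaultdict


def coassignment_groups(dependencies):
    """Group root tasks by the set of tasks that directly consume them.

    ``dependencies`` maps each task key to the set of keys it depends on.
    Hypothetical helper for illustration only; not the algorithm in this PR.
    """
    # Invert the graph: for each task, which tasks consume its result?
    dependents = defaultdict(set)
    for key, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(key)

    # Root tasks (no dependencies) feeding the same consumers share a group.
    groups = defaultdict(list)
    for key in sorted(dependencies):
        if not dependencies[key]:
            groups[frozenset(dependents[key])].append(key)
    return sorted(groups.values())


# Toy graph: a0 and a1 are reduced by r0, b0 and b1 by r1,
# and r0 and r1 are combined by "total".
deps = {
    "a0": set(), "a1": set(), "b0": set(), "b1": set(),
    "r0": {"a0", "a1"}, "r1": {"b0", "b1"},
    "total": {"r0", "r1"},
}
groups = coassignment_groups(deps)
print(groups)  # [['a0', 'a1'], ['b0', 'b1']]
```

A scheduler could treat each inner list as a hint to place those tasks on the same worker, so that `r0` and `r1` find their inputs locally. Real graphs need more than direct dependents (deeper branches, balancing of group sizes), which is the refinement discussed in the examples.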

Example 1 test_reduce_with_many_common_dependents

This test is actually taken from a distributed test that is supposed to test co-assignment. Perfect grouping.

[image: test_reduce_with_many_common_dependents]

Example 2 test_order_with_equal_dependents

Most of the leaf branches are correctly assigned to the same group, which would be great. The balancing of those groups is not ideal, so we'd likely want to cut/split them somehow. Also, there is a little more fragmentation in the branches than I'd like to see. Still, not a big problem.

[image: test_order_with_equal_dependents]

Example 3 test_flox_reduction

This is one of the array reductions I investigated over in #10535.
The groups are almost perfectly assigned, with two outliers.

[image: test_flox_reduction]

@mrocklin
Member

mrocklin commented Oct 16, 2023 via email

@rjzamora
Member

However, if one knows that a set of tasks will reduce to the same reducer later on, it is quite beneficial to schedule them on the same worker to reduce latencies and network transfer. It could even be used to schedule tasks ahead of time (which is what we often call speculative task assignment).

I think it's safe to say I'm supportive of these kinds of improvements. What is the high-level status with this and #10557? Do you feel that you are running into blockers and need help on any of these fronts?

@fjetter
Member Author

fjetter commented Nov 2, 2023

What is the high-level status with this and

I paused development on #10557 because I couldn't easily manage the runtime performance of the algorithm itself. At least micro benchmarking of it is a mixed bag (see #10557 (comment) for some micro benchmarks of dask-benchmark).

The ordering that the new algorithm produces is almost certainly better than what is on main, and its new structure lends itself quite easily to computing these co-assignment groups as proposed here. However, the algorithm as it is right now is extremely greedy, which causes it to be rather slow. I think this is all something I should do myself, or at least I should do a little more prep work explaining the algorithm in #10557 so we can have a conversation.

As with most changes to such critical code, testing is the most difficult part. I need to run more tests on large and realistic graphs, possibly extracting them into micro benchmarks. This is an area where I would definitely appreciate help. Either code examples on random data or even raw graphs I can micro benchmark would be wonderful.
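As a rough sketch of the kind of micro benchmark meant here, the snippet below builds a random layered graph and times an ordering function on it. Everything in it is an assumption for illustration: `random_layered_graph`, `time_ordering`, and `toy_order` are hypothetical names, the graph is a plain `{key: dependencies}` mapping rather than a real dask graph, and `toy_order` is only a stand-in topological sort, not the ordering algorithm under discussion.

```python
import random
import time


def random_layered_graph(n_layers, width, fan_in, seed=0):
    """Build a random layered DAG as {key: set_of_dependency_keys}."""
    rng = random.Random(seed)
    graph = {}
    previous = []
    for layer in range(n_layers):
        current = [f"t{layer}-{i}" for i in range(width)]
        for key in current:
            # Each task depends on up to ``fan_in`` tasks of the previous layer.
            k = min(fan_in, len(previous))
            graph[key] = set(rng.sample(previous, k))
        previous = current
    return graph


def toy_order(graph):
    """Placeholder ordering: an iterative depth-first topological sort."""
    priority = {}
    for root in graph:
        stack = [root]
        while stack:
            key = stack[-1]
            pending = [d for d in graph[key] if d not in priority]
            if pending:
                stack.extend(pending)
            else:
                stack.pop()
                if key not in priority:
                    priority[key] = len(priority)
    return priority


def time_ordering(order_fn, graph, repeats=3):
    """Return the best wall-clock time of order_fn(graph) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        order_fn(graph)
        best = min(best, time.perf_counter() - start)
    return best


graph = random_layered_graph(n_layers=50, width=100, fan_in=3)
elapsed = time_ordering(toy_order, graph)
print(f"{len(graph)} tasks ordered in {elapsed:.4f}s")
```

To benchmark the real thing, one would presumably time `dask.order.order` instead of `toy_order`. It expects a dask graph (keys mapped to task tuples) rather than this plain dependency mapping, so a conversion step would be needed.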

@rjzamora
Member

rjzamora commented Nov 2, 2023

As with most changes to such critical code, testing is the most difficult part. I need to run more tests on large and realistic graphs, possibly extracting them into micro benchmarks. This is an area where I would definitely appreciate help. Either code examples on random data or even raw graphs I can micro benchmark would be wonderful.

Great. Hopefully I can help with this. I've been meaning to boil down some of the challenging patterns we have seen in large-scale data preparation.
