[RFC] Order assignment groups #10562
Conversation
I would buy a calendar with this artwork.
I think it's safe to say I'm supportive of these kinds of improvements. What is the high-level status with this and #10557? Do you feel that you are running into blockers and need help on any of these fronts?
I paused development on #10557 because I couldn't easily manage the runtime performance of the algorithm itself. At least micro benchmarking it is a mixed bag (see #10557 (comment) for some micro benchmarks from dask-benchmark). The ordering that the new algorithm produces is almost certainly better than what is on main, and its new structure lends itself quite easily to computing the co-assignment groups proposed here. However, the algorithm as it stands is extremely greedy, which makes it rather slow. I think this is something I should do myself, or at least I should do a bit more prep work explaining the algorithm in #10557 before we can have a proper conversation about it.

As with most changes to such critical code, testing is the most difficult part. I need to run more tests on large and realistic graphs, possibly extracting them into micro benchmarks. This is an area where I would definitely appreciate help: either code examples on random data or even raw graphs I can micro benchmark would be wonderful.
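For reference, a minimal sketch of the kind of micro benchmark meant here, using a synthetic dask.array graph; the array shape, chunking, and the helper name are placeholders rather than anything taken from this PR:

```python
import time

import dask.array as da
from dask.order import order


def order_benchmark_graph():
    # A graph of roughly 10k tasks: chained blockwise ops followed by a
    # full reduction. No data is computed; we only build the task graph.
    x = da.random.random((50_000, 50_000), chunks=(1_000, 1_000))
    y = ((x + x.T) ** 2).sum()
    # Materialize the low-level graph up front so only dask.order is timed
    return dict(y.__dask_graph__())


dsk = order_benchmark_graph()
start = time.perf_counter()
priorities = order(dsk)  # maps each key to its scheduling priority
elapsed = time.perf_counter() - start
print(f"ordered {len(priorities)} tasks in {elapsed:.3f}s")
```

Raw graphs from real workloads could be captured the same way (as the materialized dict) and replayed through dask.order.order in isolation.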
Great. Hopefully I can help with this. I've been meaning to boil down some of the challenging patterns we have seen in large-scale data preparation.
I teased in #10557 (comment) that this new rewrite would lend itself to the assignment-groups idea that was developed in dask/distributed#7141.
The actual changes for the assignment groups are all in commit aaf3f1f, i.e. they are very minimal. I suspect this requires a bit more refinement before it can actually be used (and some logic on the scheduler side as well).
Still, a couple of pretty pictures to motivate this work.
The coloring in the images below shows what I called (co-)assignment groups over in dask/distributed#7141. Their purpose has nothing to do with ordering itself but rather with a related problem: task placement. When scheduling tasks just in time, as we do on the distributed scheduler, we cannot afford to do any look-ahead and have to assign tasks to a worker without knowing where a task's result will be needed. However, if one knows that a set of tasks will reduce to the same reducer later on, it is quite beneficial to schedule them on the same worker to reduce latencies and network transfer. This could even be used to schedule tasks ahead of time (which is what we often call speculative task assignment).
So equal groups here would be a suggestion to the scheduler to co-schedule the corresponding tasks.
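To make that concrete, here is a toy illustration (not the algorithm prototyped in this PR) of what equal group labels would mean, on a hand-written graph with two independent reductions; every leaf gets the label of the reducer it feeds:

```python
from dask.core import get_dependencies

# Two independent reductions: r0 consumes a0..a2, r1 consumes b0..b2
dsk = {
    "a0": (sum, [1, 1]), "a1": (sum, [2, 2]), "a2": (sum, [3, 3]),
    "b0": (sum, [4, 4]), "b1": (sum, [5, 5]), "b2": (sum, [6, 6]),
    "r0": (sum, ["a0", "a1", "a2"]),
    "r1": (sum, ["b0", "b1", "b2"]),
}

dependencies = {key: get_dependencies(dsk, key) for key in dsk}
dependents = {key: set() for key in dsk}
for key, deps in dependencies.items():
    for dep in deps:
        dependents[dep].add(key)

# Naive grouping: label each leaf with the single reducer that consumes it.
# Real graphs need something smarter (shared dependencies, deep fan-out, ...).
groups = {}
for key, deps in dependencies.items():
    if not deps and len(dependents[key]) == 1:
        (reducer,) = dependents[key]
        groups[key] = reducer

print(groups)
# {'a0': 'r0', 'a1': 'r0', 'a2': 'r0', 'b0': 'r1', 'b1': 'r1', 'b2': 'r1'}
```

A scheduler that receives equal labels for a0..a2 could try to place them on the same worker, so that r0 never has to wait on a transfer.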
Example 1: test_reduce_with_many_common_dependents
This test was taken from a distributed test that is supposed to test co-assignment. The grouping is perfect.
[image: test_reduce_with_many_common_dependents] <https://user-images.githubusercontent.com/8629629/275550195-0570f43a-0b91-4b41-b619-05e00c468c12.png>
Example 2: test_order_with_equal_dependents
Most of the leaf branches are correctly assigned to the same group, which would be great. The balancing of those groups is not ideal, so we'd likely want to cut/split them somehow. Also, there is a little more fragmentation in the branches than I'd like to see. Still, not a big problem.
[image: test_order_with_equal_dependents] <https://user-images.githubusercontent.com/8629629/275550258-60b04c5b-64e8-4231-ae28-5cbd7246b4f0.png>
Example 3: test_flox_reduction
This is one of the array reductions I investigated over in #10535.
The groups are almost perfectly assigned, with two outliers.
[image: test_flox_reduction] <https://user-images.githubusercontent.com/8629629/275550448-f527064a-ffd2-428f-b36e-93275d676f97.png>
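For anyone who wants to eyeball this kind of topology locally, here is a small sketch (not the exact graph from #10535, just a generic chunked-array reduction of a similar shape) that colors the task graph by the priority dask.order assigns; it needs graphviz and matplotlib installed:

```python
import dask
import dask.array as da

x = da.random.random((400, 400), chunks=(100, 100))
# A small split_every keeps the tree reduction visible in the rendered graph
result = (x - x.mean(axis=0)).std(axis=0, split_every=2)

# color="order" colors each task node by its dask.order priority
dask.visualize(result, filename="array_reduction.svg", color="order")
```

This only visualizes ordering priorities, not the assignment groups themselves, but it is a quick way to reproduce the overall graph shape shown above.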