Task co-assignment logic is worst-case for binary operations like a + b
#6597
gjoseph92 added a commit to gjoseph92/distributed that referenced this issue on Jun 18, 2022:
When there were multiple root task groups, we were just re-using the last worker for every batch because it had nothing processing on it. Unintentionally this also fixes dask#6597 in some cases (because the first task goes to processing, but we measure queued, so we pick the same worker for both task groups)
gjoseph92 added a commit to gjoseph92/distributed that referenced this issue on Jun 22, 2022:
When there were multiple root task groups, we were just re-using the last worker for every batch because it had nothing processing on it. Unintentionally this also fixes dask#6597 in some cases (because the first task goes to processing, but we measure queued, so we pick the same worker for both task groups)
gjoseph92 added a commit to gjoseph92/distributed that referenced this issue on Jun 23, 2022:
Bit of a hack, but closes dask#6597. I'd like to have a better metric for the batch size, but I think this is about as good as we can get. Any reasonably large number will do here.
The root task co-assignment logic does the exact opposite of what it should for operations combining two different datasets, like `a + b`. It assigns all the `a`s to one worker, and all the `b`s to another. Each output `x` then requires transferring an `a` or a `b`. So 50% of the data gets transferred. This could have been 0% if we had co-assigned properly.

The reason for this is that co-assignment selects a worker to re-use per task group. So it goes something like this (recall that we're iterating through root tasks in priority order; a toy sketch of the loop follows the list):
- `a1`. It has no `last_worker` set, so pick the least busy worker: `w1`.
- `b1`. It has no `last_worker` set, so pick the least busy worker. `w1` already has a task assigned to it (`a1`), so we pick `w2`.
- The next 3 `a`s go to `w1`, and the next 3 `b`s to `w2` (they come through interleaved, since they're interleaved in priority order).
- `a5`. They're both equally busy; say we pick `w2`.
- `b5`. We just made `w2` slightly busier than `w1`, so pick `w1`.
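To make that concrete, here is a toy model of the per-task-group selection loop. The function and variable names are made up for illustration; this is not the scheduler's actual code, it just reproduces the pattern described above.

```python
import math

def assign_root_tasks_per_group(task_groups, workers, nthreads):
    """Toy model of the current behavior: `last_worker` is tracked per task group."""
    occupancy = {w: 0 for w in workers}
    state = {g: {"last_worker": None, "remaining": 0} for g in task_groups}
    # Batch size is computed from the *group* size, not the total number of root tasks.
    batch = {g: math.ceil(len(ts) / nthreads) for g, ts in task_groups.items()}
    assignments = {}

    # Root tasks arrive interleaved in priority order: a1, b1, a2, b2, ...
    # (assumes equal-sized groups, which is the `a + b` case).
    priority_order = [
        (group, task)
        for tasks in zip(*task_groups.values())
        for group, task in zip(task_groups, tasks)
    ]
    for group, task in priority_order:
        s = state[group]
        if s["last_worker"] is None or s["remaining"] == 0:
            # Each group independently picks the least-busy worker,
            # so `a1` lands on one worker and `b1` on the other.
            s["last_worker"] = min(workers, key=occupancy.__getitem__)
            s["remaining"] = batch[group]
        assignments[task] = s["last_worker"]
        occupancy[s["last_worker"]] += 1
        s["remaining"] -= 1
    return assignments

groups = {"a": [f"a{i}" for i in range(1, 9)], "b": [f"b{i}" for i in range(1, 9)]}
print(assign_root_tasks_per_group(groups, ["w1", "w2"], nthreads=2))
# Every `a` ends up on one worker and every `b` on the other (modulo tie-breaking),
# so each `a + b` output needs one of its inputs transferred.
```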
The last-used worker should be global state (well, global to a particular sequence of `transitions` caused by `update_graph`). Each subsequent task in priority order should re-use this worker until it's filled up, regardless of which task group the task belongs to (see the sketch below).

The tricky part is calculating what "filled up" means. We currently use the size of the task group to decide how many root tasks in total there are, which we then divide by nthreads to decide how many to assign per worker. But of course, that's not actually the total number of root tasks. I'm not sure yet how to figure out the total number of root tasks in constant time within `decide_worker`.

Broadly speaking, this stateful and kinda hacky co-assignment logic is a bit of a pain to integrate into #6560. I've been able to do it, but maintaining good assignment while rebalancing tasks when adding and removing workers is difficult. Our co-assignment logic is too reliant on statefulness and on getting to iterate through all the tasks at once in priority order; we can't actually re-co-assign things when workers change. If we had a data structure/mechanism to efficiently identify "which tasks are siblings of this one", or maybe even "which worker holds the task nearest in priority to this one", it might make solving both problems easier.
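A minimal sketch of that "global last worker" idea, assuming we somehow knew the total number of root tasks up front (again, illustrative names only, not actual scheduler code):

```python
import math

def assign_root_tasks_globally(task_groups, workers, nthreads):
    """Toy model of the proposal: one shared `last_worker` across all task groups."""
    # Priority order still interleaves the groups: a1, b1, a2, b2, ...
    ordered = [task for tasks in zip(*task_groups.values()) for task in tasks]
    # This is the hard part: it needs the *total* root-task count, which the
    # real decide_worker can't currently compute in constant time.
    batch = math.ceil(len(ordered) / nthreads)
    occupancy = {w: 0 for w in workers}
    last_worker, remaining, assignments = None, 0, {}
    for task in ordered:
        if last_worker is None or remaining == 0:
            last_worker = min(workers, key=occupancy.__getitem__)
            remaining = batch
        assignments[task] = last_worker
        occupancy[last_worker] += 1
        remaining -= 1
    return assignments

# With 8 `a`s and 8 `b`s on 2 single-threaded workers, a1..a4 and b1..b4 land on
# one worker and a5..a8 and b5..b8 on the other, so `a + b` needs no transfers.
```

The only difference from the sketch above is that `last_worker` and the batch size are shared across groups; computing that batch size is exactly what requires knowing the total root-task count.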
As a simple test that fails on main (each worker has transferred 4 keys):
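A hedged sketch of such a test, using `dask.delayed` root tasks and distributed's `gen_cluster` test harness (this is not necessarily the original snippet; the fixture arguments and the `incoming_transfer_log` attribute are assumptions and may differ between distributed versions):

```python
import dask
from distributed.utils_test import gen_cluster


@gen_cluster(client=True, nthreads=[("127.0.0.1", 1), ("127.0.0.1", 1)])
async def test_coassign_binary_op(c, s, a, b):
    # Two root task groups of equal size, combined elementwise.
    xs = [dask.delayed(i, name=f"x-{i}") for i in range(8)]
    ys = [dask.delayed(i, name=f"y-{i}") for i in range(8)]
    zs = [x + y for x, y in zip(xs, ys)]

    await c.gather(c.compute(zs))

    # With proper co-assignment, each x-i / y-i pair lives on the same worker,
    # so neither worker should have had to fetch keys from its peer.
    assert not a.incoming_transfer_log, [log["keys"] for log in a.incoming_transfer_log]
    assert not b.incoming_transfer_log, [log["keys"] for log in b.incoming_transfer_log]
```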
Note that this case occurs in @TomNicholas's example workload: #6571
cc @fjetter @mrocklin