Task co-assignment logic is worst-case for binary operations like a + b
#6597
gjoseph92 added a commit to gjoseph92/distributed that referenced this issue on Jun 18, 2022:
When there were multiple root task groups, we were just re-using the last worker for every batch because it had nothing processing on it. Unintentionally this also fixes dask#6597 in some cases (because the first task goes to processing, but we measure queued, so we pick the same worker for both task groups)
gjoseph92 added a commit to gjoseph92/distributed that referenced this issue on Jun 22, 2022:
When there were multiple root task groups, we were just re-using the last worker for every batch because it had nothing processing on it. Unintentionally this also fixes dask#6597 in some cases (because the first task goes to processing, but we measure queued, so we pick the same worker for both task groups)
gjoseph92 added a commit to gjoseph92/distributed that referenced this issue on Jun 23, 2022:
Bit of a hack, but closes dask#6597. I'd like to have a better metric for the batch size, but I think this is about as good as we can get. Any reasonably large number will do here.
The root task co-assignment logic does the exact opposite of what it should for operations combining two different datasets, like `a + b`. It assigns all the `a`s to one worker, and all the `b`s to another. Each output `x` then requires transferring an `a` or a `b`. So 50% of the data gets transferred. This could have been 0% if we had co-assigned properly.

The reason for this is that co-assignment selects a worker to re-use per task group. So it goes something like this (recall that we're iterating through root tasks in priority order; a toy sketch of the loop follows the list):
- `a1`. It has no `last_worker` set, so pick the least busy worker: `w1`.
- `b1`. It has no `last_worker` set, so pick the least busy worker. `w1` already has a task assigned to it (`a1`), so we pick `w2`.
- The next 3 `a`s go to `w1`, and the next 3 `b`s to `w2` (they come through interleaved, since they're interleaved in priority order).
- `a5`. They're both equally busy; say we pick `w2`.
- `b5`. We just made `w2` slightly busier than `w1`, so pick `w1`.
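To make that concrete, here is a toy model of the per-task-group selection loop. The function and variable names are made up for illustration; this is not the scheduler's actual code, it just reproduces the pattern described above.

```python
import math

def assign_root_tasks_per_group(task_groups, workers, nthreads):
    """Toy model of the current behavior: `last_worker` is tracked per task group."""
    occupancy = {w: 0 for w in workers}
    state = {g: {"last_worker": None, "remaining": 0} for g in task_groups}
    # Batch size is computed from the *group* size, not the total number of root tasks.
    batch = {g: math.ceil(len(ts) / nthreads) for g, ts in task_groups.items()}
    assignments = {}

    # Root tasks arrive interleaved in priority order: a1, b1, a2, b2, ...
    # (assumes equal-sized groups, which is the `a + b` case).
    priority_order = [
        (group, task)
        for tasks in zip(*task_groups.values())
        for group, task in zip(task_groups, tasks)
    ]
    for group, task in priority_order:
        s = state[group]
        if s["last_worker"] is None or s["remaining"] == 0:
            # Each group independently picks the least-busy worker,
            # so `a1` lands on one worker and `b1` on the other.
            s["last_worker"] = min(workers, key=occupancy.__getitem__)
            s["remaining"] = batch[group]
        assignments[task] = s["last_worker"]
        occupancy[s["last_worker"]] += 1
        s["remaining"] -= 1
    return assignments

groups = {"a": [f"a{i}" for i in range(1, 9)], "b": [f"b{i}" for i in range(1, 9)]}
print(assign_root_tasks_per_group(groups, ["w1", "w2"], nthreads=2))
# Every `a` ends up on one worker and every `b` on the other (modulo tie-breaking),
# so each `a + b` output needs one of its inputs transferred.
```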
The last-used worker should be global state (well, global to a particular sequence of `transitions` caused by `update_graph`). Each subsequent task in priority order should re-use this worker until it's filled up, regardless of which task group the task belongs to (see the sketch below).

The tricky part is calculating what "filled up" means. We currently use the size of the task group to decide how many root tasks in total there are, which we then divide by nthreads to decide how many to assign per worker. But of course, that's not actually the total number of root tasks. I'm not sure yet how to figure out the total number of root tasks in constant time within `decide_worker`.

Broadly speaking, this stateful and kinda hacky co-assignment logic is a bit of a pain to integrate into #6560. I've been able to do it, but maintaining good assignment while rebalancing tasks when adding and removing workers is difficult. Our co-assignment logic is too reliant on statefulness and on getting to iterate through all the tasks at once in priority order; we can't actually re-co-assign things when workers change. If we had a data structure/mechanism to efficiently identify "which tasks are siblings of this one", or maybe even "which worker holds the task nearest in priority to this one", it might make solving both problems easier.
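A minimal sketch of that "global last worker" idea, assuming we somehow knew the total number of root tasks up front (again, illustrative names only, not actual scheduler code):

```python
import math

def assign_root_tasks_globally(task_groups, workers, nthreads):
    """Toy model of the proposal: one shared `last_worker` across all task groups."""
    # Priority order still interleaves the groups: a1, b1, a2, b2, ...
    ordered = [task for tasks in zip(*task_groups.values()) for task in tasks]
    # This is the hard part: it needs the *total* root-task count, which the
    # real decide_worker can't currently compute in constant time.
    batch = math.ceil(len(ordered) / nthreads)
    occupancy = {w: 0 for w in workers}
    last_worker, remaining, assignments = None, 0, {}
    for task in ordered:
        if last_worker is None or remaining == 0:
            last_worker = min(workers, key=occupancy.__getitem__)
            remaining = batch
        assignments[task] = last_worker
        occupancy[last_worker] += 1
        remaining -= 1
    return assignments

# With 8 `a`s and 8 `b`s on 2 single-threaded workers, a1..a4 and b1..b4 land on
# one worker and a5..a8 and b5..b8 on the other, so `a + b` needs no transfers.
```

The only difference from the sketch above is that `last_worker` and the batch size are shared across groups; computing that batch size is exactly what requires knowing the total root-task count.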
As a simple test that fails on main (each worker has transferred 4 keys):
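A hedged sketch of such a test, using `dask.delayed` root tasks and distributed's `gen_cluster` test harness (this is not necessarily the original snippet; the fixture arguments and the `incoming_transfer_log` attribute are assumptions and may differ between distributed versions):

```python
import dask
from distributed.utils_test import gen_cluster


@gen_cluster(client=True, nthreads=[("127.0.0.1", 1), ("127.0.0.1", 1)])
async def test_coassign_binary_op(c, s, a, b):
    # Two root task groups of equal size, combined elementwise.
    xs = [dask.delayed(i, name=f"x-{i}") for i in range(8)]
    ys = [dask.delayed(i, name=f"y-{i}") for i in range(8)]
    zs = [x + y for x, y in zip(xs, ys)]

    await c.gather(c.compute(zs))

    # With proper co-assignment, each x-i / y-i pair lives on the same worker,
    # so neither worker should have had to fetch keys from its peer.
    assert not a.incoming_transfer_log, [log["keys"] for log in a.incoming_transfer_log]
    assert not b.incoming_transfer_log, [log["keys"] for log in b.incoming_transfer_log]
```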
Note that this case occurs in @TomNicholas's example workload: #6571
cc @fjetter @mrocklin