Fix a deadlock when queued tasks are resubmitted quickly in succession #7348

fjetter · 2022-11-24T17:34:23Z

There is a race condition if task-finished and free-keys messages are sent concurrently.

Queued tasks could end up being transitioned to memory, which is wrong because the worker will have forgotten the data shortly afterwards.

Sending another free-keys in this situation is not strictly necessary but it is safe, since the scheduler guarantees ordering of messages to a worker; i.e. if the task is in queued, no other worker is supposed to have or compute this task until it is transitioned out of this state. The free-keys is just there for good measure and will be handled gracefully on the worker side.
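A minimal sketch of the idea, assuming a toy state table rather than the real scheduler: `ToyScheduler`, its `states` dict, and its `outbox` list are hypothetical names for illustration only and are not part of distributed. It shows a late task-finished report for a key that is back in queued being ignored and answered with a free-keys message instead of a transition to memory.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical toy model, not distributed's Scheduler."""

    # key -> "queued" | "processing" | "memory"
    states: dict = field(default_factory=dict)
    # messages this toy scheduler would send, as (worker, op, key)
    outbox: list = field(default_factory=list)

    def handle_task_finished(self, key: str, worker: str) -> None:
        if self.states.get(key) == "processing":
            # Expected case: the task was assigned and has now finished.
            self.states[key] = "memory"
        else:
            # Stale report (e.g. the key was released and is queued again).
            # Do not move the key to memory; tell the worker to drop it.
            self.outbox.append((worker, "free-keys", key))


scheduler = ToyScheduler(states={"x": "queued"})
scheduler.handle_task_finished("x", worker="w1")
print(scheduler.states["x"], scheduler.outbox)
```

The key stays in queued, so it can still be scheduled normally later on.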

I added a test for the processing state as well, to be on the safe side. The test doesn't assert this, but the worker simply computes the task twice, as it should under our "release and compute task" semantics.
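The "release and compute task" behaviour can be pictured with a toy worker. This is only a sketch under stated assumptions: `ToyWorker`, `compute_task`, and `free_keys` are hypothetical stand-ins and do not mirror the real worker state machine.

```python
class ToyWorker:
    """Hypothetical toy model of a worker, for illustration only."""

    def __init__(self):
        self.data = {}  # key -> result currently held in memory
        self.runs = {}  # key -> number of times the task was executed

    def compute_task(self, key, func):
        # Every compute request runs the task, even if it ran before.
        self.runs[key] = self.runs.get(key, 0) + 1
        self.data[key] = func()

    def free_keys(self, key):
        # Handled gracefully: freeing an unknown key is a no-op.
        self.data.pop(key, None)


worker = ToyWorker()
worker.compute_task("x", lambda: 1)
worker.free_keys("x")                # release
worker.compute_task("x", lambda: 1)  # resubmission runs the task again
worker.free_keys("unknown")          # extra free-keys does no harm
print(worker.runs["x"], worker.data)
```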

The test is a bit involved due to how we define rootishness, but I added what is hopefully sufficient commentary.

github-actions · 2022-11-24T18:14:14Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0   15 suites ±0   6h 29m 15s ⏱️ +14m 34s
3 228 tests +2   3 140 ✔️ ±0   83 💤 ±0   5 ❌ +2
23 868 runs +16   22 954 ✔️ +14   909 💤 +1   5 ❌ +1

For more details on these failures, see this check.

Results for commit 9026d47. ± Comparison against base commit cff33d5.

♻️ This comment has been updated with latest results.

hendrikmakait

LGTM, great job on the description of what the test is doing. Thanks, @fjetter!

hendrikmakait

The new test fails with queuing turned off.

fjetter · 2022-11-25T13:52:11Z

> The new test fails with queuing turned off.

Ahh, thanks, that makes sense. I forgot about the additional CI config and was already concerned.

Fix a deadlock when queued tasks are resubmitted quickly in succession

dcdcd4c

fjetter requested review from crusaderky and hendrikmakait and removed request for crusaderky November 24, 2022 17:34

fjetter self-assigned this Nov 24, 2022

Ensure all tasks are on the scheduler before proceeding

926198a

hendrikmakait approved these changes Nov 25, 2022

oops

9026d47

hendrikmakait self-requested a review November 25, 2022 10:41

hendrikmakait requested changes Nov 25, 2022

Skip if rootish is disabled

9ecb67c

hendrikmakait approved these changes Nov 25, 2022

fjetter merged commit 7c414cf into dask:main Nov 25, 2022

fjetter deleted the deadlock_queued branch November 25, 2022 15:59

gjoseph92 mentioned this pull request Nov 28, 2022

Issues with tasks completing on workers after being released and re-submitted #7356

Open

fjetter mentioned this pull request Feb 28, 2023

Dask: Stalling Tasks? #5879

Open
