Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a deadlock when queued tasks are resubmitted quickly in succession #7348

Merged
merged 4 commits into from
Nov 25, 2022

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Nov 24, 2022

Closes #7200

There is a race condition if task-finishes and free-keys are submitted concurrently.

Queued tasks could end up being transitioned to memory which is wrong because shortly after this the worker will have forgotten the data already.

Sending another free-keys in this situation is not absolutely necessary but safe since the scheduler guarantees ordering of messages to a worker, i.e. if the task is in queued there is no other worker supposed to have or compute this task until it is transitioned out of this state. The free-keys is just there for good measure and will be handled on worker side gracefully.

I added a test for processing as well to be on the safe side. This isn't asserted but the worker just computes the task twice, as it should from our "release and compute task" semantics.

The test is a bit involved due to how we define rootishness but I added hopefully sufficient commentary.

@fjetter fjetter requested review from crusaderky and hendrikmakait and removed request for crusaderky November 24, 2022 17:34
@fjetter fjetter self-assigned this Nov 24, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Nov 24, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±  0         15 suites  ±0   6h 29m 15s ⏱️ + 14m 34s
  3 228 tests +  2    3 140 ✔️ ±  0    83 💤 ±0  5 +2 
23 868 runs  +16  22 954 ✔️ +14  909 💤 +1  5 +1 

For more details on these failures, see this check.

Results for commit 9026d47. ± Comparison against base commit cff33d5.

♻️ This comment has been updated with latest results.

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great job on the description of what the test is doing. Thanks, @fjetter!

@hendrikmakait hendrikmakait self-requested a review November 25, 2022 10:41
Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test fails with queuing turned off.

@fjetter
Copy link
Member Author

fjetter commented Nov 25, 2022

The new test fails with queuing turned off.

ahhh.. thanks, that makes sense. I forgot about the additional CI config and was already concerned

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transition queued->memory causes AssertionError
2 participants