Fix a deadlock when queued tasks are resubmitted quickly in succession #7348
Unit Test Results: See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ±0, 15 suites ±0, 6h 29m 15s ⏱️ +14m 34s. For more details on these failures, see this check. Results for commit 9026d47. ± Comparison against base commit cff33d5. ♻️ This comment has been updated with latest results.
LGTM, great job on the description of what the test is doing. Thanks, @fjetter!
The new test fails with queuing turned off.
Ahh, thanks, that makes sense. I forgot about the additional CI config and was already concerned.
Closes #7200
There is a race condition if task-finished and free-keys are submitted concurrently. Queued tasks could end up being transitioned to memory, which is wrong because shortly afterwards the worker will already have forgotten the data.
Sending another free-keys in this situation is not strictly necessary, but it is safe because the scheduler guarantees the ordering of messages to a worker. That is, if the task is in queued, no other worker is supposed to hold or compute it until it has transitioned out of this state. The free-keys is just there for good measure and will be handled gracefully on the worker side.
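To make the reasoning above concrete, here is a minimal toy model of the scheduler side of the race. It is a sketch under stated assumptions, not the actual scheduler code: `ToyScheduler`, its `states`, `processing_on`, and `outbox` fields, and the `handle_task_finished` method are hypothetical names chosen for illustration. The only idea taken from the description is that a completion report arriving for a task in queued must not move the task to memory, and that answering with another free-keys is safe.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical, heavily simplified stand-in for scheduler state."""

    # key -> state; this sketch only uses "queued", "processing", "memory"
    states: dict = field(default_factory=dict)
    # key -> worker the key is currently assigned to (processing only)
    processing_on: dict = field(default_factory=dict)
    # messages sent to workers, recorded as (worker, op, key) tuples
    outbox: list = field(default_factory=list)

    def handle_task_finished(self, key, worker):
        """Handle a completion report from `worker` for `key`."""
        state = self.states.get(key)
        if state == "processing" and self.processing_on.get(key) == worker:
            # Expected case: the assigned worker reports the result.
            self.states[key] = "memory"
            return
        # Stale report, e.g. the task was released and is queued again.
        # Do NOT transition to memory: the worker has been (or will be)
        # told to forget the data. Sending free-keys again is redundant
        # but harmless in this toy model.
        self.outbox.append((worker, "free-keys", key))


# Stale completion for a task that has been re-queued in the meantime.
sched = ToyScheduler(states={"x": "queued"})
sched.handle_task_finished("x", "worker-a")
assert sched.states["x"] == "queued"
assert sched.outbox == [("worker-a", "free-keys", "x")]
```

The toy model deliberately ignores everything else the real state machine tracks; it only illustrates why the queued case has to be treated as a stale report rather than a successful completion.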
I added a test for processing as well, to be on the safe side. This isn't asserted, but the worker just computes the task twice, as it should under our "release and compute task" semantics.
The test is a bit involved because of how we define rootishness, but I added what I hope is sufficient commentary.