transitioner: Fix race condition with file_upload_handler #3603
Description of the Change
Fixes a race condition between the transitioner and file_upload_handler, leading to the following symptoms:
The problem is caused by an incorrect value of transition_time for the affected workunit. One possible sequence of events (among others) that leads to the race condition in the standard scenario (min_quorum=2) is:
The problem has been widely exposed on my private GFN server, which has to deal with a high volume of short tasks.
Solution
Since the transitioner knows the original transition_time of the workunit it is processing, it can check that the value was not changed while processing was in progress. If it was changed, something has been updated in the background, and the workunit is scheduled for immediate retransition on the next scan. The check and the update must be atomic, so they are done at the database level (in SQL).
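
The idea can be illustrated with a minimal sketch. This is not the code from this change: it uses Python with an in-memory SQLite database purely for illustration, and the function names `finish_transition` and `make_demo_db`, the parameter names, and the simplified two-column `workunit` table are assumptions. The point it shows is that the comparison against the original transition_time and the write happen in a single UPDATE statement, so no other process can slip in between them.

```python
import sqlite3
import time


def finish_transition(conn, wu_id, original_tt, next_tt, now=None):
    """Write the next transition_time only if nobody changed it meanwhile.

    Hypothetical helper: if transition_time still equals the value read at
    the start of processing, store next_tt; otherwise store `now`, so the
    workunit is picked up again on the next scan.
    """
    now = int(time.time()) if now is None else now
    # Compare and write in one statement, so the check is atomic.
    conn.execute(
        "UPDATE workunit "
        "SET transition_time = CASE WHEN transition_time = ? THEN ? ELSE ? END "
        "WHERE id = ?",
        (original_tt, next_tt, now, wu_id),
    )
    conn.commit()
    row = conn.execute(
        "SELECT transition_time FROM workunit WHERE id = ?", (wu_id,)
    ).fetchone()
    return row[0]


def make_demo_db():
    """Create a simplified, illustrative workunit table with one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workunit (id INTEGER PRIMARY KEY, transition_time INTEGER)"
    )
    conn.execute("INSERT INTO workunit VALUES (1, 1000)")
    conn.commit()
    return conn


if __name__ == "__main__":
    # No concurrent change: the planned value is stored.
    conn = make_demo_db()
    print(finish_transition(conn, 1, original_tt=1000, next_tt=5000, now=2000))

    # Another process changed transition_time in the background:
    # the workunit is scheduled for immediate retransition instead.
    conn = make_demo_db()
    conn.execute("UPDATE workunit SET transition_time = 1500 WHERE id = 1")
    print(finish_transition(conn, 1, original_tt=1000, next_tt=5000, now=2000))
```

A read followed by a separate write would reintroduce the same window the fix is meant to close, which is why the condition lives inside the UPDATE itself.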
Alternate Designs
Release Notes