wait for in-flight ops before cloning File
#5803
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
If there is an ongoing operation on a file, wait for that to complete before performing the clone in
File::try_clone
. This avoids a race between the ongoing operation and any subsequent operations performed on the clone.Fixes: #5759
Motivation
#5759 shows a situation where a file is written to with
write_all
, and once that completes (Poll::Ready), cloned withtry_clone
. The file contents are then read back from the clone. In unlucky situations, this read produces an empty result because the read occurs before the write has completed.Solution
Like most methods on File, the
try_clone
function should wait until any previous asynchronously-completing operations on the file are complete before actually cloning the file.No Tests?
The failure fixed by this PR appears very difficult to test:
write(2)
syscall is in progress. Reproducing the race requires that the read happen before the invocation ofwrite(2)
. In practice, this bug reproduces best on a very busy system, such as in CI, where the worker thread that callswait(2)
is not scheduled immediately.spawn_blocking
which does not exhibit this bug.spawn_blocking
was blocked from executing, then when the fix was applied the test would block indefinitely.I'm open to suggestions as to how to test this, but perhaps the fix is simple enough to accept without them?