Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix WriteAsyncCancelledFile test failure #56895

Closed
wants to merge 3 commits into from

Conversation

adamsitnik
Copy link
Member

Try to fix #56894

Please ignore this PR until the CI is 100% green

@adamsitnik adamsitnik added this to the 6.0.0 milestone Aug 5, 2021
@ghost
Copy link

ghost commented Aug 5, 2021

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Try to fix #56894

Please ignore this PR until the CI is 100% green

Author: adamsitnik
Assignees: -
Labels:

area-System.IO

Milestone: 6.0.0

@adamsitnik adamsitnik requested a review from jozkee August 5, 2021 16:08
@adamsitnik
Copy link
Member Author

@jozkee some explanation: in #56716 I've changed the code to use Interlocked.Add to update the position in WriteAsync. This particular test is using multiple (10) tasks, so most probably the lack of Interlocked.Read was causing to read an outdated value on x86. I am just guessing since I was not able to repro it locally (in the CI it failed only for WIn 7 x86)

get => _filePosition;
set => _filePosition = value;
get => Interlocked.Read(ref _filePosition);
set => Interlocked.Exchange(ref _filePosition, value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of performance impact might this end up having, in particular on 32-bit? Is this just a case of a test doing something unsupported, or we think this access pattern driving the need for this is valid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of performance impact might this end up having, in particular on 32-bit?

For the following micro-benchmark:

[Benchmark]
[Arguments(OneKibibyte, FileOptions.None)]
[Arguments(OneKibibyte, FileOptions.Asynchronous)]
public void PositionGetSet(long fileSize, FileOptions options)
{
    string filePath = _sourceFilePaths[fileSize];
    using (FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read, FourKibibytes, options))
    {
        for (long i = 0; fileStream.Position < fileSize; i++)
        {
            fileStream.Position = i;
        }
    }
}

For x86:

Method Job fileSize options Mean
PositionGetSet .NET 5 1024 None 730.0 us
PositionGetSet .NET 6 1024 None 69.95 us
PositionGetSet this PR 1024 None 329.94 us
PositionGetSet .NET 5 1024 Asynchronous 2,821.5 us
PositionGetSet .NET 6 1024 Asynchronous 80.85 us
PositionGetSet this PR 1024 Asynchronous 346.32 us

So the impact on x86 is quite huge, however, we are still few times faster than .NET 5 and I don't expect this method to be on a hot path.

Is this just a case of a test doing something unsupported, or we think this access pattern driving the need for this is valid?

The test is not doing anything unsupported. It starts an async write operation and tries to cancel it, if the cancellation succeeds, we check for file length and position to see that they are 0.

Based on my analysis of the source code and handling of WriteFile return values and callbacks my only idea was false sharing and the extra interlocked seems to solve the problem.

I am going to see if marking _position as volatile would help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a volatile field cannot be of the type 'long'

So I don't have other ideas except of disabling this test for x86

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I have one more idea, but let's wait to see CI results before we get into details (it might be not worth it)

get => _filePosition;
set => _filePosition = value;
get => Interlocked.Read(ref _filePosition);
set => Interlocked.Exchange(ref _filePosition, value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this have an outdated value? The failing test (WriteAsyncCancelledFile) is creating a new file and a new FileStream instance in each run, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be caused by false sharing, but I am just guessing: https://docs.microsoft.com/en-us/archive/msdn-magazine/2008/october/net-matters-false-sharing

Copy link
Member

@stephentoub stephentoub Aug 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's a good article. The lead author must really know what he's talking about. 😄

That said, this isn't false sharing. @jozkee is correct that this value shouldn't be outdated, as it comes linearly after the operations should have completed and quiesced with appropriate synchronization to ensure that all updates were published. I don't know why it's failing, nor why this fixes it, but it suggests a real problem with something potentially being done in a problematic order in the code, such as accidentally changing _filePosition after completing the task.

(When I previously commented, I hadn't actually looked at the test in question. I thought this was trying to fix a problem where the test was accessing position concurrently with asynchronous operations being in flight, in which case there could be torn reads/writes of a 64-bit value on a 32-bit platform. But that doesn't appear to be what this test is doing. Maybe something in the implementation itself is doing so?)

…ration finishes, but track the next async operation offset in a separate field and update it accordingly
@adamsitnik
Copy link
Member Author

I am going to close this PR and send a new one with the fix once I figure out what exactly is going on. Thank you all for feedback!

@adamsitnik adamsitnik closed this Aug 11, 2021
@adamsitnik adamsitnik deleted the issue56894 branch August 11, 2021 13:41
@ghost ghost locked as resolved and limited conversation to collaborators Sep 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FileStream_WriteAsync_AsyncWrites.WriteAsyncCancelledFile test failed
3 participants