Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Breaking change] FileStream.Position is updated AFTER ReadAsync|WriteAsync completes #50858

Closed
adamsitnik opened this issue Apr 7, 2021 · 10 comments
Labels
area-System.IO breaking-change Issue or PR that represents a breaking API or functional change over a prerelease.
Milestone

Comments

@adamsitnik
Copy link
Member

adamsitnik commented Apr 7, 2021

FileStream has never been thread-safe, but until .NET 6 we have tried to somehow support multiple concurrent calls to its async methods (ReadAsync & WriteAsync).
We were doing that only on Windows and according to my knowledge, this was not documented anywhere.

In order to allow for 100% asynchronous file IO with FileStream and fix the following issues:

#48813 has introduced locking in the FileStream buffering logic. Now when buffering is enabled (bufferSize passed to FileStream ctor is > 1) every ReadAsync & WriteAsync operation is serialized.

This means, that FileStream.Position is updated AFTER ReadAsync|WriteAsync completes. Not after the operation is started (the old, windows-specific behavior).

async Task AntiPatternDoNotReuseAsync(string path)
{
    byte[] userBuffer = new byte[4_000];

    using FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None,
         bufferSize: 4096, useAsync: true); // buffering and async IO!

    Task[] writes = new Task[3]; 

    // the first write is buffered, as 4000 < 4096
    writes[0] = fs.WriteAsync(userBuffer, 0, userBuffer.Length); // no await
    Console.WriteLine(fs.Position); // 4000 for both .NET 5 and .NET 6

    // the second write starts the async operation of writing the buffer to the disk as buffer became full
    writes[1] = fs.WriteAsync(userBuffer, 0, userBuffer.Length); // no await
    // 8000 for .NET 5, for .NET 6 most likely not as the operation has not finished yet and the Position was not updated
    Console.WriteLine(fs.Position); 

    // the third write will most probably wait for the lock to be released
    writes[2] = fs.WriteAsync(userBuffer, 0, userBuffer.Length); // no await
    // 12000 for .NET 5, for .NET 6 most likely not as the operation has not even started yet and the Position was not updated
    Console.WriteLine(fs.Position);

    await Task.WhenAll(writes);
    Console.WriteLine(fs.Position); // the Position is now up-to-date (12000)
}

Pre-change behavior:

4000
8000
12000
12000

Post-change behavior:

4000
?
?
12000

To enable the .NET 5 behavior, users can specify an AppContext switch or an environment variable:

{
    "configProperties": {
        "System.IO.UseNet5CompatFileStream": true
    }
}
set DOTNET_SYSTEM_IO_USENET5COMPATFILESTREAM=1

@dotnet/compat @stephentoub @jozkee @carlossanlop @jeffhandley

@adamsitnik adamsitnik added area-System.IO breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. labels Apr 7, 2021
@adamsitnik adamsitnik added this to the 6.0.0 milestone Apr 7, 2021
@ghost
Copy link

ghost commented Apr 7, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

FileStream has never been thread-safe, but until .NET 6 we have tried to somehow support multiple concurrent calls to its async methods (ReadAsync & WriteAsync).
We were doing that only on Windows and according to my knowledge, this was not documented anywhere.

In order to allow for 100% asynchronous file IO with FileStream and fix the following issues:

#48813 has introduced locking in the FileStream buffering logic. Now when buffering is enabled (bufferSize passed to FileStream ctor is > 1) every ReadAsync & WriteAsync operation is serialized.

This means, that FileStream.Position is updated AFTER ReadAsync|WriteAsync completes. Not after the operation is started (the old, windows-specific behaviour).

byte[] bytes = new byte[10_000];
string path = Path.Combine(Path.GetTempPath(), Path.GetTempFileName());

using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.ReadWrite, FileShare.None, bufferSize: 4096, useAsync: true))
{
    Task[] writes = new Task[3];
    
    writes[0] = fs.WriteAsync(bytes, 0, bytes.Length);
    Console.WriteLine(fs.Position);

    writes[1] = fs.WriteAsync(bytes, 0, bytes.Length);
    Console.WriteLine(fs.Position);

    writes[2] = fs.WriteAsync(bytes, 0, bytes.Length);
    Console.WriteLine(fs.Position);

    await Task.WhenAll(writes);
    Console.WriteLine(fs.Position);
}

Pre-change behavior:

10000
20000
30000
30000

Post-change behavior:

0
0
0
30000

To enable the .NET 5 behaviour, users can specify an AppContext switch or an environment variable:

{
    "configProperties": {
        "System.IO.UseNet5CompatFileStream": true
    }
}
set DOTNET_SYSTEM_IO_USENET5COMPATFILESTREAM=1

@dotnet/compat @stephentoub @jozkee @carlossanlop @jeffhandley

Author: adamsitnik
Assignees: -
Labels:

area-System.IO, breaking-change

Milestone: 6.0.0

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 7, 2021
@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Apr 7, 2021
@benaadams
Copy link
Member

Aside; if there is a breaking change on Position does it also drop the call to SetFilePointer?

From "Windows Server Performance Team Blog: Designing Applications for High Performance – Part III" (June 25, 2008)

The old DOS SetFilePointer API is an anachronism. One should specify the file offset in the overlapped structure even for synchronous I/O.

@adamsitnik
Copy link
Member Author

Aside; if there is a breaking change on Position does it also drop the call to SetFilePointer?

This was the 2nd breaking change (out of 2 so far) that we have introduced and I've filled a separate issue for it: #50860

@weltkante
Copy link
Contributor

weltkante commented Apr 12, 2021

What is the behavior of synchronously issuing multiple WriteAsync calls, not waiting until they complete? That is an important scenario I'd expect to continue to work. The async methods are not necessarily about thread-concurrency, but also about interleaving calls on a singlethreaded application.

The issue description sounds like the caller will have to maintain its own position if he does any seeking between interleaved calls, is that right? And that completing calls will corrupt the position (leave them in an arbitrary state depending on the ordering of user calls vs. async calls completing) - even in a single threaded application?

I would have expected FileStream to provide a synchronous view of the "current position" to the caller so he can properly orchestrate interleaved operations. The position not being reliable if anyone ever calls an async method (as you don't know when it'll complete) sounds like it has potential of breaking a lot of code (and require to write a FileStream wrapper maintaining a stable position for orchestrating async writes)

@carlossanlop
Copy link
Member

carlossanlop commented Apr 20, 2021

@adamsitnik since #27643 and #16341 have been fixed, is there anything else we should address before closing this issue?

Since this is a breaking change that needs to be documented, I opened an issue in dotnet/docs: dotnet/docs#23858

@adamsitnik
Copy link
Member Author

Closing (dotnet/docs#24060)

@weltkante
Copy link
Contributor

weltkante commented Aug 11, 2021

Any word on my question/concerns above?

@adamsitnik
Copy link
Member Author

What is the behavior of synchronously issuing multiple WriteAsync calls, not waiting until they complete?

@weltkante I am not sure if I understand, could you provide a code sample?

@weltkante
Copy link
Contributor

weltkante commented Aug 11, 2021

Ah ok I thought this was something considered in the design because the issue title explicitly spells it out, so I was asking for clarification on that design. If FileStream.Position is updated when ReadAsync/WriteAsync completes my question was, what happens if I issue multiple such async calls synchronously from the same thread, not awaiting them immediately. To me the issue title sounds like this suddenly will become "undefined behavior" because at the time when the calls are issued the Position may or may not have been updated yet by the previously issued calls (depending on how fast I'm issuing them vs. how fast they are completing). This is not a concern of multithreading, but a concern of overlapping outstanding operations.

The original design of updating FileStream.Position when calls are issued was fully deterministic when issuing multiple interleaved calls since it didn't depend on the call to complete to determine the position for the next call.

If that wasn't considered yet I'll try to write some code to show what I mean. I assume I should start that investigation in a separate issue (if I find the behavior is as broken as it sounds)?

@weltkante
Copy link
Contributor

weltkante commented Aug 12, 2021

Never mind, I've read through the linked issues and realize that having multiple outstanding requests has been moved into RandomAccess static methods and you have to track the position yourself.

This is NOT updated in the documentation of FileStream.Read/WriteAsync, only as breaking change? Also without guiding to the new RandomAccess static methods for people who need multiple outstanding requests. Why not update documentation that you can't have multiple outstanding async operations on an ordinary FileStream?

@ghost ghost locked as resolved and limited conversation to collaborators Sep 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO breaking-change Issue or PR that represents a breaking API or functional change over a prerelease.
Projects
None yet
Development

No branches or pull requests

4 participants