Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FileStream] update Position after ReadAsync completes #56093

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
using System.Buffers;
using System.Diagnostics;
using System.IO;
using System.IO.Strategies;
using System.Threading;
using System.Threading.Tasks.Sources;

Expand Down Expand Up @@ -49,6 +50,7 @@ internal sealed unsafe class OverlappedValueTaskSource : IValueTaskSource<int>,
internal ManualResetValueTaskSourceCore<int> _source; // mutable struct; do not make this readonly
private NativeOverlapped* _overlapped;
private CancellationTokenRegistration _cancellationRegistration;
private AsyncWindowsFileStreamStrategy? _strategy;
/// <summary>
/// 0 when the operation hasn't been scheduled, non-zero when either the operation has completed,
/// in which case its value is a packed combination of the error code and number of bytes, or when
Expand All @@ -74,9 +76,10 @@ internal static Exception GetIOError(int errorCode, string? path)
? ThrowHelper.CreateEndOfFileException()
: Win32Marshal.GetExceptionForWin32Error(errorCode, path);

internal NativeOverlapped* PrepareForOperation(ReadOnlyMemory<byte> memory, long fileOffset)
internal NativeOverlapped* PrepareForOperation(ReadOnlyMemory<byte> memory, long fileOffset, AsyncWindowsFileStreamStrategy? strategy = null)
{
_result = 0;
_strategy = strategy;
_memoryHandle = memory.Pin();
_overlapped = _fileHandle.ThreadPoolBinding!.AllocateNativeOverlapped(_preallocatedOverlapped);
_overlapped->OffsetLow = (int)fileOffset;
Expand Down Expand Up @@ -132,8 +135,9 @@ internal void RegisterForCancellation(CancellationToken cancellationToken)
}
}

internal void ReleaseResources()
private void ReleaseResources()
{
_strategy = null;
// Unpin any pinned buffer.
_memoryHandle.Dispose();

Expand Down Expand Up @@ -187,6 +191,7 @@ private static void IOCallback(uint errorCode, uint numBytes, NativeOverlapped*

internal void Complete(uint errorCode, uint numBytes)
{
AsyncWindowsFileStreamStrategy? strategy = _strategy;
ReleaseResources();

switch (errorCode)
Expand All @@ -195,6 +200,11 @@ internal void Complete(uint errorCode, uint numBytes)
case Interop.Errors.ERROR_BROKEN_PIPE:
case Interop.Errors.ERROR_NO_DATA:
case Interop.Errors.ERROR_HANDLE_EOF: // logically success with 0 bytes read (read at end of file)
if (numBytes > 0)
{
// update the position before setting the result
strategy?.UpdatePosition(numBytes);
}
// Success
_source.SetResult((int)numBytes);
break;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -242,14 +242,15 @@ internal static ValueTask<int> ReadAtOffsetAsync(SafeFileHandle handle, Memory<b
return ScheduleSyncReadAtOffsetAsync(handle, buffer, fileOffset, cancellationToken);
}

internal static unsafe (SafeFileHandle.OverlappedValueTaskSource? vts, int errorCode) QueueAsyncReadFile(SafeFileHandle handle, Memory<byte> buffer, long fileOffset, CancellationToken cancellationToken)
internal static unsafe (SafeFileHandle.OverlappedValueTaskSource? vts, int errorCode) QueueAsyncReadFile(SafeFileHandle handle, Memory<byte> buffer, long fileOffset,
CancellationToken cancellationToken, AsyncWindowsFileStreamStrategy? strategy = null)
{
handle.EnsureThreadPoolBindingInitialized();

SafeFileHandle.OverlappedValueTaskSource vts = handle.GetOverlappedValueTaskSource();
try
{
NativeOverlapped* nativeOverlapped = vts.PrepareForOperation(buffer, fileOffset);
NativeOverlapped* nativeOverlapped = vts.PrepareForOperation(buffer, fileOffset, strategy);
Debug.Assert(vts._memoryHandle.Pointer != null);

// Queue an async ReadFile operation.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,27 +35,10 @@ private unsafe ValueTask<int> ReadAsyncInternal(Memory<byte> destination, Cancel
ThrowHelper.ThrowNotSupportedException_UnreadableStream();
}

long positionBefore = _filePosition;
if (CanSeek)
{
long len = Length;
if (positionBefore + destination.Length > len)
{
destination = positionBefore <= len ?
destination.Slice(0, (int)(len - positionBefore)) :
default;
}

// When using overlapped IO, the OS is not supposed to
// touch the file pointer location at all. We will adjust it
// ourselves, but only in memory. This isn't threadsafe.
_filePosition += destination.Length;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that any Windows code being ported from .NET Framework to .NET 6 that was issuing concurrent reads based on position will now be broken. Just want to confirm that's intentional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have already announced that for FileStream with buffering enabled, which is the default setting so I think it's OK: #50858

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option is to enable consistency in the other direction: always update the position before the operation. Net result of this would be that on Unix we'd need to also patch the position back after a read if fewer bytes were read. But if we're saying that you can't trust the position until after the operation completes anyway, maybe that's ok and a smaller break?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option is to enable consistency in the other direction: always update the position before the operation

On Unix, it would require us to perform a sys-call to obtain the file length (which we should not cache on Unix, because file locking is only advisory and others can still write to file) to make sure that the position is not >= EOF. If we would set _position > EOF, the next write operation could create a sparse file. Sample:

File.WriteAllBytes(path, new byte[100]);
using var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite, bufferSize: 0, FileOptions.Asynchronous);
fs.ReadAsync(new byte[123]); // no await, the position is 123
fs.WriteAsync(new byte[] { 1 }); // the byte would get written at position 123, not 100

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub you are right that this could break users who issue concurrent reads in .NET Framework. With #50858 we break them only in a way that the reads are not concurrent, they are serialized. With this change they could perform multuple reads at the same offset without knowing that. I am going to close the PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change they could perform multuple reads at the same offset without knowing that. I am going to close the PR

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I still wonder if we should change the Unix behavior, though, to first update the position and then patch it back after the read completes. It would seem to enable more compatibility with existing Windows code, even though with the Unix implementation either way you shouldn't be relying on the Position while there's an active async operation in flight.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the fact that we used to pretend that issuing concurrent reads and writes is OK with non-thread-safe FileStream. In a perfect world, we would always update the position after the operation completes and give compiler errors for invalid usage.

But getting back to your question I think that we can only choose the lesser evil here:

  • If we update the position after read completes, we have great perf (single sys-call), but the users who are misusing FileStream in the following way:
    • issuing multiple concurrent reads when buffering is disabled and the reads are not awaited: they end up reading content from the same offset (silent error)
    • issuing write before async read operation completes: they end up overwriting the file content (silent error)
  • If we update the position before read starts using current file length:
    • everyone has to pay for the perf penalty (additional sys-call)
    • if someone has extended the file in the meantime we still need to patch the position. Perhaps we could use some opportunistic caching anyway?
  • If we update the position before the read starts assuming that we are going to read entire buffer (_position += buffer.Length):
    • there is no perf penalty (we are not checking the file length up-front)
    • issuing concurrent reads should be fine, as long as pread returns 0 when reading from offset > EOF
    • issuing write before async read operation completes: we might end up with a sparse file. This should be extremely rare, as this is extreme misuse.

@stephentoub which option do you like the most? I am going to verify the pread result for offset > EOF in the meantime

Copy link
Member

@stephentoub stephentoub Jul 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably guessed given my previous comments, but I think option 3 (update position before read starts and then backpatch it if necessary) is the least bad.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the fact that we used to pretend that issuing concurrent reads and writes is OK with non-thread-safe FileStream

100% agree. If we were doing it from scratch today, we would not do that.


(SafeFileHandle.OverlappedValueTaskSource? vts, int errorCode) = RandomAccess.QueueAsyncReadFile(_fileHandle, destination, positionBefore, cancellationToken);
(SafeFileHandle.OverlappedValueTaskSource? vts, int errorCode) = RandomAccess.QueueAsyncReadFile(_fileHandle, destination, _filePosition, cancellationToken, this);
return vts != null
? new ValueTask<int>(vts, vts.Version)
: (errorCode == 0) ? ValueTask.FromResult(0) : ValueTask.FromException<int>(HandleIOError(positionBefore, errorCode));
: (errorCode == 0) ? ValueTask.FromResult(0) : ValueTask.FromException<int>(SafeFileHandle.OverlappedValueTaskSource.GetIOError(errorCode, _fileHandle.Path));
}

public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
Expand Down Expand Up @@ -149,5 +132,7 @@ public override IAsyncResult BeginWrite(byte[] buffer, int offset, int count, As
TaskToApm.Begin(WriteAsync(buffer, offset, count), callback, state);

public override void EndWrite(IAsyncResult asyncResult) => TaskToApm.End(asyncResult);

internal void UpdatePosition(uint numBytes) => _filePosition += numBytes;
}
}