Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid expensive GetFileInformationByHandleEx syscall if possible #49541

Closed
adamsitnik opened this issue Mar 12, 2021 · 13 comments · Fixed by #49975
Closed

Avoid expensive GetFileInformationByHandleEx syscall if possible #49541

adamsitnik opened this issue Mar 12, 2021 · 13 comments · Fixed by #49975
Assignees
Labels
area-System.IO tenet-performance Performance related issue
Milestone

Comments

@adamsitnik
Copy link
Member

To solve #16354 we are going to track the file offset in memory and avoid expensive Seek calls (draft 91423fa):

SeekCore(_fileHandle, destination.Length, SeekOrigin.Current);

SeekCore(_fileHandle, source.Length, SeekOrigin.Current);

This will allow for removing another expensive SetLength call and solve #25905 (draft a7ca4cb):

To reach optimal syscall usage for async File IO and get 100% async IO on Windows we just need to get rid of calls to GetFileInformationByHandleEx (called by FileStream.Length which looks very innocent in the source code ;) ):

From my initial observations, it seems that we should be able to cache the Length. The reasoning is that by default files are opened with FileShare.Read property, which means "allow other processes to just read from the file":

private const FileShare DefaultShare = FileShare.Read;

This should allow us to cache the Length (as long as the user does not specify FileShare.Write in an explicit way which is very rare) and invalidate it on our own when:

  • user performs a write that extends the file
  • user performs a call to Position = x or Seek(x) that extends or shrinks the file
  • user performs a call to SetLength(x) that extends or shrinks the file
@adamsitnik adamsitnik added area-System.IO tenet-performance Performance related issue labels Mar 12, 2021
@adamsitnik adamsitnik added this to the 6.0.0 milestone Mar 12, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 12, 2021
@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Mar 12, 2021
@adamsitnik
Copy link
Member Author

One more idea for optimization that could be useful for small files: when the user opens the file for reading and is about the perform first ReadAsync (we know that the user is going to call Length anyway) we could call it before allocating the buffer (in BufferingFileStreamStrategy) and ensure that the buffer that we are about to allocate is not bigger than the file itself. It would be rather a small win, I'll update the description when I have more data about real-life usage.

@stephentoub
Copy link
Member

Backing up for a moment, what happens if we don't use the Length at all? e.g. this code:

long len = Length;
// Make sure we are reading from the position that we think we are
VerifyOSHandlePosition();
if (_filePosition + destination.Length > len)
{
if (_filePosition <= len)
{
destination = destination.Slice(0, (int)(len - _filePosition));
}
else
{
destination = default;
}
}

tries to limit a read to fit within the file's size... what happens if we don't do that?

@adamsitnik
Copy link
Member Author

In case the buffer is bigger than what is left in the file, we slice the buffer:

if (_filePosition <= len)
{
destination = destination.Slice(0, (int)(len - _filePosition));
}

and then we can safely move the file offset by destination.Length:

SeekCore(_fileHandle, destination.Length, SeekOrigin.Current);

If we don't perform this check, we might set the offset to a value that is > Length. It's not a problem for the next read operation (it would return 0 bytes or some kind of error), but it could be a problem for next write operation:

FileStream fs = new FileStream($path, $readAndWrite);
fs.SetLength(100);
byte[] bytes = new byte[200];
await fs.ReadAsync(bytes, 0, 200);
// if the above sets Position to 100, we are safe. 
// What happens if it sets it to 200? Is the next write going to create a sparse file?
await fs.WriteAsync(bytes, 0, 200);

We could ofc update the position once we receive the number of retrieved bytes:

internal static void IOCallback(uint errorCode, uint numBytes, NativeOverlapped* pOverlapped)

But it would work well only when users would be awaiting the operations.

@stephentoub
Copy link
Member

stephentoub commented Mar 12, 2021

We could ofc update the position once we receive the number of retrieved bytes:
But it would work well only when users would be awaiting the operations.

Right. It seems then this is an attempt to support concurrent operations on the FileStream? And if so, in the face of such concurrent operations does the right thing actually happen? What happens if there's an asynchronous error during the read, for example?

@adamsitnik
Copy link
Member Author

It seems then this is an attempt to support concurrent operations on the FileStream?

Yes! And I am not sure if OS will always return n bytes if n bytes are available. Is it possible that it might return less and we are going to miss some bytes?

cc @JeremyKuhne

@adamsitnik
Copy link
Member Author

What happens if there's an asynchronous error during the read, for example?

Before we throw the exception we try to reset the file offset to the current offset:

if (!_fileHandle.IsClosed && CanSeek) // Update Position - It could be anywhere.
{
SeekCore(_fileHandle, 0, SeekOrigin.Current);
}

Which I suspect would be the most recent value that we have passed to Seek? Which in case of multiple concurrent operations could be just anything?

FileStream fs = new FileStream($path, $readAndWrite);
fs.SetLength(3);
byte[] bytes1 = new byte[1], bytes2 = new byte[1], bytes3 = new byte[1];
Task[] reads = new Task[] { fs.ReadAsync(bytes1, 0, 1), fs.ReadAsync(bytes2, 0, 1), fs.ReadAsync(bytes3, 0, 1) };
// if reads[1] fails the fs.Position is set to ?

@stephentoub
Copy link
Member

What happens if there's an asynchronous error during the read, for example?

Before we throw the exception we try to reset the file offset to the current offset

That code is handling if the operation fails synchronously.

@adamsitnik
Copy link
Member Author

That code is handling if the operation fails synchronously.

Good point. So the actual error handling happens here:

Exception e = Win32Marshal.GetExceptionForWin32Error(errorCode);
e.SetCurrentStackTrace();
TrySetException(e);

and does not modify the offset?

@stephentoub
Copy link
Member

stephentoub commented Mar 12, 2021

and does not modify the offset?

That appears to be the case.

It's also not clear what should happen in that case. The whole notion of trying to allow for concurrent operations while also tracking position is fairly flawed, in particular if the operation might fail or return less than was requested.

@adamsitnik
Copy link
Member Author

The whole notion of trying to allow for concurrent operations while also tracking position is fairly flawed, in particular if the operation might fail or return less than was requested.

I totally agree. But what options do we have?

  1. Keep trying to allow for concurrent operations. This is somehow flawed by design and the rewrite is the first opportunity in 20 years (?) to change that.
  2. Add locking. This would be expensive, as locking adds non-trivial overhead. 99.99% of users that use FileStream in the correct way would have to pay for it.
  3. Detect concurrent operations and throw? On the one hand, we would very quickly find all incorrect usages, but on the other users who would get exceptions at runtime would not be happy. We are going to provide a config switch to enable legacy behavior so this would be easy to mitigate.

Personally, I would go with (3) as it would simplify the implementation a lot and it would be always correct (update offset after read|write async has finished with the actual number of bytes). Please keep in mind that I have very little experience with introducing breaking changes to .NET.

@stephentoub @jozkee @carlossanlop what do you think?

@stephentoub
Copy link
Member

stephentoub commented Mar 15, 2021

What specifically breaks if we just stop clipping read sizes to file length? A developer issuing concurrent reads not only has the workaround of falling back to legacy, they also can just do the clipping themselves if it matters, yes?

@adamsitnik
Copy link
Member Author

What specifically breaks if we just stop clipping read sizes to file length?

If we don't perform this check, we might set the offset to a value that is > Length. It's not a problem for the next read operation (it would return 0 bytes or some kind of error), but it could be a problem for the next write operation:

FileStream fs = new FileStream($path, $readAndWrite);
fs.SetLength(100);
byte[] bytes = new byte[200];
await fs.ReadAsync(bytes, 0, 200);
// if the above sets Position to 100, we are safe. 
// What happens if it sets it to 200? Is the next write going to create a sparse file?
await fs.WriteAsync(bytes, 0, 200);

@jozkee have you tried to see what happens?

A developer issuing concurrent reads not only has the workaround of falling back to legacy, they also can just do the clipping themselves if it matters, yes?

Yes, but I expect that most of the users don't perform any clipping as Read promises to return what is available, even if it's less than the buffer size.

@stephentoub
Copy link
Member

Yes, but I expect that most of the users don't perform any clipping as Read promises to return what is available, even if it's less than the buffer size.

Yes. But it wouldn't negatively affect them. It would only affect them if they then turned around and expected to be able to concurrently issue a write at the position reported before the read completed.

I really wonder if we're making this too hard on ourselves.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 22, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 23, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 22, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants