FileStream file preallocation performance #45946
Comments
I like it on Windows but what are the benefits on Unix?
Thanks for the suggestion. It would be interesting for someone to propose the API changes (using the template, but in this issue).
@danmosemsft I would propose adding an extra parameter to the FileStream class so optimizations can be made throughout the framework:
- public FileStream(String path, FileMode mode, FileAccess access, FileShare share, int bufferSize) {
+ public FileStream(String path, FileMode mode, FileAccess access, FileShare share, int bufferSize, long allocationSize)
For example the FileSystem.CopyFile function:
runtime/src/libraries/System.IO.FileSystem/src/System/IO/FileSystem.Unix.cs Lines 22 to 24 in 8e913a2
Can be changed to pass
File uploads via ASP.NET could also be optimized: (this uses the sample code from msdn)
Linux/Unix has disk fragmentation just like Windows, but most Linux/Unix tools use the fallocate syscall, so it's not really an issue for them. .NET doesn't use fallocate, which puts its performance at a disadvantage on Linux compared to other tools. .NET should include support for fallocate on Linux/Unix. You can view the fragmentation using the fsck command:
@dmex By Unix I assume you mean not only Linux. We need consistent behavior on all supported platforms, including macOS and FreeBSD, so we need an equivalent of Linux's fallocate on those platforms.
BSD has posix_fallocate (BSD documentation) and OSX has F_PREALLOCATE. There isn't much documentation on OSX about F_PREALLOCATE, but there's an example here: https://lists.apple.com/archives/darwin-dev/2007/Dec/msg00040.html
preallocationSize would probably be more consistent and a lot better at conveying the meaning.
Background and Motivation
Specifying file allocation size when creating the file can improve:
Proposed API
namespace System.IO
{
public class FileStream : Stream
{
public FileStream(string path, FileMode mode)
public FileStream(string path, FileMode mode, FileAccess access)
public FileStream(string path, FileMode mode, FileAccess access, FileShare share)
public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize)
public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, bool useAsync)
public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, FileOptions options)
+ public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, FileOptions options, long allocationSize)
}
public partial class StreamWriter : System.IO.TextWriter
{
public StreamWriter(Stream stream)
public StreamWriter(Stream stream, System.Text.Encoding encoding)
public StreamWriter(Stream stream, System.Text.Encoding encoding, int bufferSize)
public StreamWriter(Stream stream, System.Text.Encoding? encoding = null, int bufferSize = -1, bool leaveOpen = false)
public StreamWriter(string path)
public StreamWriter(string path, bool append)
public StreamWriter(string path, bool append, System.Text.Encoding encoding)
+ public StreamWriter(string path, bool append = false, System.Text.Encoding? encoding = null, int bufferSize = -1, FileOptions options = FileOptions.None, long allocationSize = -1)
}
public static partial class File
{
public static FileStream Create(string path)
public static FileStream Create(string path, int bufferSize)
public static FileStream Create(string path, int bufferSize, FileOptions options)
+ public static FileStream Create(string path, int bufferSize, FileOptions options, long allocationSize)
}
public sealed partial class FileInfo : System.IO.FileSystemInfo
{
public FileInfo(string fileName)
public System.IO.FileStream Create()
+ public System.IO.FileStream Create(FileOptions options, long allocationSize)
}
}
Note: Some of the methods above add
Usage Examples
using (FileStream source = File.OpenRead(sourceFilePath))
using (FileStream destination = File.Create(destinationFilePath, allocationSize: source.Length))
{
source.CopyTo(destination);
}
Alternative Designs
The proposed design is so simple that it's hard to come up with an alternative.
Risks
The API itself does not introduce breaking changes, but using it in all
@adamsitnik I left a comment in the other API proposal that is relevant to this proposal (wrapping new arguments in a new type, rather than adding new parameters to existing APIs).
namespace System.IO
{
public class FileStream : Stream
{
public FileStream(string path, FileMode mode, FileAccess access = appropriate default, FileShare share = appropriate default, int bufferSize = appropriate default, FileOptions options = appropriate default, long allocationSize = appropriate default)
}
public partial class StreamWriter : System.IO.TextWriter
{
public StreamWriter(string path, bool append = false, System.Text.Encoding? encoding = null, int bufferSize = -1, FileOptions options = FileOptions.None, long allocationSize = -1)
}
public static partial class File
{
public static FileStream Create(string path, int bufferSize = appropriate default, FileOptions options = appropriate default, long allocationSize = appropriate default)
}
public sealed partial class FileInfo : System.IO.FileSystemInfo
{
public System.IO.FileStream Create(FileOptions options = appropriate default, long allocationSize = appropriate default)
}
}
This is the second issue that just approved making the overloads of these methods even longer. Did we discuss adding options bags and then a single overload taking that bag? (Had conflicts today and couldn't make the API reviews.)
The question came up. Partially it was that no one had a good name. 3 of the 4 of these are building on members newly added from #24698. Hopefully they're just added as the one member with defaults, which really makes this just adding one new member, not 4.
Right, I have the same concerns there, adding 27 (!) new overloads. I'm surprised these were approved.
As well as being more overloads to select from, the longer set of parameters makes it more daunting to use the constructor, because there is no longer a clear progression of increasingly less used parameters. Instead, you may need to comma past several that you want to keep as default in order to specify a value for the one you care about, and the code also becomes harder to read (e.g., wider). Re naming, it seems we have a fair bit of precedent for XXOptions, as well as ProcessStartInfo.
Yeah, but "FileOptions" is already the name of one of the parameter types (a flags enum). Already used the good name 😄
A few name suggestions: FileStreamOptions, FileStreamExtendedOptions, FileExtendedParameters. The last one would probably be the most descriptive.
I figured FileStreamOptions... Maybe we should have an analyzer in our SDK that enforces some kind of heuristic limit on # constructors x # parameters 🙂
With the recent overhaul around
I've created a new proposal for the option bag: #52446
Is the name
I thought it would be
The usage example created by @adamsitnik in the proposal #52446 uses a fixed preallocation hint, which isn't the best usage example. You generally only use preallocation with a known length:
// Opening file for Read:
var basic = new FileStreamOptions
{
Path = @"C:\FrameworkDesignGuidelines.pdf",
Mode = FileMode.Open,
Access = FileAccess.Read,
};
var read = new FileStream(basic);
var advanced = new FileStreamOptions
{
Path = @"C:\Copy.pdf",
Mode = FileMode.CreateNew,
Access = FileAccess.Write,
Share = FileShare.None,
AllocationSize = read.Length // <-- Pass length as preallocation hint
};
var write = new FileStream(advanced);
read.CopyTo(write);
AllocationSize would confuse some into thinking it's a buffer allocation rather than the physical file allocation performed by the disk driver.
In the main proposal of #52446 users can't provide
I wanted to avoid
The problem is that it's actually not the file size. If you specify the
I like this name, I am going to update the proposal.
My other idea is
@dmex thanks for the improved example, I've added it to the proposal.
Either name works I guess. I like preallocationSize since it's consistent with native implementations and helps when referring to documentation used elsewhere, but then again most use cases are going to be 'guaranteeing disk space exists', I guess. Either name works 👍
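For readers following the naming discussion, here is a minimal sketch of the copy scenario. It assumes .NET 6 or later, where the option bag is available as FileStreamOptions with a PreallocationSize property and the path is passed to the FileStream constructor rather than set on the options object. The PreallocatedCopy class and its Copy method are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the runtime.
static class PreallocatedCopy
{
    public static void Copy(string sourcePath, string destinationPath)
    {
        using FileStream source = File.OpenRead(sourcePath);

        var options = new FileStreamOptions
        {
            Mode = FileMode.CreateNew,
            Access = FileAccess.Write,
            Share = FileShare.None,
            // Pass the known source length as the preallocation hint.
            PreallocationSize = source.Length
        };

        using var destination = new FileStream(destinationPath, options);
        source.CopyTo(destination);
    }
}
```

The hint is only meaningful here because the final length is known before the destination is created, which matches the point made above about fixed hints being a poor example.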
Edit by carlossanlop: API Proposal can be found here.
Description
The FileStream class doesn't currently support Windows file preallocation hints, which reduces the performance of file operations and increases the disk fragmentation of newly created files.
If we use the example for file uploads for aspnetcore here:
https://docs.microsoft.com/en-us/aspnet/core/mvc/models/file-uploads?view=aspnetcore-5.0#file-upload-scenarios
None of the .NET classes support passing the AllocationSize when creating the file, even though it's been a feature included with Windows since 2000. When you're creating multiple large files, such as with servers and file uploads, installers or build servers (we use .NET Core for our build servers compiling native C), passing the file length as AllocationSize when creating the file can significantly reduce fragmentation and improve performance.
An example of passing the allocation size:
Windows Vista and above support the creation of files with their initial AllocationSize. This hint is passed to the file system driver and we can use the published FAT driver source code to show these optimizations:
https://github.com/microsoft/Windows-driver-samples/blob/master/filesys/fastfat/allocsup.c#L1164
https://github.com/microsoft/Windows-driver-samples/blob/master/filesys/fastfat/allocsup.c#L1233-L1250
The FatTruncateFileAllocation procedure becomes a no-op when the AllocationSize is valid:
https://github.com/microsoft/Windows-driver-samples/blob/master/filesys/fastfat/allocsup.c#L1533
You can search for AllocationSize and see the other optimizations when the value is known during file creation:
https://github.com/microsoft/Windows-driver-samples/blob/6c1981b8504329521343ad00f32daa847fa6083a/filesys/fastfat/create.c#L6493-L6506
It would be safe to assume similar optimisations in the NTFS and ReFS drivers when AllocationSize is valid.
Configuration
The AllocationSize must be passed with NtCreateFile when a file is being created, overwritten, or superseded. NtCreateFile is documented here: https://docs.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntcreatefile
FileStream should also support get/set for the file allocation size, like it currently does for the file size, using GetFileInformationByHandleEx with the FileAllocationInfo class and FILE_ALLOCATION_INFO. However, this doesn't receive the full benefits of passing the AllocationSize up front with NtCreateFile, so both methods should be supported for different use cases.
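As a rough illustration of the "set" half of that request, the sketch below P/Invokes SetFileInformationByHandle with the FileAllocationInfo class on an already-open FileStream. This is an assumption about how it could be done from user code, not how the runtime does it; the AllocationSizeInterop class and TrySetAllocationSize method are hypothetical names. It is Windows-only and simply returns false elsewhere.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Hypothetical helper, not part of the runtime.
static class AllocationSizeInterop
{
    // Value of FileAllocationInfo in FILE_INFO_BY_HANDLE_CLASS.
    private const int FileAllocationInfo = 5;

    // Mirrors the native FILE_ALLOCATION_INFO structure (one LARGE_INTEGER).
    [StructLayout(LayoutKind.Sequential)]
    private struct FILE_ALLOCATION_INFO
    {
        public long AllocationSize;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool SetFileInformationByHandle(
        SafeFileHandle hFile,
        int fileInformationClass,
        ref FILE_ALLOCATION_INFO fileInformation,
        uint bufferSize);

    // Returns true if the allocation size was set; false on failure
    // or when not running on Windows.
    public static bool TrySetAllocationSize(FileStream stream, long allocationSize)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            return false;
        }

        var info = new FILE_ALLOCATION_INFO { AllocationSize = allocationSize };
        return SetFileInformationByHandle(
            stream.SafeFileHandle,
            FileAllocationInfo,
            ref info,
            (uint)Marshal.SizeOf<FILE_ALLOCATION_INFO>());
    }
}
```

As the description notes, adjusting the allocation after the handle is open does not get the full benefit of supplying AllocationSize at creation time through NtCreateFile.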
Analysis
A reduction in IO for our workloads when using C# to create files:
[image]
Data
There's also a discussion on stackoverflow about preallocation with more details:
https://stackoverflow.com/questions/53334343/windows-refs-ntfs-file-preallocation-hint
Support for preallocation hints in .NET would be a very welcome feature for reducing disk fragmentation on servers (file uploads) and desktop applications (installers and build tools). Please consider adding support for this feature in the next version of the runtime.