Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new option bag for FileStream ctor #52446

Closed
adamsitnik opened this issue May 7, 2021 · 16 comments · Fixed by #51111
Closed

Add new option bag for FileStream ctor #52446

adamsitnik opened this issue May 7, 2021 · 16 comments · Fixed by #51111
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.IO tenet-performance Performance related issue
Milestone

Comments

@adamsitnik
Copy link
Member

adamsitnik commented May 7, 2021

Background and Motivation

We have recently approved one new FileStream constructor argument: long allocationSize (#45946 (comment)) that got implemented, but has not been merged yet (#51111).

As pointed by @carlossanlop and @stephentoub, we should consider adding options bag instead.

An extra motivation for adding the option bag is a new feature request (#52321) for allowing to pass Memory<byte> as a buffer. This would help us to solve #15088 and literally remove the last problematic allocation from FileStream.

Proposed API

namespace System.IO
{
+    public readonly struct FileStreamOptions
+    {
+        public string Path { get; init; }
+        public FileMode Mode { get; init; }
+        public FileAccess Access { get; init; }
+        public FileShare Share { get; init; }
+        public Memory<byte>? Buffer { get; init; } // default value == null => use default buffer size and allocate the buffer (current behaviour)
+        public FileOptions Options { get; init; }
+        public long PreAllocationSize { get; init; }
+    }

    public class FileStream : Stream
    {
        public FileStream(string path, FileMode mode)
        public FileStream(string path, FileMode mode, FileAccess access)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, bool useAsync)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, FileOptions options)
+       public FileStream(FileStreamOptions options)

+       [EditorBrowsable(EditorBrowsableState.Never)]
        [Obsolete("This constructor has been deprecated.  Please use new FileStream(SafeFileHandle handle, FileAccess access) instead.  https://go.microsoft.com/fwlink/?linkid=14202")]
        public FileStream(IntPtr handle, FileAccess access)
+       [EditorBrowsable(EditorBrowsableState.Never)]
        [Obsolete("This constructor has been deprecated.  Please use new FileStream(SafeFileHandle handle, FileAccess access) instead, and optionally make a new SafeFileHandle with ownsHandle=false if needed.  https://go.microsoft.com/fwlink/?linkid=14202")]
        public FileStream(IntPtr handle, FileAccess access, bool ownsHandle)
+       [EditorBrowsable(EditorBrowsableState.Never)]
        [Obsolete("This constructor has been deprecated.  Please use new FileStream(SafeFileHandle handle, FileAccess access, int bufferSize) instead, and optionally make a new SafeFileHandle with ownsHandle=false if needed.  https://go.microsoft.com/fwlink/?linkid=14202")]
        public FileStream(IntPtr handle, FileAccess access, bool ownsHandle, int bufferSize)
+       [EditorBrowsable(EditorBrowsableState.Never)]
        [Obsolete("This constructor has been deprecated.  Please use new FileStream(SafeFileHandle handle, FileAccess access, int bufferSize, bool isAsync) instead, and optionally make a new SafeFileHandle with ownsHandle=false if needed.  https://go.microsoft.com/fwlink/?linkid=14202")]
        public FileStream(IntPtr handle, FileAccess access, bool ownsHandle, int bufferSize, bool isAsync)
    }

Usage Examples

// Opening file for Read:
var basic = new FileStreamOptions
{
    Path = @"C:\FrameworkDesignGuidelines.pdf",
    Mode = FileMode.Open,
    Access = FileAccess.Read,
};
using FileStream read = new FileStream(basic);

// creating new file for async write with allocation size and buffer provided by the user
byte[] array = ArrayPool<byte>.Shared.Rent(4096);
var advanced = new FileStreamOptions
{
    Path = @"C:\copy.pdf",
    Mode = FileMode.CreateNew,
    Access = FileAccess.Write,
    Share = FileShare.None,
    PreAllocationSize = read.Length,
    Options = FileOptions.Asynchronous | FileOptions.WriteThrough,
    Buffer = array
};
using FileStream write = new FileStream(advanced);
read.CopyTo(write);
ArrayPool<byte>.Shared.Return(array);

// To disable the buffering, users would have to pass a default or empty `Memory<byte>`:
var noBuffering = new FileStreamOptions
{
    Buffer = default(Memory<byte>) // Array.Empty<byte>() would also work
};

Alternative Designs

Don't let the user provide the buffer (to minimize risk of misuse), but instead provide bufferSize and extend FileOptions with PoolBuffer:

namespace System.IO
{
+    public readonly struct FileStreamOptions
+    {
+        public string Path { get; init; }
+        public FileMode Mode { get; init; }
+        public FileAccess Access { get; init; }
+        public FileShare Share { get; init; }
+        public int BufferSize { get; init; } // the difference
+        public FileOptions Options { get; init; }
+        public long PreAllocationSize { get; init; }
+    }

    public class FileStream : Stream
    {
        public FileStream(string path, FileMode mode)
        public FileStream(string path, FileMode mode, FileAccess access)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, bool useAsync)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, FileOptions options)
+       public FileStream(FileStreamOptions options)
    }
    
    public enum FileOptions
    {
        WriteThrough,
        None,
        Encrypted,
        DeleteOnClose,
        SequentialScan,
        RandomAccess,
        Asynchronous,
+       PoolBuffer // new option
    }

Risks

Allowing the users to pass the buffer creates the risk of misusing the buffer by the user:

  • not returning a rented array to the pool and exhausting the ArrayPool
  • freeing the native memory that Memory<byte> wraps when it's still being used by FileStream
@adamsitnik adamsitnik added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO tenet-performance Performance related issue labels May 7, 2021
@adamsitnik adamsitnik added this to the 6.0.0 milestone May 7, 2021
@adamsitnik adamsitnik self-assigned this May 7, 2021
@ghost
Copy link

ghost commented May 7, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and Motivation

We have recently approved one new FileStream constructor argument: long allocationSize (#45946 (comment)) that got implemented, but has not been merged yet (#51111).

As pointed by @carlossanlop and @stephentoub, we should consider adding options bag instead.

An extra motivation for adding the option bag is a new feature request (#52321) for allowing to pass Memory<byte> as a buffer. This would help us to solve #15088 and literally remove the last problematic allocation from FileStream.

Proposed API

namespace System.IO
{
+    public readonly struct FileStreamOptions
+    {
+        public string Path { get; init; }
+        public FileMode Mode { get; init; }
+        public FileAccess Access { get; init; }
+        public FileShare Share { get; init; }
+        public Memory<byte>? Buffer { get; init; } // default value == null => use default buffer size and allocate the buffer (current behaviour)
+        public FileOptions Options { get; init; }
+        public long AllocationSize { get; init; }
+    }

    public class FileStream : Stream
    {
        public FileStream(string path, FileMode mode)
        public FileStream(string path, FileMode mode, FileAccess access)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, bool useAsync)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, FileOptions options)
+       public FileStream(FileStreamOptions options)
    }

Usage Examples

// Opening file for Read:
var basic = new FileStreamOptions
{
    Path = @"C:\FrameworkDesignGuidelines.pdf",
    Mode = FileMode.Open,
    Access = FileAccess.Read,
};
var read = new FileStream(basic);

// creating new file for async write with allocation size and buffer provided by the user
var advanced = new FileStreamOptions
{
    Path = @"C:\ouput.txt",
    Mode = FileMode.CreateNew,
    Access = FileAccess.Write,
    Share = FileShare.None,
    AllocationSize = 1024 * 1024,
    Options = FileOptions.Asynchronous | FileOptions.WriteThrough,
    Buffer = ArrayPool<byte>.Shared.Rent(4096)
};
var write = new FileStream(advanced);

// To disable the buffering, users would have to pass a default or empty `Memory<byte>`:
var noBuffering = new FileStreamOptions
{
    Buffer = default // Array.Empty<byte>() would also work
};

Alternative Designs

Don't let the user provide the buffer (to minimize risk of misuse), but instead provide bufferSize and extend FileOptions with PoolBuffer:

namespace System.IO
{
+    public readonly struct FileStreamOptions
+    {
+        public string Path { get; init; }
+        public FileMode Mode { get; init; }
+        public FileAccess Access { get; init; }
+        public FileShare Share { get; init; }
+        public int BufferSize { get; init; } // the difference
+        public FileOptions Options { get; init; }
+        public long AllocationSize { get; init; }
+    }

    public class FileStream : Stream
    {
        public FileStream(string path, FileMode mode)
        public FileStream(string path, FileMode mode, FileAccess access)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, bool useAsync)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, FileOptions options)
+       public FileStream(FileStreamOptions options)
    }
    
    public enum FileOptions
    {
        WriteThrough,
        None,
        Encrypted,
        DeleteOnClose,
        SequentialScan,
        RandomAccess,
        Asynchronous,
+       PoolBuffer // new option
    }

Risks

Allowing the users to pass the buffer creates the risk of misusing the buffer by the user:

  • not returning a rented array to the pool and exhausting the ArrayPool
  • freeing the native memory that Memory<byte> wraps when it's still being used by FileStream
Author: adamsitnik
Assignees: adamsitnik
Labels:

api-suggestion, area-System.IO, tenet-performance

Milestone: 6.0.0

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label May 7, 2021
@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label May 7, 2021
@davidfowl
Copy link
Member

How would you know to return the array pool buffer in the FileStream implementation with the original proposal or is that the job of the caller to do?

@adamsitnik
Copy link
Member Author

adamsitnik commented May 7, 2021

How would you know to return the array pool buffer in the FileStream implementation with the original proposal or is that the job of the caller to do?

It would be caller's responsibility.

With the alternative design (FileOptions.PoolBuffer) FileStream would both rent and return the array.

edit: I've edited the example to clarify that

@davidfowl
Copy link
Member

Do we have another other examples of APIs like that?

@teo-tsirpanis
Copy link
Contributor

Since FileStreamOptions is proposed to be readonly, it could be passed to FileStream's constructor by reference with the in keyword. The struct weighs 56 bytes on 64-bit machines.

@sakno
Copy link
Contributor

sakno commented May 7, 2021

Do we have another other examples of APIs like that?

XmlReader.Create with XmlReaderSettings? Perhaps, the better naming is FileStreamSettings?

@teo-tsirpanis
Copy link
Contributor

teo-tsirpanis commented May 7, 2021

And the more recent System.IO.Pipelines.PipeOptions.

@danmoseley
Copy link
Member

should this have blocking label to put it at the top of the api review queue?

@adamsitnik adamsitnik added api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels May 8, 2021
@adamsitnik
Copy link
Member Author

@ayende @nietras (who provided feedback about issues with buffer allocations)

considering the two proposals:

  • possibility to pass Memory<byte> as the buffer where the user is responsible for resources management
  • possibility to specify FileOptions.PoolBuffer and bufferSize where FileStream just rents and returns the ArrayPool buffer on it's own.

Which would work best for your needs and why?

@ayende
Copy link
Contributor

ayende commented May 10, 2021

I would rather have Memory<byte> there. This will allow me to allocate the memory directly (I already have a native pool that I can use) and prevent all managed allocations.
If I'm using ArrayPool, I have to use managed memory.

@terrajobst
Copy link
Member

terrajobst commented May 14, 2021

Video

  • There is no notion of required properties for structs/classes
    • We should extract path and keep it as a constructor parameter
    • IOW, we the options type should focus on optional parameters
  • Does this need to be a struct? Structs always have usability issues and this API seems like the one where the constructor does so much work that the allocation of the options type doesn't seem to matter.
    • Is there value in init or is get/set good enough? The benefit is wider language support and presumably FileStream will copy the values anyways because there are already fields and/or the options are directly translated into the native code without holding onto them.
  • We should have a separate issue about the ability to pass in the buffer as that opens up the possibility of use-after-free issues and also makes it more error prone to share options across multiple FileStream instances
  • PreAllocation
    • Should this be Preallocation or PreAllocation? @bartonjs to decide.
  • The options don't cover the constructor that takes a handle. Should it?
namespace System.IO
{
    public sealed class FileStreamOptions
    {
        public FileStreamOptions();
        public FileMode Mode { get; set; }
        public FileAccess Access { get; set; }
        public FileShare Share { get; set; }
        public FileOptions Options { get; set; }
        public long PreAllocationSize { get; set; }
    }
    public partial class FileStream : Stream
    {
        // Existing:
        // public FileStream(string path, FileMode mode)
        // public FileStream(string path, FileMode mode, FileAccess access)
        // public FileStream(string path, FileMode mode, FileAccess access, FileShare share)
        // public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize)
        // public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, bool useAsync)
        // public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, FileOptions options)
        public FileStream(string path, FileStreamOptions options);

        // Hiding obsoleting APIs

        [EditorBrowsable(EditorBrowsableState.Never)]
        public FileStream(IntPtr handle, FileAccess access);

        [EditorBrowsable(EditorBrowsableState.Never)]
        public FileStream(IntPtr handle, FileAccess access, bool ownsHandle);

        [EditorBrowsable(EditorBrowsableState.Never)]
        public FileStream(IntPtr handle, FileAccess access, bool ownsHandle, int bufferSize);

        [EditorBrowsable(EditorBrowsableState.Never)]
        public FileStream(IntPtr handle, FileAccess access, bool ownsHandle, int bufferSize, bool isAsync);
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels May 14, 2021
@carlossanlop carlossanlop removed the blocking Marks issues that we want to fast track in order to unblock other important work label May 14, 2021
@danmoseley
Copy link
Member

My 2c

Should this be Preallocation or PreAllocation

Pre is not a word. Preallocation is a word.

@sakno
Copy link
Contributor

sakno commented May 14, 2021

FileStreamOptions missed bufferSize.

use-after-free issues can be solved through delegation of buffer allocation. I see two options here:

public class FileStreamOptions // not sealed
{
  public int BufferSize { get; set; }

  public virtual Memory<byte> AllocateBuffer() => new byte[BufferSize].AsMemory(); // default impl
}

or

public sealed class FileStreamOptions
{
  public int BufferSize { get; set; }

  public Func<int, Memory<byte>>? BufferAllocator { get; set; }
}

@bartonjs
Copy link
Member

There's a subtle difference in interpretation between pre-allocation and preallocation. With the hyphen means before an allocation event ("Pre-allocation we were running at 1200 RPS, but after we were up to 24k RPS."), without the hyphen means allocating in advance of need ("Utilizing a preallocation of 73 workers we were able to smooth out the latency spike induced from lunchtime web traffic.")

Since this is "allocating in advance of need" there's no hyphen, so no second capital letter. This matches both the MS Style Guide (https://docs.microsoft.com/en-us/style-guide/a-z-word-list-term-collections/p/pre) and Wiktionary's (uncited) definition/spelling (https://en.wiktionary.org/wiki/preallocate). (None of the American Heritage Dictionary, Merriam-Webster, or Oxford English Dictionary currently list "preallocate" as a word, for what it's worth.)

Therefore:

namespace System.IO
{
    partial class FileStreamOptions
    {
        public long PreallocationSize { get; set; }
    }
}

Alternative names:

  • CreateSize (if it's only used for File.Create or File.Open with OpenMode.Create)
  • The classic "SizeHint"
  • InitialSize
  • Probably are lots of others, but I'm stopping here. Or, rather, one before here.

@KevinCathcart
Copy link
Contributor

Deleted my previous post. I’d forgotten that useAsync was part of FileOptions enum.

I do feel that prior to release, we will want to either have a way to pass in a buffer, or have BufferSize added to this class, since it would be really weird to make the ability to chose a different to be mutually exclusive with any new parameters added.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 17, 2021
@adamsitnik
Copy link
Member Author

I've added the proposal for specifying buffer or buffer size and a possibility to rent the array to #15088 (comment)

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 18, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 17, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.IO tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants