Add a new System.Buffers namespace to the BCL for Resource Pooling #15725
Comments
Nice proposal. A few initial thoughts:
Got a couple of the same questions as Stephen.
@stephentoub Responses below
Good idea. I'll think on this and have an answer for the review.
Just out of curiosity, what is the reason for avoiding default arguments? I'm not partial one way or another, I'm just curious about what problems they cause so I can avoid them in the future :)
I made T a struct so that it is restricted to primitives and structs and not classes; this pool is explicitly not an Object pool. The use cases I have had so far, working with the ASP.NET folks, have been around primitive and struct arrays that benefit from Pooling.
I think that the use cases for Managed and Native pools are so different that swapping between them would only add to confusion around which to use. By keeping the usage patterns slightly off, the usage of one or the other becomes much more clear; for example, if one is using PInvoke calls with buffers, you usually want to pass pointers and Span has a way to retrieve that while Arrays require unsafe code to do so. Conversely, if you want to pass some buffers around from receiving I/O, to formatting or translating, and back out to I/O, the current API set uses Arrays, so it makes it immediately obvious which pool to use. Just my thought process here
I haven't planned anything right now. I think if we did add any of that, it would be good to do in Debug mode only. A P0 aim was to make this as efficient as possible and all of that adds overhead; however, it does add a lot of value while debugging, so maybe adding some Debug build-specific code to check for these scenarios would be useful. I'll think more on this
I made it use ref to try and remove all the easy ways to shoot yourself in the foot; the easiest being code like the following:
byte[] buffer = Rent(...);
// Do things....
Return(buffer, true /*clean the array*/);
// A little later
buffer = ReadIo();
I agree that it's very hard to prevent misuse if the caller makes copies of the reference but making the parameter
So I'm 50/50 on if I want to include the Shared pool or not; there are definitely use-cases where a Shared pool is helpful, but I also think some extra glue code to get around not having a Shared pool wouldn't be the worst thing. In terms of defaults, the Shared pool that is in CoreFxLab uses the default parameters of the Pool constructor, which is a max buffer size of 2 MB (lazy loaded) and a max number of buffers at 50-per-bucket.
Ideally, this would be a new library with as few dependencies as possible; it would be great if the Pooling could be used in the framework in places where it makes sense, so limiting dependencies helps. Currently, the only dependencies in the CoreFxLab prototype are: System.Runtime.CompilerServices (for inlining tiny, math-based functions) |
@mellinoe Responses below
The reason this is restricted is because this is meant to be a Buffer pool, not an Object pool; the usage patterns and underlying implementation will differ between these types. Obviously, this can be worked around by callers by just making their Objects a struct, but that will be a misuse that we can't really avoid. The use-cases I designed this around came from ASP.NET and their current Buffer Pools, which required primitive arrays and struct arrays.
If you use the Shared pool, then yes; that is one big reason I'm not sold on the Shared pool and may take it out. If you use an instance of the BufferPool, your instance is tracked by the GC so if you only need the Pool for a certain amount of time, it will be reclaimed once you no longer need it just like any other managed object...leaking this or keeping the memory around for the lifetime of the process would be a usage problem and not a Pool problem |
So I think that you are saying it can be used with any struct type, not just primitives, right? Just wanted to clarify. This wording was a bit unclear:
I'm aware that you can't actually put a generic restriction on "primitive" types, but we do some weird stuff in System.Numerics.Vectors where we actually do artificially limit the generic type to "primitive" structs and prevent you from instantiating a non-primitive struct. I think what you're saying is that I could indeed create a
Yeah, that sounds like how I would expect it to behave. I was thinking more for our internal uses in low-level libraries, which for their uses might just keep around a static buffer pool, which is never reclaimed. This is how the long path stuff that @JeremyKuhne added works, right? As long as the overhead is tiny, it's not really a problem, but if the pattern proliferates throughout the BCL then it might become noticeable depending on your app's usage patterns. |
Yeah, sorry for being unclear. I'll go change that line; you can use either primitives OR structs.
Yeah, the more I think about it, the more the static Shared pool becomes a problem. I'm going to remove it |
I still don't understand the struct limitation. What's the difference between:
class Bar { }
...
ManagedBufferPool<Bar>
and:
struct BarWrapper { public Bar Bar; }
class Bar { }
...
ManagedBufferPool<BarWrapper>
? You've stated that there are different use cases for buffer pools vs object pools. Sure. But why does a buffer of reference types not count as a buffer but a buffer of value types does? Given a reference type T, there's a difference between an I'm simply not understanding why we're artificially limiting it. What value does that limitation bring? I understand the primary focus is on primitives (which are structs / value types), but why prevent other usage? Especially when we're not really preventing it... we're just making folks work a bit harder to get it, by wrapping their reference types in a value type like I did above. |
I suppose that's true. It would make it confusing later on if we added an ObjectPool though...then it would be confusing on which one I use since both work. The limitation with using this for classes would be that the class must have a default constructor and be non-disposable; plus, renting arrays of complex objects seems...odd. I realize that this implementation can be coerced to work with classes but you have to try to do that and it is obvious that it is not the goal. Maybe a different solution would be to make this primitive-only and then make a separate ObjectPool that can take T and have that be a part of this proposal as well. What do you think about this alternative @stephentoub? Then the use-case for the Buffer Pool is obvious and if you want to use complex types, there is another class that is built for that in the same namespace. |
Why? I think we're misunderstanding each other. I'm not talking about a need to maintain the objects in the buffer; in fact, I expect the primary use case would involve setting clearBuffers to true when returning them to the pool. This is about not having to allocate the array / buffer itself. For example, consider params arrays in C#. Today when you have a function like: public void Bar(params object[] args); and you call: Bar("hello", 42); the compiler needs to allocate a As another example, the
And I'm trying to understand why that is. The goal here is to avoid needing to allocate arrays when we can reuse ones we've cached. What's so special about value types that we have a goal for working with value types and not for working with reference types? (Other than one of the most common uses for this being
It's not possible with C# to do that statically... we'd need to do run-time checks to validate the primitiveness, which makes the APIs more complicated to use, especially for something as hopefully general purpose as a buffer pool.
I'm fine with the idea of having an |
In sample, does If I attempt to
I suppose you're right...there, fundamentally, is no reason we can't remove the
That's up to the caller; since the buffers are all managed, if they have a logic error where they forget to return the buffer, it will be free'd when the buffer is GC'd. If this is in a loop, then they will drain the pool of resources, which will resort to allocating when they request buffers and we don't have any. In response to a comment @stephentoub made earlier, I'm thinking about adding some Debug-specific code that will try and trap these errors.
If you request a size > the initial maxSize passed to the constructor, the buffer is allocated on demand and is dropped when passed to |
The overhead of renting and returning buffers from the prototype pool is lower than the one we built internally 👍 nice! It will be really exciting to us to see something like this go into CoreFx because it means that community packages can start to take a dependency on it. Some feedback
We (ASP.NET) absolutely have the requirement to store reference types in a pooled buffer. I'm currently doing exactly what @stephentoub highlighted - creating a
I think this is a misunderstanding. Constructors aren't related at all, this ain't c++ where you have uninitialized memory, you have null. Structs can implement
I'm not renting an array of complex objects, I'm renting a place to store complex objects.
As discussed several times, our usage in this manner is very different from canonical object-pooling scenarios. I'd suggest not gating this proposal on other proposals that meet different sets of requirements. If this doesn't land in a way we can ship some time in the next few weeks, then it's at risk for us to use in v1. In answer to some of the other discussions about returning/leaking - consider an alternative from our humble buffer pooling strawman: https://github.com/aspnet/Common/blob/dev/src/Microsoft.Extensions.MemoryPool/IArraySegmentPool.cs The standard currency here is In this pattern the concrete implementation can also subclass Consider also here using The objection that was raised to this earlier was that not many APIs in .net take For instance in Kestrel, the memory pool allocates an LOH sized block of memory and then pins it and hands out chunks. This allows you to avoid having to 'graduate' objects that you know have a static lifetime via the normal GC process. This also should avoid the GC attempting to move the pooled memory around as part of compaction. |
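To make the "allocate one large block, pin it, hand out chunks" idea concrete, here is a minimal sketch. It is not Kestrel's actual memory pool; the PinnedSlab type, its members, and the block sizes are made up for illustration, and the class is deliberately not thread-safe. It relies only on GCHandle pinning and ArraySegment<byte>, and on the fact that arrays of 85,000 bytes or more are placed on the large object heap.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical type for illustration; not part of the proposal or of Kestrel.
public sealed class PinnedSlab : IDisposable
{
    private readonly byte[] _slab;   // one large backing array (LOH-sized when >= 85,000 bytes)
    private readonly int _blockSize;
    private GCHandle _handle;        // keeps the slab pinned so the GC never moves it
    private int _next;               // offset of the next unused block

    public PinnedSlab(int blockSize, int blockCount)
    {
        _blockSize = blockSize;
        _slab = new byte[checked(blockSize * blockCount)];
        _handle = GCHandle.Alloc(_slab, GCHandleType.Pinned);
    }

    // Hands out the next fixed-size chunk, or returns false once the slab is used up.
    // Not thread-safe: a real pool would need synchronization and a way to give blocks back.
    public bool TryTake(out ArraySegment<byte> block)
    {
        if (_next + _blockSize > _slab.Length)
        {
            block = default;
            return false;
        }

        block = new ArraySegment<byte>(_slab, _next, _blockSize);
        _next += _blockSize;
        return true;
    }

    // Unpins the slab so the GC can reclaim it once nothing references it.
    public void Dispose()
    {
        if (_handle.IsAllocated)
        {
            _handle.Free();
        }
    }
}
```

With `new PinnedSlab(4096, 32)` the backing array is 131,072 bytes, so it lands on the large object heap and every ArraySegment handed out shares that single pinned allocation.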
There should be a security review here that depends on the internal implementation. Could I call ReturnBuffer, but hang onto the array pointer and watch the buffer change by other threads\code and watch the data that I shouldn't? It partially depends on where the ManagedBufferPool is created - static, local, thread, etc. so any issues of misuse would require access to the ManagedBufferPool instance. |
It would be good to have performance numbers with and without the ManagedBufferPool to verify the premise and make sure the gen0 collections and re-allocs are indeed expensive. |
Just read in the API review
This is another case when having ContainsReferences property would be beneficial. |
@steveharter re: is GC a problem without pooling? For ASP.NET at its target speed around 7Mrps just for a single type it would be allocating 3.2 GBytes and 42.3M transitory objects per second and there are many more different types it allocates during the request processing; which is an enormous amount for the GC to clean up. Prior to pooling; under high load, after about 3 minutes, it would literally flat-line on network performance and stop serving requests while still at 100% cpu on 8 cores only performing GC. The GC is an exceptionally good general purpose allocator and deallocator; however object pooling has the advantage that comes from being a very single purpose object recycler. It's easier to reuse the objects already allocated than to clean up the memory and reallocate them. |
@sokket could an object based pool such as @rynowak mentions take types that implement e.g.
Also I'd heavily favor an interface over class or abstract base class; with a default/standard implementation then provided; it's more flexible, composable and testable - and allows separation of contracts from implementation - even in different assemblies. (e.g. have interface assembly and implementation assembly - easier for pay-on-use) |
Just finished watching the DR Video.
The “Rationale and Usage” above states “This addition is also not meant for object pooling; this addition is specific to buffer pooling”. But the class takes a T, and I can ask for RentBuffer(1), which will return an array with one element. The “Class Description” states: “The Pool is named ArrayBufferPool” But later on states: “The ManagedBufferPool class will be generic in order to allow callers to Rent arrays of type T, which is expected to be primitive types such as byte or char” I think we need a better understanding of why this pool is better than a “BufferPool” of byte arrays or a generic “ObjectPool” that has no “Buffer” (a.k.a byte array) specific semantics. |
Or simply |
Get is also good but is commonly paired with a Set operation. |
@benaadams I'm in the beginning stages of investigating what's necessary for an Object Pool; the ArrayPool has some very specific uses around renting buffers, admittedly I originally thought of purely value types but there was a request to relax that, that are different than the uses around Object Pooling. I definitely agree that there needs to be a way to reset the objects when returned. I'm also thinking heavily about how to make the Pool test-friendly, both internally and by also allowing consumers to swap it out to easily test their use of it; this is a very-high priority item and I'll be sure to have a story for how to do that when it is checked in. @clrjunky I named the function Sorry about the confusion around the name...I haven't had a chance to go through and fully update the original issue with the full rename and from all the feedback; I'm working through that now. |
@benaadams I don't see an email on your profile...would you mind shooting me an email to chat about some of the scenarios you have in mind with the extension of and testing of the Buffer Pool? |
From the DR video (17:15) "Once we give you a buffer nobody holds a reference to this T Array". You're "Renting" in theory; the consumer is "Taking" in practice.
The semantics are already defined by the class being a pool (what does Return mean without the notion of a "Pool")
clrjunky wrote:
...and once you fill them how will I be able to copy them into my Dictionary without pinning the memory? How will I know when data is ready to be copied from the native memory? |
@sokket sent you an email, will send followup - but with BufferPool : IArrayPool<byte>, IArrayPool<char>, IArrayPool<int>
Re: object pools I put more or less what we use into this gist as I can't find open issue and so as not to derail the thread anymore https://gist.github.com/benaadams/5f5ea438d733de6f762a |
@sokket I like this proposal. My only concern is that you'd still have a lot of allocations if the sizes of pooled buffers vary significantly. I wrote a functioning thread-safe bufferpool that takes this approach a while ago. Here is the repo and accompanying blog post. The pool will expand or contract (by adding or removing slabs) as buffers are rented and returned and segments can span multiple slabs. My motivation was to prevent fragmentation during socket operations but it would work fine for simply pooling bytes. You'd have to modify methods that use the buffer to accept |
True, but this is an implementation detail that consumers won't know about...the use semantics should be such that the consumer can immediately tell that this buffer is not permanently theirs...it is still owned by the Pool and they are simply using it for the time being 😃
That problem is for the Native Pool :) for this pool, the cost of always pinning the pool is an overhead that many cases won't want. If your use case requires pinning then you have the ability to do so, but having the arrays always pinned seems overkill.
Indeed, there are lots of interesting things we can do with Span...I'll be sure to add you to that thread when the proposal is worked through.
Therein lies the problem; this Pool is going to be used inside the framework as well, and going back to rewrite all the functions that take in T[] to use ArraySegment or IEnumerable is a non-starter...there are simply too many places to update / replace those functions. Due to that, we need to have our base be T[], which will fit with all current code (in the framework as well as outside) so the Pool can just be plugged in and work.
True, but the allocations are front-loaded when you first request one...as an implementation detail, the buckets can be lazy loaded so that your app memory pressure doesn't explode. Once you pay the upfront cost, Renting is super cheap and you save on GC passes since the objects will be long-lived and won't need to be reclaimed; that is where the real cost savings come in. |
A pooled stream can be useful here, to work with existing methods that accept a stream. This is one distinction between an ObjectPool and a BufferPool. You can access a BufferPool as a stream but not an ObjectPool. |
It was stated afterwards that this pool "can’t leak" which led me to believe that consumer ownership of the returned buffer is actually part of the pool specification.
I believe this is already made clear by the class being a “Pool”. While “Rent” implies the pool has ownership of the buffer to me it sounds moot because nothing actually backs this up and it doesn’t really help in avoiding leaks (which the class doesn’t anyway). Just my 2c.
I never suggested that the pool includes pinning as part of its contract; all I said was that I needed buffer pools when pinning was an issue, which indeed seems like the common scenario judging by others' recent posts on this thread. The DR should include at least two short non-pinning code examples (a.k.a use cases) to justify the existence of this class in the FW. |
Hi Is the "Proposed API" still the latest API? I modeled an interface for a buffer pool off this about a month ago - https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/IJsonBufferPool.cs Has the API changed at all since then? Will it change? I might as well keep mine in sync until it is released and is locked down. |
Ah I see...sorry for the confusion there. I referred to "leak" in the context of alive-but-not-GC'd, which cannot happen.
Sorry about that, my mistake. I'll add an example with Pinning :)
The Proposed API is not the latest...I'm close to locking down the final API now; I'm finishing up some conversations about shared vs instance, as well as around testability around the pool (internal testing and the ability to swap out the shared pool for a custom one). Thanks for all the feedback folks. I'll be updating the above proposal with the concrete API by, hopefully, the end of the week. There's been a lot of great feedback from everyone and I have a good idea of what requirements everyone has. I'll spend some time coming up with a proposal that can benefit as many of the requirements as possible and update the proposal with the finalized API, along with usage descriptions and code samples. Thanks everyone for helping! |
Thank you @sokket, we adore your attitude. :) |
The initial implementation has been merged. Thanks for the feedback everyone! |
stream.CopyTo work incorrect, this bug occurce aftter https://github.com/dotnet/corefx/issues/4547 introduced
Summary
Currently, the BCL does not have support for resource pooling of any kind. A Buffer Pool type should be added to allow both the Framework and external engineers to utilize Pooling of primitive and typically-short-lived types.
Rationale and Usage
Current engineers, both internally and externally, are required to create their own Pool system for short-lived arrays or create and destroy arrays on demand, leading to lots of objects to be collected during Generation 0 Garbage Collection and therefore taking a runtime performance hit. Lots of simple pooling scenarios can be solved with a generic Rent-Return contract, allowing for a generic Pool implementation that will prevent short-term objects or custom Pool implementations for a majority of cases.
Note
The new System.Buffers namespace is not meant to cover all Pooling cases; in some instances, the requirements are so specific to the application being written that it would not be feasible to make a pool for that case in the BCL. This addition is also not meant for object pooling; this addition is specific to buffer pooling of primitive or struct-based types. Object pooling has different requirements that are not necessarily fulfilled by buffer pooling.
Proposed API
Class Description
The ArrayPool is a new class that will manage tiered buffers for the caller. The Pool is named ArrayBufferPool due to the specific nature of this pool; the buffers from this pool are expected to be used in managed code and while they can be pinned and passed to Native code, it is not expected that they will be. There are preliminary talks to create a NativeBufferPool to help with the PInvoke scenarios, but that implementation will rely on Span, which is not ready yet. The ManagedBufferPool class will be generic in order to allow callers to Rent arrays of type T, which is expected to be primitive types such as byte or char. The Pool will be lightweight and thread-safe, allowing for fast Rent and Return calls from any thread within the process, along with minimal locking overhead, and 0 heap allocations on most Rent calls (exceptions to this will be called out below in the description of the Rent function).
To allow for resizing and to minimize fragmentation, the Pool will use Bucketing to create different buffer sizes up to the specified maximum. This allows callers to request multiple buffer sizes without needing multiple pools; also, Bucket sizes will be determined ahead of time but will not allocate ahead of time. This trade off means many different Bucket sizes can be specified without putting a strain on memory utilization unless requested.
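As a rough sketch of the bucketing idea above (not the prototype's actual code), the following assumes power-of-two bucket sizes starting at 16 elements. The names BucketMath and LazyBuckets, the starting size, and the single buffer per bucket are all assumptions made for illustration. Bucket sizes are fixed when the object is constructed, but no buffer memory is allocated until a bucket is first requested.

```csharp
using System;

// Hypothetical helper: maps a requested size to a power-of-two bucket.
public static class BucketMath
{
    private const int MinBucketSize = 16;       // assumed smallest bucket
    private const int MaxSupported = 1 << 30;   // guard against shift overflow

    // Index of the first bucket whose buffers can hold 'size' elements.
    public static int SelectBucketIndex(int size)
    {
        if (size < 0 || size > MaxSupported)
        {
            throw new ArgumentOutOfRangeException(nameof(size));
        }

        int index = 0;
        int bucketSize = MinBucketSize;
        while (bucketSize < size)
        {
            bucketSize <<= 1;
            index++;
        }
        return index;
    }

    // Buffer length used by the bucket at 'index'.
    public static int BucketSize(int index) => MinBucketSize << index;
}

// Hypothetical container showing lazy allocation: one slot per bucket,
// filled only when that bucket is first hit. A real pool would hold
// several buffers per bucket and track which are rented.
public sealed class LazyBuckets<T>
{
    private readonly T[][] _buffers;

    public LazyBuckets(int maxBufferSize)
    {
        // Only the table of slots is created here, not the buffers.
        _buffers = new T[BucketMath.SelectBucketIndex(maxBufferSize) + 1][];
    }

    public int BucketCount => _buffers.Length;

    public bool IsAllocated(int index) => _buffers[index] != null;

    public T[] GetOrAllocate(int size)
    {
        int index = BucketMath.SelectBucketIndex(size);
        if (index >= _buffers.Length)
        {
            return new T[size];   // larger than the maximum: allocate on demand
        }
        return _buffers[index] ??= new T[BucketMath.BucketSize(index)];
    }
}
```

Under these assumptions a request for 1000 elements maps to the 1024-element bucket, which is why a rented buffer can be larger than the size asked for.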
Usage examples can be seen in the Examples section below.
Public Function Descriptions
Constructor
public ManagedBufferPool(int maxBufferSize = <number>, int numberOfBuffersPerBucket = <number>);
The ManagedBufferPool constructor takes in two arguments: the maximum buffer that is expected to be requested, and the number of buffers per BufferBucket. Both of these arguments are optional and can be used to tweak the Pool to situation-specific circumstances as well as have default values that will be tailored to most situations if they are not specified. The constructor will not allocate the buffers at this time; the pool is Lazy Loaded so that the Bucket sizes will be determined, based on the maximum size, but the memory for each Bucket will not be allocated until requested.
RentBuffer
public T[] RentBuffer(int size, bool clearBuffer = false)
The Rent(..) function is used to request a buffer of a specific size from the Pool. The caller may request that the buffer be cleared before it is Rented, but this defaults to false for performance reasons. The Pool is guaranteed to return a Buffer of at least the specified size; the actual size may be larger due to buffer availability. If the Bucket containing the specified size has not been allocated yet, it will be created at this time; any further Rent calls that hit this Bucket will not allocate any data on the Heap. This function is thread-safe.
EnlargeBuffer
public void EnlargeBuffer(ref T[] buffer, int newSize, bool clearFreeSpace = false);
The EnlargeBuffer(..) function is used to request a larger buffer than the one specified. The new buffer returned will be at least the specified size and will contain all data in the passed in buffer. The buffer must be passed as a reference since the previous reference will no longer be valid. The caller may also request that any excess space between the end of the previous buffer to the end of the new buffer be cleared; this defaults to false for performance reasons. Like Rent, this call may allocate if the new size hits a Bucket that has not been allocated yet; if the Bucket has been allocated, this call will not allocate any data on the Heap. Only buffers that have been received via calls to RentBuffer should be passed to this function. This function is thread-safe.
ReturnBuffer
public void ReturnBuffer(ref T[] buffer, bool clearBuffer = false);
The ReturnBuffer(..) function is used to give up ownership of a buffer received from calls to RentBuffer. The call takes a reference to a buffer since the reference will no longer be valid after the call returns. The buffer can be cleared by passing true to the optional parameter, but this defaults to false for performance reasons. Only buffers that have been received by calls to RentBuffer should be passed in and a buffer can only be Returned once. This function is thread-safe.
Static Declarations
SharedBufferPool
public static ManagedBufferPool<T> SharedBufferPool { get; }
This static property allows for a Shared pool to be used in cases where multiple components in a system will require access to buffers for the duration of the process lifetime, such as a web server. This instance will be created with the default parameters and is readonly. It follows the same usage and allocation patterns described above, so if the caller does not use it, nothing will be allocated.
Future Entries into System.Buffers
Going forward, we will look into adding more types of resource pooling into the System.Buffers namespace; currently, a NativeBufferPool and an ObjectPool are in the very initial stages.
Examples
Simple Rent and Return
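A minimal sketch of the rent/return pattern. It assumes the ArrayPool<T> type that eventually shipped in System.Buffers rather than the ManagedBufferPool draft described above, so Rent and Return stand in for RentBuffer and ReturnBuffer; the helper name RoundTrip is made up for illustration.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class SimpleRentReturn
{
    // Encodes a string into a rented buffer and decodes it again.
    public static string RoundTrip(string text)
    {
        int needed = Encoding.UTF8.GetByteCount(text);

        // The rented array is at least 'needed' bytes long, but may be longer.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(needed);
        try
        {
            int written = Encoding.UTF8.GetBytes(text, 0, text.Length, buffer, 0);

            // Only the first 'written' bytes are meaningful; the rest is leftover data.
            return Encoding.UTF8.GetString(buffer, 0, written);
        }
        finally
        {
            // Always hand the buffer back, and do not touch it afterwards.
            ArrayPool<byte>.Shared.Return(buffer, clearArray: true);
        }
    }
}
```

The try/finally keeps the buffer from being dropped on an exception path, and tracking the written length separately avoids relying on the array's Length, which can exceed the requested size.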
Multithreaded Example
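A sketch of concurrent use, again assuming the shipped ArrayPool<T> API rather than the draft names above. The pool is a private instance created with ArrayPool<T>.Create; the limits passed to it and the SumInParallel helper are arbitrary choices for illustration.

```csharp
using System;
using System.Buffers;
using System.Threading;
using System.Threading.Tasks;

public static class MultithreadedRent
{
    // Each worker rents its own buffer, fills it, sums it, and returns it.
    public static long SumInParallel(int workers, int elementsPerWorker)
    {
        // A private pool instance; Rent and Return are safe to call from any thread.
        ArrayPool<int> pool = ArrayPool<int>.Create(
            maxArrayLength: 1024 * 1024,
            maxArraysPerBucket: 50);

        long total = 0;

        Parallel.For(0, workers, worker =>
        {
            int[] buffer = pool.Rent(elementsPerWorker);
            try
            {
                long local = 0;
                for (int i = 0; i < elementsPerWorker; i++)
                {
                    buffer[i] = i;
                    local += buffer[i];
                }

                // The buffer is private to this worker; only the total is shared.
                Interlocked.Add(ref total, local);
            }
            finally
            {
                pool.Return(buffer);
            }
        });

        return total;
    }
}
```

A rented buffer is never shared between workers here; the pool's thread safety covers the Rent and Return calls themselves, not concurrent access to one buffer's contents.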
Updates
Update 1
After some thinking and initial feedback, removing the Shared pool due to the possibility of confusion or misuse and the possibility that components will use the Shared pool, growing the Process memory unnecessarily.
Update 2
Updated the API reference based on the review feedback.