Unmanaged pooling MemoryAllocator #1730
Conversation
Might be worth testing trimming solely via GC.AddMemoryPressure & a Gen2 callback?
@br3aker I'm not sure whether AddMemoryPressure / RemoveMemoryPressure contributes to the Gen2 allocation budget or not. It might make sense to try out, but I think the timer is not that expensive, and trimming is configured to run every minute (mimicking ArrayPool.Shared) anyway.
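For reference, this is roughly how the pressure APIs pair with an unmanaged allocation (an illustrative sketch, not code from this PR; `Marshal` lives in `System.Runtime.InteropServices`):

```csharp
// Illustrative: pairing GC.AddMemoryPressure with an unmanaged allocation so the
// GC accounts for memory it cannot see. Whether this feeds the Gen2 budget that
// trimming heuristics rely on is exactly the open question above.
const int BlockSize = 4 * 1024 * 1024;
IntPtr buffer = Marshal.AllocHGlobal(BlockSize);
GC.AddMemoryPressure(BlockSize);
try
{
    // ... use the 4 MB unmanaged block ...
}
finally
{
    Marshal.FreeHGlobal(buffer);
    GC.RemoveMemoryPressure(BlockSize);
}
```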
Codecov Report
```
@@            Coverage Diff            @@
##           master    #1730     +/-   ##
=========================================
  Coverage      87%      87%
=========================================
  Files         935      944       +9
  Lines       49300    49753     +453
  Branches     6102     6165      +63
=========================================
+ Hits        43175    43575     +400
- Misses       5115     5157      +42
- Partials     1010     1021      +11
```
Cannot wait to get stuck into reading this! 🤩
@JimBobSquarePants I've been thinking a lot & chatting with folks on the C#/lowlevel Discord channel, and I'm no longer sure if using unsafe memory is the right thing to do. The problem is that an update to 1.1 may turn minor bugs and performance issues into security errors for users of the library. On the other hand, SkiaSharp merged a similar API without spending a single minute on security concerns: mono/SkiaSharp#1242
@antonfirsov That's the … I need to do some reading there and maybe see if we can get someone from the runtime team who worked in that area to comment.
> Strictly speaking, the finalizer warning doesn't apply to us since the …

First I also thought so, but in fact it can be as simple as:

```csharp
var image = new Image<Rgba32>(w, h); // no using
Span<Rgba32> span = image.GetPixelRowSpan(0); // last use of the object `image`, finalizers may run after this point
// some relatively long running code here to allow the finalizers to finish
span[0] = default; // memory corruption
```
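One way to defend against this at the call site (an illustrative sketch, not a proposed API change) is to extend the image's lifetime explicitly:

```csharp
var image = new Image<Rgba32>(w, h);
Span<Rgba32> span = image.GetPixelRowSpan(0);
// ... some relatively long running code ...
span[0] = default;
GC.KeepAlive(image); // `image` is considered live up to this point,
                     // so its finalizer cannot run while `span` is in use
```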
I wonder if we could write an analyzer?
That would be awesome, but I'm afraid they would share the concerns around unsafe memory and push back.

Would it work out of the box just by using the library? Note that it won't help existing users doing a package update without recompilation, and then running into a potential security issue.
The trick, I think, would be to make the analyzer a dependency of the main library, like xUnit does. Have we made any breaking changes that require recompilation? Maybe we should, just to ensure people rebuild. 👿
There is a safe, breaking way to re-implement span accessors by using delegates, inspired by the comment above:

```diff
 public class Image<T>
 {
-    public Span<T> GetPixelRowSpan(int y);
-    public bool TryGetSinglePixelSpan(out Span<TPixel> span);
+    public void ProcessPixels(Action<PixelAccessor<T>> rowProcessor);
+    public bool TryProcessSinglePixelSpan(SpanAction<T> pixelProcessor);
 }
+public ref struct PixelAccessor<T>
+{
+    Span<T> GetRowSpan(int y);
+}
```

The simplest thing would be to go ahead with this breaking change and bump the ImageSharp version number to 2.0. The improvements will justify the change.
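Usage of the proposed shape could look roughly like this (hypothetical sketch; the `Height` property and the exact delegate signature are assumptions, not part of the proposal above):

```csharp
image.ProcessPixels(accessor =>
{
    for (int y = 0; y < accessor.Height; y++) // assumes the accessor exposes Height
    {
        Span<Rgba32> row = accessor.GetRowSpan(y);
        row[0] = default; // safe: the span cannot outlive the callback,
                          // and `image` stays rooted for the whole call
    }
});
```

Because `PixelAccessor<T>` is a `ref struct`, the spans it hands out cannot escape the callback, which closes the finalization hole discussed above.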
2.0 was to be my kill-all-old-target-frameworks release. I want to ship a working V1 of Fonts and Drawing before starting work on it.
That can be 3.0 then. We follow semantic versioning more or less, so there's no point being afraid of major version jumps when breaking changes land, IMO.
But I'm also fine with a hard-breaking 1.1; this is more about PR and communication than anything else. However, renaming the milestones seems the better thing to do to me; we can even benefit from it.
My only issue with jumping from 2.0 to 3.0 would be that, in real terms, it would probably occur over a short timespan, which, in my opinion, does not reflect well on the quality. 1.1 would be, by far, my next desired target. This is a massive breaking change though, so I'm deeply conflicted. 🙁
I have a question regarding … Btw, amazing work on all this @antonfirsov! I'll also need to find some time to carefully go through all of this like James said and have a proper read, as the whole investigation seems super interesting! 🚀
After careful consideration, I'm up for a V2 release. It's a good opportunity to fix a few things, plus we are already adding a significant amount of fixes/functionality to the release, so let's make a show of it.
UPDATE 2: Ready for review!
- `OutOfMemoryException`: consider retrying `Marshal.AllocHGlobal` on `OutOfMemoryException` after a short wait. DONE: We are blocking the thread on OOM to retry allocations. 32 bit is 2x slower with 20 threads than 64 bit, but doesn't OOM. The retries alone are not responsible for the 2x slowdown; the 32-bit runtime seems to work 1.5x slower also with 10 threads, when there are no OOMs.
- `PreferContiguousImageBuffers`, remove `MemoryAllocator.MinimumContiguousBlockSizeBytes`.
- `PixelAccessor<T>` and other Pixel processing breaking changes & API discussion #1739 stuff.
- `ArrayPoolMemoryAllocator`, `MemoryAllocator.Default`. -- Need to change `MemoryAllocator.Default`, it should be get-only.

Prerequisites
Description
This PR introduces `UniformUnmanagedMemoryPoolMemoryAllocator` and sets it as default to fix #1596.

UniformUnmanagedMemoryPoolMemoryAllocator functional characteristics

- `ArrayPool<byte>.Shared`
- `UniformUnmanagedMemoryPool` to allocate 4 MB blocks of discontiguous unmanaged memory, up to the pool's limit

Pool size
According to my benchmarks, the pool should scale to the maximum desired size to achieve the best throughput. There is no point placing an artificial pool limit, unless there is a physical limitation. I decided to set the maximum pool size to 1/8th of the available physical memory in 64-bit .NET Core processes. This means that on a 16 GB machine the pool can grow as large as 2 GB.
On 32 bit and other (non-testable) platforms, the pool limit is 128 MB.
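The sizing rule above can be sketched as follows (an illustrative sketch, not the PR's actual code; `GC.GetGCMemoryInfo` is available on .NET Core 3.0+):

```csharp
// Illustrative sketch of the pool-limit rule described above.
static long GetPoolLimitBytes()
{
    const long FallbackLimit = 128L * 1024 * 1024; // 128 MB on 32 bit / unknown platforms
    if (Environment.Is64BitProcess)
    {
        // TotalAvailableMemoryBytes reports the physical memory visible to the process.
        long totalPhysical = GC.GetGCMemoryInfo().TotalAvailableMemoryBytes;
        if (totalPhysical > 0)
        {
            return totalPhysical / 8; // 1/8th of physical memory
        }
    }
    return FallbackLimit;
}
```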
Trimming
The trimming of the pools is triggered by both Gen2 GC collections and a timer. (We need the timer since unmanaged allocations don't trigger GC.) On high load we trim the entire pool; on low load we trim 50% of the pool every minute.
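The two triggers can be sketched like this (illustrative only; the real implementation differs). The Gen2 "callback" uses the well-known resurrecting-finalizer pattern, similar to the runtime's internal `Gen2GcCallback`:

```csharp
// Illustrative sketch of the two trim triggers described above.
sealed class Gen2GcCallback
{
    private readonly Action callback;
    public Gen2GcCallback(Action callback) => this.callback = callback;

    ~Gen2GcCallback()
    {
        callback(); // runs roughly once per Gen2 collection
        if (!Environment.HasShutdownStarted)
        {
            GC.ReRegisterForFinalize(this); // resurrect for the next collection
        }
    }
}

static class PoolTrimmer
{
    private static System.Threading.Timer timer; // rooted in a static so it keeps firing

    public static void Start(Action<bool> trim) // trim(highLoad)
    {
        _ = new Gen2GcCallback(() => trim(true)); // GC-driven trigger
        timer = new System.Threading.Timer(       // timer-driven trigger: needed because
            _ => trim(false), null,               // unmanaged allocations don't cause GCs
            TimeSpan.FromMinutes(1),
            TimeSpan.FromMinutes(1));
    }
}
```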
Finalizers
With `ArrayPoolMemoryAllocator`, if an image is GC-d without being disposed, buffers are never returned to the pool. This means no hard memory leak, but the pools will eventually be exhausted, because each bucket's running index hits the bucket limit.

To avoid this, `MemoryGroup<T>.Owned` and `UniformUnmanagedMemoryPool.FinalizableBuffer<T>` have finalizers returning the `UnmanagedMemoryHandle` to the pool. This can get tricky, since `UnmanagedMemoryHandle` is also finalizable:

ImageSharp/src/ImageSharp/Memory/Allocators/Internals/UnmanagedMemoryHandle.cs
Lines 58 to 78 in 1a41aaa
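Simplified, the pattern looks like this (an illustrative sketch based on the description above; the `Return` method and the exact member shapes are assumptions, not the actual ImageSharp code):

```csharp
sealed class FinalizableBuffer<T> : IDisposable
{
    private readonly UniformUnmanagedMemoryPool pool;
    private UnmanagedMemoryHandle handle; // itself finalizable: its finalizer frees the memory

    ~FinalizableBuffer()
    {
        // Finalized without Dispose(): return the handle to the pool instead of
        // letting it drop out of the pool. The handle's own finalizer remains the
        // safety net that eventually frees the unmanaged memory.
        pool.Return(handle);
    }

    public void Dispose()
    {
        pool.Return(handle);
        GC.SuppressFinalize(this);
    }
}
```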
I'm moderately concerned about CA2015, but I don't think it applies to us. Dispose will also free the memory used by a span. Touching a span or a pointer to a SkiaSharp image's memory would also be a bug if the image is finalized.

API changes
Resolves #1739
Fixes #1675
Edit: API changes implemented according to #1739.
Benchmarking methodology
To determine these defaults I compared results of LoadResizeSaveParallelMemoryStress runs systematically, typically running them for a varying parameter a couple of times while fixing all other parameters. I have a bunch of Excel documents comparing the tables; including all of them would be TL;DR, but I can present information on request.
Benchmark results
I was benchmarking on a 10-core (20-thread) i9-10900X with 64 GB RAM. This means I was able to stress a highly parallel workload with very extensive allocation pressure.
Here is the median processing time (seconds) of 40 runs of `LoadResizeSaveParallelMemoryStress` ("Classic" means `ArrayPoolMemoryAllocator`):

ImageSharp is about 8% faster with the new default memory allocator.
Results of the `LoadResizeSaveStressBenchmarks` BDN benchmark also show a 7.5% improvement:

VirtualAlloc commit lifetimes graph with ArrayPoolMemoryAllocator
VirtualAlloc commit lifetimes graph with the new allocator, demonstrating the trimming
VirtualAlloc commit lifetimes graph with pool size set to zero
I would be happy to see some expert feedback on this solution, especially for the finalizer tricks.
/cc @Sergio0694 @saucecontrol @br3aker