Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

list of usage examples that illustrate current problems with ArrayPool and what the desired behavior should be #52098

Open
Maoni0 opened this issue Apr 30, 2021 · 30 comments
Labels
Milestone

Comments

@Maoni0
Copy link
Member

Maoni0 commented Apr 30, 2021

this issue is for folks who want ArrayPool (or other memory pools) to be improved. please explain it with usage examples. a usage example should consist of -

  • pseudo code snippet to illustrate your scenario
  • description of current problems
  • description of your desired behavior, eg, your pool is obviously part of the memory usage in a process, do you want your pool to have an individual threshold that you get notified when it's exceeded and you will do something about it, or do you want to with a default size and leave it up to whenever GC decides to collect.
  • any other perf implications you want to mention, eg, does your pool only manage buffers that will need to pinned, what kind of objects do you want to have in this pool, things like byte[] only or any type

then we have a discussion what kinds of behavior our pooling should offer and then we can decide if we need support from the GC and what that would be.

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 30, 2021
@Maoni0 Maoni0 added tenet-performance Performance related issue and removed untriaged New issue has not been triaged by the area owner labels Apr 30, 2021
@Maoni0 Maoni0 added this to the Future milestone Apr 30, 2021
@jkotas
Copy link
Member

jkotas commented Apr 30, 2021

Problem 1: ArrayPool does not ever release pooled memory, unless there is a memory pressure.

using System;
using System.Buffers;

DoStuff<int>();
DoStuff<uint>();
DoStuff<long>();
DoStuff<ulong>();
DoStuff<DateTime>();
DoStuff<DateTimeOffset>();

// Let's say the above was done during startup and the rest of the application won't use those ArrayPools again

Console.WriteLine(GC.GetTotalMemory(true));

void DoStuff<T>()
{
   ArrayPool<T>.Shared.Return(ArrayPool<T>.Shared.Rent(1000000));
}

Actual behavior: Array pools are keeping 50MB forever
Expected behavior: The memory is released after it is not used for a while

dotnet/runtime libraries are reluctant to use ArrayPools for anything but byte and char to avoid hitting this issue.

@jkotas
Copy link
Member

jkotas commented Apr 30, 2021

Problem 2: Array pool does not pool enough

using System;
using System.Buffers;

for (int i = 0; i < 1000000; i++)
    Nested(100);

Console.WriteLine(GC.CollectionCount(0));

void Nested(int depth)
{
   if (depth == 0) return;

   var a = ArrayPool<byte>.Shared.Rent(10);
   Nested(depth-1);
   ArrayPool<byte>.Shared.Return(a);
}

Expected: 0 or very few GCs
Actual: Hundreds of GCs

We have to watch for using the array pool too much that would lead to hitting the threasholds. The Array pool should dynamically resize based on the application behavior.

@jkotas
Copy link
Member

jkotas commented Apr 30, 2021

Problem 3: ArrayPool does not work for very large arrays

using System;
using System.Buffers;

for (int i = 0; i < 1000; i++)
   ArrayPool<byte>.Shared.Return(ArrayPool<byte>.Shared.Rent(10_000_000));

Console.WriteLine(GC.CollectionCount(2));

Expected: 0 or very few GCs
Actual: Hundred of Gen2 GCs

This implementation limitation largely exists to reduce severity of Problem 1.

A side-effect is that code tends to use array pool limits for buffer sizes instead of figuring about what the rigth buffer size should actually be. Example from this week: https://github.com/dotnet/runtime/pull/49304/files#diff-7721d7d6b3bf50b13337accfc3a3857f235a3901e99c6f0a9ff4c2c5f84c2784R49

@jkotas
Copy link
Member

jkotas commented Apr 30, 2021

cc @davidfowl

@ghost
Copy link

ghost commented Apr 30, 2021

Tagging subscribers to this area: @tannergooding
See info in area-owners.md if you want to be subscribed.

Issue Details

this issue is for folks who want ArrayPool (or other memory pools) to be improved. please explain it with usage examples. a usage example should consist of -

  • pseudo code snippet to illustrate your scenario
  • description of current problems
  • description of your desired behavior, eg, your pool is obviously part of the memory usage in a process, do you want your pool to have an individual threshold that you get notified when it's exceeded and you will do something about it, or do you want to with a default size and leave it up to whenever GC decides to collect.
  • any other perf implications you want to mention, eg, does your pool only manage buffers that will need to pinned, what kind of objects do you want to have in this pool, things like byte[] only or any type

then we have a discussion what kinds of behavior our pooling should offer and then we can decide if we need support from the GC and what that would be.

Author: Maoni0
Assignees: -
Labels:

area-System.Runtime, tenet-performance

Milestone: Future

@Maoni0
Copy link
Member Author

Maoni0 commented Apr 30, 2021

thanks @jkotas! I will take a look at these hopefully in the next couple of weeks.

@antonfirsov
Copy link
Member

Important aspect of Problem 1 (#52098 (comment)) is that ArrayPools are per-type. A 3rd library party may unintentionally retain a bunch of memory by using ArrayPool<SomeLargeUniqueStruct>.Shared (this actually happened in an early alpha of ImageSharp).

@davidfowl
Copy link
Member

Another thing I would like to do is get rid of the Kestrel memory pool since it's at this point a glorified bunch of pooled buffers 4K on the pinned object heap. Today these ArrayPool buffers aren't pinned making them less than ideal for networking scenarios.

@Maoni0
Copy link
Member Author

Maoni0 commented May 5, 2021

Problem 2: Array pool does not pool enough

using System;
using System.Buffers;

for (int i = 0; i < 1000000; i++)
    Nested(100);

Console.WriteLine(GC.CollectionCount(0));

void Nested(int depth)
{
   if (depth == 0) return;

   var a = ArrayPool<byte>.Shared.Rent(10);
   Nested(depth-1);
   ArrayPool<byte>.Shared.Return(a);
}

Expected: 0 or very few GCs
Actual: Hundreds of GCs

We have to watch for using the array pool too much that would lead to hitting the threasholds. The Array pool should dynamically resize based on the application behavior.

@jkotas for this problem I cannot repro what you saw. I printed out a bit more info -

elapsed 28222 ms, 0 gen0 GCs, 0 gen1, 0 gen2, allocated 17,264 bytes

it allocates very little and no GCs were triggered. this is with the current .net 6 build.

@jkotas
Copy link
Member

jkotas commented May 5, 2021

The array pool global cache size is 8 * Environment.ProcessorCount arrays per size bucket. I guess you have run it on 16+ core machine and so the pool was big to avoid allocations. Here is a modification that should produce the result reliably:

using System;
using System.Threading;
using System.Buffers;

int processors = Environment.ProcessorCount;
for (int i = 0; i < processors; i++)
   new Thread(Work).Start();
Work();

Console.WriteLine(GC.CollectionCount(0));

void Work()
{
    for (int i = 0; i < 1000000; i++)
        Nested(100);
}

void Nested(int depth)
{
   if (depth == 0) return;

   var a = ArrayPool<byte>.Shared.Rent(10);
   Nested(depth-1);
   ArrayPool<byte>.Shared.Return(a);
}

@Maoni0
Copy link
Member Author

Maoni0 commented May 5, 2021

I see. yes I did run it on a big machine (way more than 16 cores). I can repro with your new example. thanks!

@danmoseley
Copy link
Member

@geoffkizer @stephentoub

@br3aker
Copy link

br3aker commented May 10, 2021

Problem 4: ConfigurableArrayPool rent logic isn't too memory-friendly with this try-next-bucket scenario

Current pooling implementation can waste up to (currentBucketSize / 2) - 1 "elements" in the worst case when we try to rent arrays of size bigger by 1 than bucket size, e.g. rent(17) will fit in bucket of 32 elements and waste 15 elements of space.

const int MaxBucketsToTry = 2;
int i = index;
do
{
// Attempt to rent from the bucket. If we get a buffer from it, return it.
buffer = _buckets[i].Rent();
if (buffer != null)
{
if (log.IsEnabled())
{
log.BufferRented(buffer.GetHashCode(), buffer.Length, Id, _buckets[i].Id);
}
return buffer;
}
}
while (++i < _buckets.Length && i != index + MaxBucketsToTry);

With those jumps up to 2 buckets memory waste can skyrocket, e.g. rent(17) could possibly miss 32 bucket and hit 64, wasting up to 47 "elements" (73.4%) in the worst case scenario.
Without this edge case, potential pooled resource size can be a factor of 2, e.g rent(32) -> bucket:64. This can easily escalate to 1KB -> 2KB waste, 1MB -> 2MB and so on.

Not only it wastes memory, it blocks higher order memory requests and forces it to be allocated (which is a thing for LOH allocations atm).

*This behaviour is restricted to ConfigurableArrayPool only, TlsOverPerCoreLockedStacksArrayPool from ArrayPool<T>.Shared property is not affected.

One of use cases affected by this problem:
Batch image thumbnail generation

// Takes large buffer as expected
var original = LoadImage(...);
// Can potentially take another large buffer 
// Because last medium buffer is blocked
var medium = original.Clone(new Size(...));

This code can be run for multiple images in a batch which can cause said problem.

@antonfirsov
Copy link
Member

antonfirsov commented May 10, 2021

Speaking of ConfigurableArrayPool, I would also call it a problem that the closed implementation of ArrayPool<T>.Shared now strongly deviated from what is publicly available to instantiate for 3rd parties. @Maoni0 if there will be a fix for Problem 1, it would be great to also make it available for ConfigurableArrayPool.


[off] @br3aker --> https://gist.github.com/antonfirsov/d6d8457d09636692dba3b169c1e7790a

@geoffkizer
Copy link
Contributor

Thanks for making these issues known @jkotas. I was unaware of these.

Naively I would expect that
(a) Every allocation is pooled when returned
(b) The pool is scavenged periodically to trim the pool of recently unused items

This seems like it would solve problems (1)-(3) above. Perhaps more importantly, I think it would match user expectations as well.

@Maoni0
Copy link
Member Author

Maoni0 commented May 14, 2021

@geoffkizer I agree and b) is done to some extend currently. I was able to spend some time on this and will have a write up soon that illustrates both the problems and possible solutions.

@antonfirsov
Copy link
Member

antonfirsov commented Jul 24, 2021

Note that #55621 fixed "Problem 2: ArrayPool does not pool enough", but made "Problem 1: ArrayPool does not ever release pooled memory, unless there is a memory pressure." much worse.

This may lead to user complaints similar to SixLabors/ImageSharp#1590 (ArrayPool unexpectedly retains large blocks of memory forever). What would the runtime/BCL fix to this problem look like? Is it still expected to land in 6.0? Any recommendations for 3rd party library authors implementing their own pooling?

/cc @Maoni0 @stephentoub and also @sebastienros as OP of the ImageSharp discussion post.

@stephentoub
Copy link
Member

stephentoub commented Jul 24, 2021

but made "Problem 1: ArrayPool does not ever release pooled memory, unless there is a memory pressure." much worse.

Do you have evidence of that? All of the arrays that PR enabled to be pooled would already be on the LOH.

@antonfirsov
Copy link
Member

It's already on the LOH, but when (let's say a byte[] of 500MB) is returned to ArrayPool<byte>.Shared it might be retained there in it's bucket forever, with a chance that no code ever will reuse it. Or am I misunderstanding how it works?

@stephentoub
Copy link
Member

My point is that whatever memory pressure would result in a Gen2 collection to collect the array that had been allocated on the LOH would also trigger trimming of arrays stored in the pool. At that point, it's just a matter of tuning as to how much such trimming culls back.

@antonfirsov
Copy link
Member

antonfirsov commented Jul 24, 2021

If I'm getting #52098 (comment) right, trimming doesn't really trim, unless the process gets into medium/high memory pressure state. --> Users may see the BCL retaining large arrays for no reason.

@stephentoub
Copy link
Member

trimming doesn't really trim

It does, it just doesn't trim all the way to nothing in low pressure, in particular because it doesn't trim the thread-local cache unless there's high pressure today. And the cited example would have all of the returned arrays in the thread-local cache, which can store one array of each size in a thread. Run this example instead:

using System.Buffers;

DoStuff<int>();
DoStuff<uint>();
DoStuff<long>();
DoStuff<ulong>();
DoStuff<DateTime>();
DoStuff<DateTimeOffset>();
Console.WriteLine(GC.GetTotalMemory(false));

// Let's say the above was done during startup and the rest of the application won't use those ArrayPools again
while (true)
{
    Console.WriteLine(GC.GetTotalMemory(true));
    Thread.Sleep(10_000);
}

void DoStuff<T>()
{
    T[] one = ArrayPool<T>.Shared.Rent(1_000_000);
    T[] two = ArrayPool<T>.Shared.Rent(1_000_000);
    ArrayPool<T>.Shared.Return(two);
    ArrayPool<T>.Shared.Return(one);
}

the difference just being that it's creating two arrays of each size rather than one. Run it and wait 60 seconds... you'll see the total memory get cut in half when one of each array gets trimmed away.

This is why I said it's about tuning.

@antonfirsov
Copy link
Member

I think this issue is about revisiting the current tuning. On the current master, the following code will retain 2GB of memory, without ever freeing it up, using ArrayPool<byte>.Shared only:

for (long size = 1024 * 1024; size <= 1073741824; size *= 2)
{
    byte[] a = ArrayPool<byte>.Shared.Rent((int)size);
    ArrayPool<byte>.Shared.Return(a);
}

While this might seem an artificial scenario, there are use cases which could make it very real. Imagine an application working with very large images (of different sizes) assuming the underlying imaging library relies heavily on ArrayPool<byte>.Shared. The application might be very heavy on imaging operations during a warmup time, but not doing them anymore in the rest of it's execution time (eg. because of caching), so retaining the arrays doesn't make sense anymore. This is essentially what's SixLabors/ImageSharp#1590 is about, with the difference that currently we are using a custom instance of ConfigurableArrayPool<byte> to pool very large arrays.

@stephentoub
Copy link
Member

stephentoub commented Jul 25, 2021

I think this issue is about revisiting the current tuning.

You asked "What would the runtime/BCL fix to this problem look like?" and I'm answering that question: tune the Trim method.

@jkotas
Copy link
Member

jkotas commented Sep 24, 2021

#58974 is another example of unexpected behavior that comes as a surprise: ArrayPool does not always work well when renting/returning happens on different threads.

@Bio2hazard
Copy link

The memory pressure is determined through GC.GetGCMemoryInfo().HighMemoryLoadThresholdBytes ( https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Buffers/Utilities.cs#L38-L56 ).

On docker containers that enforce a cgroup memory limit, the value for HighMemoryLoadThresholdBytes is actually greater than the TotalAvailableMemoryBytes. ( #58974 (comment) )

This means when running inside a container with a cgroup limit, the app will always OOM before Utilities.GetMemoryPressure() returns MemoryPressure.High ( because the threshold for MemoryPressure.High exceeds the amount of memory available ). The window for it to return return MemoryPressure.Medium is also much smaller.

Perhaps consider either applying the fix proposed in #58974 (comment) or changing the logic to use TotalAvailableMemoryBytes instead. 👍

@tmds
Copy link
Member

tmds commented Sep 27, 2021

The memory pressure is determined through GC.GetGCMemoryInfo().HighMemoryLoadThresholdBytes ( https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Buffers/Utilities.cs#L38-L56 ).

This should also consider load of the managed heap in a container, and that will trigger aggressive trimming in @Bio2Hazzard's example application: https://gist.github.com/Bio2hazard/ee353c1042ee56a97c0d0b3d62c590bc.

@dennis-yemelyanov
Copy link
Contributor

I was surprised that ArrayPool<T>.Shared doesn't actually seem to share pooled arrays between threads. I would expect the following code to print "Got uninitialized array" just once, but it prints this message 10 times:

using System.Buffers;

var pool = ArrayPool<int?>.Shared;

// Uncomment next line to get expected behavior
//pool = ArrayPool<int?>.Create(1024 * 1024, 50);

for (int i = 0; i < 10; i++)
{
    var th = new Thread(() =>
    {
        var arr = pool.Rent(1);

        if (arr[0] == null)
        {
            Console.WriteLine("Got uninitialized array");

            arr[0] = 0;
        }

        pool.Return(arr);
    });

    th.Start();

    th.Join();
}

Interestingly, a pool created using ArrayPool<T>.Create() works as expected.

So what does the "Shared" name mean in this context?

@saucecontrol
Copy link
Member

ArrayPool<T>.Shared is implemented by TlsOverPerCoreLockedStacksArrayPool, which maintains a pool per thread in order to avoid having to take a lock when renting/returning in the majority case. The comments explain the setup:

The implementation uses a tiered caching scheme, with a small per-thread cache for each array size, followed by a cache per array size shared by all threads, split into per-core stacks meant to be used by threads running on that core. Locks are used to protect each per-core stack, because a thread can migrate after checking its processor number, because multiple threads could interleave on the same core, and because a thread is allowed to check other core's buckets if its core's bucket is empty/full.

You can, as you discovered, create your own pool if that behavior is not desirable for your use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests