Improve memory utilization in the case of continuous BGC #87715
base: main
Conversation
Tagging subscribers to this area: @dotnet/gc
I see you fixed the min problem, that's good. For the general approach, I don't think this kind of accumulative accounting is appropriate, because after a while the counts will be so big that you will hardly notice if the workload goes into a phase where the unused space becomes significant. Keeping the history for the last BGC or the last few BGCs seems more appropriate. From the implementation's POV, you don't need to interlock: while we are still holding the msl we know exactly whether the allocation is going to succeed or not, so you could do a normal inc/dec on a per-heap count. We might not want to use the count but rather use the space in the estimate, to be more accurate. You know how many regions LOH has at the end of a BGC and at the beginning of a BGC, and you know how much reserved space in those regions isn't used, so you can calculate a factor there and use that instead.
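Read purely as an illustration, the space-based, last-BGC-only factor suggested here could look something like the sketch below. Every name in it (`loh_bgc_snapshot`, `usable_reserved_factor`, and its fields) is hypothetical and is not an actual gc.cpp identifier.

```cpp
// Hypothetical sketch: track, per heap and for the last BGC only, how much of the
// reserved space in LOH regions went unused, and derive the discount from space
// rather than from cumulative allocation counts.
#include <cstddef>

struct loh_bgc_snapshot
{
    size_t region_count;      // LOH regions at the snapshot point (start or end of the BGC)
    size_t reserved_total;    // total reserved space in those regions
    size_t reserved_unused;   // reserved space that was never allocated into
};

// Fraction of the reserved space we can realistically expect to use, based on the last BGC only.
double usable_reserved_factor (const loh_bgc_snapshot& at_bgc_end)
{
    if (at_bgc_end.reserved_total == 0)
        return 1.0;
    return 1.0 - ((double)at_bgc_end.reserved_unused / (double)at_bgc_end.reserved_total);
}
```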
@cshung, are you still working on this?
#105521 contains
In the repro provided by @arian2ashk in #78959, we observed that the GC is utilizing memory poorly.
This is because the application is continuously allocating large objects, leading to continuous background GC. Before this change, background GC is unable to decommit free regions, so they accumulate.
This change allows them to be decommitted by calling `distribute_free_regions` at the beginning of background GC while the runtime is still suspended.

To make this actually work nicely, I need to modify the heuristic for generation size growth estimation. This application keeps allocating objects slightly larger than 16 MB, so it almost always wastes half of the 32 MB regions we default to. In that case, subtracting the reserved portion in the size growth estimation is inappropriate. In this change, I added some counters to estimate how likely we are to fail to use the reserved portion, and I use that to discount the reserved portion before subtracting it.

Before this change, this application regularly uses about 3 GB of memory on my machine; after this change, it regularly uses less than 1 GB.
This is a work in progress. I haven't validated it against any other test cases yet.
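To make the region-waste argument above concrete, here is a small self-contained calculation (not code from this PR) using the sizes mentioned in the description: an object slightly larger than 16 MB leaves roughly half of a 32 MB region as an unusable reserved tail.

```cpp
// Worked example of the waste pattern described above. Purely illustrative; the object
// size is an assumed "slightly larger than 16 MB" value.
#include <cstddef>
#include <cstdio>

int main ()
{
    const size_t region_size = 32 * 1024 * 1024;        // default LOH region size from the description
    const size_t object_size = 16 * 1024 * 1024 + 64;   // "slightly larger than 16M"

    size_t objects_per_region = region_size / object_size;            // only 1 object fits
    size_t wasted = region_size - objects_per_region * object_size;   // ~16 MB unused tail per region

    printf ("objects per region: %zu, wasted per region: %zu bytes (%.1f%%)\n",
            objects_per_region, wasted, 100.0 * (double)wasted / (double)region_size);
    return 0;
}
```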
Here are some typical numbers when running against the repro; these numbers are captured during `estimate_gen_growth` for LOH.

The `budget_gen_old` was computed as `new_allocation_gen - usable_free_space - reserved_not_in_use`. With the large `reserved_not_in_use` space, `budget_gen_old` ends up negative, leading `distribute_free_regions` to free all the available free regions.

But comparing `uoh_try_fit_segment_end_fail_count` with `uoh_try_fit_segment_end_count`, we know that the majority of the time this reserved end space is not usable, so we discount `reserved_not_in_use` by that ratio to a much smaller `usable_reserved_not_in_use`. Now the subtraction yields a positive number, which means `distribute_free_regions` keeps one free region, and that is sufficient to handle the allocations during background GC execution.
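For illustration only, here is a minimal sketch of that discounted budget computation. The identifier names are taken from the description above, but the function itself is a simplified stand-in under those assumptions, not the actual `estimate_gen_growth` code.

```cpp
#include <cstddef>

// Sketch of the discounted LOH budget described above (not the real gc.cpp implementation).
ptrdiff_t estimate_loh_budget_sketch (ptrdiff_t new_allocation_gen,
                                      ptrdiff_t usable_free_space,
                                      ptrdiff_t reserved_not_in_use,
                                      size_t    uoh_try_fit_segment_end_count,
                                      size_t    uoh_try_fit_segment_end_fail_count)
{
    // Old behavior: subtract the full reserved tail space. With ~16 MB of unusable tail in
    // each 32 MB region, this routinely drives the budget negative, and all free regions
    // end up being freed.
    // ptrdiff_t budget_gen_old = new_allocation_gen - usable_free_space - reserved_not_in_use;

    // New behavior: discount the reserved tail by how often end-of-region fits actually succeed.
    double success_ratio = 1.0;
    if (uoh_try_fit_segment_end_count > 0)
    {
        success_ratio = 1.0 - ((double)uoh_try_fit_segment_end_fail_count /
                               (double)uoh_try_fit_segment_end_count);
    }
    ptrdiff_t usable_reserved_not_in_use =
        (ptrdiff_t)((double)reserved_not_in_use * success_ratio);

    return new_allocation_gen - usable_free_space - usable_reserved_not_in_use;
}
```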
It may sound like a paradox: why do we want to keep free regions when we are trying to optimize for space? Here is why: if we decide to free a region, we put it in the `global_regions_to_decommit` list and let the gradual decommit process decommit it. The problem is that the region allocation rate is faster than the decommit rate, so regions end up queuing in the `global_regions_to_decommit` list and memory usage increases instead of dropping.
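As a purely illustrative model of that rate mismatch (the rates below are made-up numbers, not measurements), the following toy loop shows how the decommit queue grows whenever regions are freed faster than they are decommitted:

```cpp
#include <cstdio>

int main ()
{
    const int regions_freed_per_interval       = 4;  // hypothetical production rate
    const int regions_decommitted_per_interval = 1;  // hypothetical gradual decommit rate

    int queued = 0;
    for (int interval = 1; interval <= 5; interval++)
    {
        queued += regions_freed_per_interval - regions_decommitted_per_interval;
        printf ("after interval %d: %d regions still waiting to be decommitted\n",
                interval, queued);
    }
    // Keeping a small number of free regions on the heap avoids feeding this queue at all.
    return 0;
}
```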