Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve memory utilization in the case of continuous BGC #87715

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

cshung
Copy link
Member

@cshung cshung commented Jun 16, 2023

In the repro provided by @arian2ashk in #78959, we observed that the GC is utilizing memory poorly.

This is because the application is continuously allocating large objects, leading to continuous background GC. Before this change, background GC is unable to decommit free regions, leading to them being accumulated.

This change allows them to be decommitted by calling distribute_free_regions at the beginning of background GC while the runtime is still suspended.

To make this actually work nicely, I need to modify the heuristic for generation size growth estimation. This application keeps allocating objects slightly larger than 16M, and so the application almost always wastes half of the 32M regions we default to. In that case, subtracting the reserved portion in the size growth estimation is inappropriate. In this change, I added some counters to estimate how likely we fail to use the reserve portion, and therefore use that to discount the reserved portion for subtraction.

Before this change, this application regularly use 3G of memory on my machine, after this change, it regularly uses less than 1G.

This is a work in progress. I haven't validated it against any other test cases yet.

Here are some typical numbers when running against the repro, these numbers are captured during estimate_gen_growth for LOH.

new_allocation_gen 17462144
usable_free_space 0
reserved_not_in_use 50331448
budget_gen_old -32869304 
uoh_try_fit_segment_end_fail_count 23805
uoh_try_fit_segment_end_count 24225
usable_reserved_not_in_use 872619
budget_gen 16589525

The budget_gen_old was computed by new_allocation_gen - usable_free_space - reserved_not_in_use, with the large reserved_not_in_use space, that make budget_gen_old ends up a negative number, leading distribute_free_region to free all the available free_region

But looking at the uoh_try_fit_segment_end_fail_count compared with the uoh_try_fit_segment_end_count, we knew that majority of the time these reserved end space is not usable, so we discounted the reserved_not_in_use by that ratio to a much smaller usable_reserved_not_in_use, now the subtraction leads to a positve number, which means distribute_free_region will keep 1 free region, and that's sufficient to handle the allocation during background GC execution.

It sounded like a paradox, why we wanted to keep the free regions when we wanted to optimize for space? Here is why:

If we decided to free a region - we put it in the global_regions_to_decommit list and let the gradual decommit process to decommit it. The problem is that the regions allocation rate is faster than the decommit rate, so regions end up queuing in the global_regions_to_decommit list and increase the memory usage instead of dropping.

@ghost
Copy link

ghost commented Jun 16, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

In the repro provided by @arian2ashk in #78959, we observed that the GC is utilizing memory poorly.

This is because the application is continuously allocating large objects, leading to continuous background GC. Before this change, background GC is unable to decommit free regions, leading to them being accumulated.

This change allows them to be decommitted by calling distribute_free_regions at the beginning of background GC while the runtime is still suspended.

To make this actually work nicely, I need to modify the heuristic for generation size growth estimation. This application keeps allocating objects slightly larger than 16M, and so the application almost always wastes half of the 32M regions we default to. In that case, subtracting the reserved portion in the size growth estimation is inappropriate. In this change, I added some counters to estimate how likely we fail to use the reserve portion, and therefore use that to discount the reserved portion for subtraction.

Before this change, this application regularly use 3G of memory on my machine, after this change, it regularly uses less than 1G.

This is a work in progress. I haven't validated it against any other test cases yet.

Author: cshung
Assignees: cshung
Labels:

area-GC-coreclr

Milestone: -

@Maoni0
Copy link
Member

Maoni0 commented Jun 22, 2023

I see you fixed the min problem, that's good.

for the general approach, I don't think having this kind of accumulative accounting is appropriate because after a while the counts will be so big that you will hardly notice if the workload goes into a phase where the unused space becomes significant. keeping the history for the last BGC or the last few BGCs seems more appropriate.

from the implementation's POV, you don't need to interlock. while we are still holding the msl we know exactly whether the allocation is going to succeed or not so you could do a normal inc/dec on a per heap count.

we might not want use the count but rather use the space to estimate to be more accurate. you know how many regions LOH has at the end of a BGC and at the beginning of a BGC, you know how much reserved space in those regions isn't used. so you can calculate a factor there and use that instead.

@cshung cshung force-pushed the public/bgc-only-fix branch from 26b57c4 to 47b190a Compare December 11, 2023 19:29
@stephentoub
Copy link
Member

@cshung, are you still working on this?

@cshung
Copy link
Member Author

cshung commented Jul 22, 2024

@cshung, are you still working on this?

I believe @markples is working on it to take the idea of doing distribute_free_region in bgc forward, Mark?

@markples
Copy link
Member

#105521 contains distribute_free_regions in bgc. I'm going to look at Maoni's suggestion in her past paragraph above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants