Improve memory utilization in the case of continuous BGC #87715
base: main
Conversation
Tagging subscribers to this area: @dotnet/gc
I see you fixed the min problem, that's good. For the general approach, I don't think this kind of accumulative accounting is appropriate, because after a while the counts will be so big that you will hardly notice if the workload goes into a phase where the unused space becomes significant. Keeping the history for the last BGC or the last few BGCs seems more appropriate. From the implementation's POV, you don't need to interlock: while we are still holding the msl we know exactly whether the allocation is going to succeed or not, so you could do a normal inc/dec on a per-heap count. We might not want to use the count but rather use the space in the estimate, to be more accurate. You know how many regions LOH has at the end of a BGC and at the beginning of a BGC, and you know how much reserved space in those regions isn't used, so you can calculate a factor there and use that instead.
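Read purely as an illustration, the space-based, last-BGC-only factor suggested here could look something like the sketch below. Every name in it (`loh_bgc_snapshot`, `usable_reserved_factor`, and its fields) is hypothetical and is not an actual gc.cpp identifier.

```cpp
// Hypothetical sketch: track, per heap and for the last BGC only, how much of the
// reserved space in LOH regions went unused, and derive the discount from space
// rather than from cumulative allocation counts.
#include <cstddef>

struct loh_bgc_snapshot
{
    size_t region_count;      // LOH regions at the snapshot point (start or end of the BGC)
    size_t reserved_total;    // total reserved space in those regions
    size_t reserved_unused;   // reserved space that was never allocated into
};

// Fraction of the reserved space we can realistically expect to use, based on the last BGC only.
double usable_reserved_factor (const loh_bgc_snapshot& at_bgc_end)
{
    if (at_bgc_end.reserved_total == 0)
        return 1.0;
    return 1.0 - ((double)at_bgc_end.reserved_unused / (double)at_bgc_end.reserved_total);
}
```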
@cshung, are you still working on this?
#105521 contains
In the repro provided by @arian2ashk in #78959, we observed that the GC is utilizing memory poorly.
This is because the application is continuously allocating large objects, leading to continuous background GC. Before this change, background GC is unable to decommit free regions, so they accumulate.
This change allows them to be decommitted by calling `distribute_free_regions` at the beginning of background GC while the runtime is still suspended.

To make this actually work nicely, I need to modify the heuristic for generation size growth estimation. This application keeps allocating objects slightly larger than 16 MB, so it almost always wastes half of the 32 MB regions we default to. In that case, subtracting the reserved portion in the size growth estimation is inappropriate. In this change, I added some counters to estimate how likely we are to fail to use the reserved portion, and I use that to discount the reserved portion before subtracting it.

Before this change, this application regularly uses about 3 GB of memory on my machine; after this change, it regularly uses less than 1 GB.
This is a work in progress. I haven't validated it against any other test cases yet.
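To make the region-waste argument above concrete, here is a small self-contained calculation (not code from this PR) using the sizes mentioned in the description: an object slightly larger than 16 MB leaves roughly half of a 32 MB region as an unusable reserved tail.

```cpp
// Worked example of the waste pattern described above. Purely illustrative; the object
// size is an assumed "slightly larger than 16 MB" value.
#include <cstddef>
#include <cstdio>

int main ()
{
    const size_t region_size = 32 * 1024 * 1024;        // default LOH region size from the description
    const size_t object_size = 16 * 1024 * 1024 + 64;   // "slightly larger than 16M"

    size_t objects_per_region = region_size / object_size;            // only 1 object fits
    size_t wasted = region_size - objects_per_region * object_size;   // ~16 MB unused tail per region

    printf ("objects per region: %zu, wasted per region: %zu bytes (%.1f%%)\n",
            objects_per_region, wasted, 100.0 * (double)wasted / (double)region_size);
    return 0;
}
```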
Here are some typical numbers when running against the repro; these numbers are captured during `estimate_gen_growth` for LOH.

The `budget_gen_old` was computed as `new_allocation_gen - usable_free_space - reserved_not_in_use`. With the large `reserved_not_in_use` space, `budget_gen_old` ends up negative, leading `distribute_free_regions` to free all the available free regions.

But comparing `uoh_try_fit_segment_end_fail_count` with `uoh_try_fit_segment_end_count`, we know that the majority of the time this reserved end space is not usable, so we discount `reserved_not_in_use` by that ratio to a much smaller `usable_reserved_not_in_use`. Now the subtraction yields a positive number, which means `distribute_free_regions` keeps one free region, and that is sufficient to handle the allocations during background GC execution.
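For illustration only, here is a minimal sketch of that discounted budget computation. The identifier names are taken from the description above, but the function itself is a simplified stand-in under those assumptions, not the actual `estimate_gen_growth` code.

```cpp
#include <cstddef>

// Sketch of the discounted LOH budget described above (not the real gc.cpp implementation).
ptrdiff_t estimate_loh_budget_sketch (ptrdiff_t new_allocation_gen,
                                      ptrdiff_t usable_free_space,
                                      ptrdiff_t reserved_not_in_use,
                                      size_t    uoh_try_fit_segment_end_count,
                                      size_t    uoh_try_fit_segment_end_fail_count)
{
    // Old behavior: subtract the full reserved tail space. With ~16 MB of unusable tail in
    // each 32 MB region, this routinely drives the budget negative, and all free regions
    // end up being freed.
    // ptrdiff_t budget_gen_old = new_allocation_gen - usable_free_space - reserved_not_in_use;

    // New behavior: discount the reserved tail by how often end-of-region fits actually succeed.
    double success_ratio = 1.0;
    if (uoh_try_fit_segment_end_count > 0)
    {
        success_ratio = 1.0 - ((double)uoh_try_fit_segment_end_fail_count /
                               (double)uoh_try_fit_segment_end_count);
    }
    ptrdiff_t usable_reserved_not_in_use =
        (ptrdiff_t)((double)reserved_not_in_use * success_ratio);

    return new_allocation_gen - usable_free_space - usable_reserved_not_in_use;
}
```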
It may sound like a paradox: why do we want to keep free regions when we are trying to optimize for space? Here is why: if we decide to free a region, we put it in the `global_regions_to_decommit` list and let the gradual decommit process decommit it. The problem is that the region allocation rate is faster than the decommit rate, so regions end up queuing in the `global_regions_to_decommit` list and memory usage increases instead of dropping.
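As a purely illustrative model of that rate mismatch (the rates below are made-up numbers, not measurements), the following toy loop shows how the decommit queue grows whenever regions are freed faster than they are decommitted:

```cpp
#include <cstdio>

int main ()
{
    const int regions_freed_per_interval       = 4;  // hypothetical production rate
    const int regions_decommitted_per_interval = 1;  // hypothetical gradual decommit rate

    int queued = 0;
    for (int interval = 1; interval <= 5; interval++)
    {
        queued += regions_freed_per_interval - regions_decommitted_per_interval;
        printf ("after interval %d: %d regions still waiting to be decommitted\n",
                interval, queued);
    }
    // Keeping a small number of free regions on the heap avoids feeding this queue at all.
    return 0;
}
```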