
Dynamic heap count #86245

Merged · 74 commits merged into dotnet:main on May 23, 2023

Conversation

PeterSolMS
Contributor

This is an initial implementation for changing the GC heap count dynamically in response to changing load conditions.

Using more heaps will increase memory footprint, but in most cases also improve throughput because more GC work is parallelized, and lock contention on the allocation code path is reduced by spreading the load.

The algorithm makes this tradeoff explicit by comparing the estimated percentage change in throughput with the estimated percentage change in memory footprint. It increases the heap count if throughput is estimated to improve by at least one percentage point more than the memory footprint grows, and decreases the heap count if the estimated reduction in memory footprint exceeds the estimated loss in throughput by at least one percentage point.
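As a rough illustration of this decision rule, here is a minimal sketch; the function and parameter names are hypothetical, and the real heuristics in gc.cpp are more involved.

```cpp
// Minimal sketch of the tradeoff rule described above (hypothetical helper,
// not the actual gc.cpp code): adjust the heap count only when the estimated
// gain outweighs the estimated cost by at least one percentage point.
#include <cstdio>

enum class HeapCountDecision { Grow, Shrink, Keep };

// tput_gain_pct:         estimated % throughput increase from adding heaps
// footprint_growth_pct:  estimated % memory footprint increase from adding heaps
// tput_loss_pct:         estimated % throughput decrease from removing heaps
// footprint_savings_pct: estimated % memory footprint decrease from removing heaps
HeapCountDecision decide_heap_count(double tput_gain_pct, double footprint_growth_pct,
                                    double tput_loss_pct, double footprint_savings_pct)
{
    const double threshold_pct_points = 1.0;
    if (tput_gain_pct - footprint_growth_pct >= threshold_pct_points)
        return HeapCountDecision::Grow;
    if (footprint_savings_pct - tput_loss_pct >= threshold_pct_points)
        return HeapCountDecision::Shrink;
    return HeapCountDecision::Keep;
}

int main()
{
    // Example: +4% throughput for +2% footprint -> grow the heap count.
    auto d = decide_heap_count(4.0, 2.0, 0.5, 1.0);
    std::printf("%s\n", d == HeapCountDecision::Grow ? "grow" : "keep/shrink");
}
```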

Because the input data (GC pause times etc.) are quite noisy, we apply a median-of-3 filter before using the data to make decisions. Preliminary data suggest this is effective, but probably not sufficient on its own.
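A median-of-3 filter simply takes the middle value of the last three samples; a minimal sketch (illustrative only, not the production code) looks like this:

```cpp
// Minimal sketch of a median-of-3 filter over noisy samples such as GC pause
// percentages (illustrative only; not the code used in gc.cpp).
#include <algorithm>
#include <cstdio>

double median_of_3(double a, double b, double c)
{
    // The median is the middle of the three values.
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

int main()
{
    // A noisy spike (40.0) in the middle sample is suppressed by the filter.
    double samples[] = { 2.0, 40.0, 3.0 };
    std::printf("filtered = %.1f\n", median_of_3(samples[0], samples[1], samples[2]));
    // prints: filtered = 3.0
}
```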

PeterSolMS and others added 30 commits March 9, 2023 09:20
- park extra threads on gc_idle_thread_event
- update thread count in join
- null free lists and have background GC rebuild them
- have redistribute_regions call fix_allocation_contexts
- distribute free regions as well
- fix up ephemeral_heap_segment and generation_allocation_segment
Add hack to dynamically enable heap verify when we change the heap count.
- move finalization data between heaps
- update free list space per heap when rethreading free lists
- update allocation contexts so they don't reference decommissioned heaps
- allow redistribute_regions to fail
- allow enter_spin_lock to fail when called from try_allocate_more_space
- don't decommission heaps with a taken more space lock
- poison dynamic data and generation table for decommissioned heaps
- Add instrumentation
- Be more careful regarding signed vs. unsigned types to make GCC happy.
…ecking for containing heap to verify_free_lists, adding checking decommissioned heaps to verify_heap.
@ghost

ghost commented May 15, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.


Author: PeterSolMS
Assignees: PeterSolMS
Labels: area-GC-coreclr
Milestone: -

PeterSolMS added 10 commits May 15, 2023 15:00
- rename config to GCDynamicAdaptation
- change dynamic heap count dprintf level so we can print just those messages
- aim for a percentage overhead reading between 1 and 5%: if above 10%, ramp up aggressively; if above 5%, ramp up a step; if below 1% and significant space gains are possible, ramp down a step (see the sketch after this commit list)
- make space cost computation per heap more realistic - use min gen 0 budget
…ze to 2.5 MB if COMPLUS_GCDynamicAdaption is enabled.
…mic_data - we had moved regions to other heaps, so total_gen_size became 0. We had adjusted generation_free_list_space for this when rethreading the free lists, but not generation_free_obj_space. So dd_current_size became a large positive number as a result.
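To make the overhead thresholds in the commit above concrete, here is a minimal sketch of the ramping rule as described; the function name, return values, and the size of the "aggressive" step are assumptions for illustration, not the actual gc.cpp heuristics.

```cpp
// Sketch of the overhead-based ramping described in the commit above
// (hypothetical helper; the real tuning code is more nuanced).
// Target band is roughly 1%..5% GC overhead.
#include <cstdio>

// Returns the change to apply to the heap count: a larger positive value for
// an aggressive ramp-up (magnitude chosen here only for illustration),
// +1/-1 for single steps, 0 to hold.
int heap_count_step(double overhead_pct, bool significant_space_gain_possible)
{
    if (overhead_pct > 10.0)
        return +4;   // well above the band: ramp up aggressively
    if (overhead_pct > 5.0)
        return +1;   // above the band: ramp up a step
    if (overhead_pct < 1.0 && significant_space_gain_possible)
        return -1;   // below the band with space to reclaim: ramp down a step
    return 0;        // inside the 1%..5% band: keep the current heap count
}

int main()
{
    std::printf("%d %d %d %d\n",
                heap_count_step(12.0, false),  // 4
                heap_count_step(7.0, false),   // 1
                heap_count_step(0.5, true),    // -1
                heap_count_step(3.0, true));   // 0
}
```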
@Maoni0
Member

Maoni0 left a comment

@PeterSolMS and I talked about this and we do want to get this in for Preview 5 to get more testing. We've already gone through the changes offline.

… for the assert, which is likely some faulty bookkeeping.
@PeterSolMS
Contributor Author

Regarding the assert failures in compute_new_dynamic_data, the cases I could repro were all for gen 1 where the dd_current_size is actually irrelevant for the budget computation. Nonetheless, it seemed safer to set dd_current_size to 0 rather than a huge positive value for the case of total_gen_size < dd_fragmentation(dd). My guess is that when we empty gen 1, we don't actually zero generation_free_obj_space.
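For illustration, the clamp being described amounts to something like the following sketch; the values and the standalone setup are hypothetical, and only the total_gen_size < dd_fragmentation(dd) guard reflects the comment above.

```cpp
// Sketch of the guard described above (simplified; not the exact gc.cpp change):
// if fragmentation accounting exceeds the measured generation size, clamp the
// computed current size to 0 instead of letting it underflow to a huge value.
#include <cstddef>
#include <cstdio>

int main()
{
    // Hypothetical values after regions were moved to other heaps:
    size_t total_gen_size   = 0;      // generation emptied out
    size_t dd_fragmentation = 4096;   // stale free-obj-space accounting remains

    // Unsigned subtraction would wrap around to a huge positive number,
    // so clamp to 0 when total_gen_size < dd_fragmentation.
    size_t dd_current_size = (total_gen_size < dd_fragmentation)
                                 ? 0
                                 : total_gen_size - dd_fragmentation;

    std::printf("dd_current_size = %zu\n", dd_current_size);  // prints 0
}
```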

@mangod9
Member

mangod9 commented May 23, 2023

Regarding the assert failures in compute_new_dynamic_data, the cases I could repro were all for gen 1 where the dd_current_size is actually irrelevant for the budget computation. Nonetheless, it seemed safer to set dd_current_size to 0 rather than a huge positive value for the case of total_gen_size < dd_fragmentation(dd). My guess is that when we empty gen 1, we don't actually zero generation_free_obj_space.

Is this something which needs to be fixed before merging?

@PeterSolMS
Contributor Author

No, this is a lurking issue that has nothing to do with dynamic heap count. It happened without the feature active, and all the cases I looked at were on WKS GC, and were benign in the sense that there was no impact on the budget computation or other correctness aspects.

@mangod9
Member

mangod9 commented May 23, 2023

Looks like the failures are known per Build-Analysis. Should be ok to merge.

@PeterSolMS merged commit f6f7d89 into dotnet:main on May 23, 2023
@PeterSolMS
Contributor Author

Agree. Note that I have a PR out for the assert failure.
