Doubly linked freelist fixes #43636

Merged: 10 commits into dotnet:master on Nov 4, 2020

Conversation

PeterSolMS
Contributor

Fix bugs in the doubly linked free list logic and enable it.

Details:

  • #define DOUBLY_LINKED_FL to enable
  • Fixed an issue where the MAKE_FREE_OBJ_IN_COMPACT bit was accidentally set in the plug_and_gap structure that gets put before a pinned plug - this overwrites any object that used to be there (after it has been saved elsewhere). Added logic to detect the situation and set MAKE_FREE_OBJ_IN_COMPACT in the saved object instead; to make this work, added a new field "saved_pinned_plug_index" to remember which pinned plug to look at.
  • Changed several comparisons because an object <= min_free_item_no_prev cannot be on the free list.
  • Fixed an issue in should_set_bgc_mark_bit where an assert fired: current_sweep_seg was null because the background GC thread was about to sweep but hadn't set current_sweep_seg just yet. The fix is to look at current_bgc_state instead; if it is bgc_sweep_soh, we know sweeping has not started yet but is about to, so we should return TRUE.

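For readers unfamiliar with the idea, here is a minimal sketch (not from the PR; the type and member names are hypothetical, not the actual gc.cpp identifiers) of what a doubly linked free list buys: the extra back link lets an item be unlinked in O(1) from anywhere in the list, which is what makes incremental maintenance of the gen 2 free lists practical.

```cpp
#include <cstddef>

// A free object carrying both a forward and a backward link.
struct FreeItem
{
    FreeItem* next;
    FreeItem* prev;
    size_t    size;
};

struct FreeList
{
    FreeItem* head = nullptr;

    void push_front (FreeItem* item)
    {
        item->prev = nullptr;
        item->next = head;
        if (head != nullptr)
            head->prev = item;
        head = item;
    }

    // O(1) removal from anywhere in the list - this is what the back link buys;
    // with a singly linked list the predecessor would have to be found first,
    // i.e. the list would have to be walked or rebuilt.
    void unlink (FreeItem* item)
    {
        if (item->prev != nullptr)
            item->prev->next = item->next;
        else
            head = item->next;
        if (item->next != nullptr)
            item->next->prev = item->prev;
    }
};
```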
@ghost

ghost commented Oct 20, 2020

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

- set current_sweep_seg and current_sweep_pos early - this allows us to go back to the original code in should_set_bgc_mark_bit
- in adjust_limit, fix the case of a 0x18 byte plug in front of a pinned plug - for this we need a range test to catch the case, plus add this offset to the beginning of saved_pre_plug_reloc
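As a rough illustration of the range test mentioned above (the function and parameter names below are made up; this is not the actual adjust_limit code): with a plug as small as 0x18 bytes in front of a pinned plug, the allocation limit no longer lands exactly at the start of the saved pre-plug area, so an equality check has to become a range check, and the resulting offset is then applied to the start of the saved data (saved_pre_plug_reloc in the description).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch only: returns true if 'limit' falls anywhere inside the
// saved pre-plug area, and reports the offset that must then be added to the
// start of the saved data.
inline bool limit_in_saved_pre_plug (uint8_t* limit,
                                     uint8_t* saved_area_start,
                                     size_t   saved_area_size,
                                     size_t*  offset_into_saved_area)
{
    if ((limit >= saved_area_start) && (limit < (saved_area_start + saved_area_size)))
    {
        *offset_into_saved_area = (size_t)(limit - saved_area_start);
        return true;
    }
    return false;
}
```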
@mangod9
Member

mangod9 commented Oct 22, 2020

Assume we have done analysis showing benefits of enabling this? Would be good to quantify the perf benefits for this.

@PeterSolMS
Contributor Author

@mangod9 I am doing this analysis now.

What we did know is that there are scenarios where the old scheme caused unnecessary heap growth: because we deleted the free lists in gen 2 at the start of a background sweep, a gen 1 GC occurring at the wrong time may fail to find free space to compact items into.

The new scheme updates free lists incrementally, so should not suffer from this problem.
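A sketch of what "updates free lists incrementally" means in practice, under simplified assumptions (the names and shapes below are invented, not gc.cpp code): free space found while sweeping becomes visible on the list immediately, rather than only after the whole sweep completes, so a gen 1 GC that interrupts the sweep still finds usable gen 2 free space.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <utility>

// Simplified stand-in for the gen 2 free list: (start, size) pairs.
using FreeSpaceList = std::list<std::pair<uint8_t*, size_t>>;

// Walk a segment object by object; every dead object is threaded onto the free
// list right away, so other GCs see it immediately instead of finding an empty
// list that is only repopulated when the sweep finishes.
void sweep_segment (uint8_t* start, uint8_t* end,
                    size_t (*object_size) (uint8_t*),
                    bool (*is_marked) (uint8_t*),
                    FreeSpaceList& gen2_free_list)
{
    uint8_t* p = start;
    while (p < end)
    {
        size_t size = object_size (p);
        if (!is_marked (p))
            gen2_free_list.emplace_back (p, size);
        p += size;
    }
}
```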

@mangod9
Member

mangod9 commented Oct 22, 2020

ok, thanks for the info. Would be great to see which metrics are improved as part of this change.

@PeterSolMS
Contributor Author

I did some experiments today and found that indeed maximum gen 2 size is substantially lower with the new scheme - this is with a GCPerfSim scenario that keeps about 2 GB of objects live.

I will do some more in depth analysis of the gathered traces to make sure this effect is indeed due to the change.

The idea is that the object pointed at can only be at the very start of a saved plug_and_gap, or one pointer size further, because the object must be at least the minimum size.
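A hypothetical helper spelling that check out (not the actual gc.cpp code; the name is made up), restating the reasoning above: an object overlapping the saved plug_and_gap area can only start at offset 0 or one pointer size into it, because the minimum object size rules out any later starting offset.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative only: given a saved object address and the start of the saved
// plug_and_gap area, the offset can only be 0 or sizeof(pointer).
inline bool valid_offset_into_saved_plug_and_gap (uint8_t* obj, uint8_t* saved_area_start)
{
    size_t offset = (size_t)(obj - saved_area_start);
    return (offset == 0) || (offset == sizeof (uint8_t*));
}
```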
@PeterSolMS
Contributor Author

Here are some charts from a perf experiment with and without this change.

The perf experiment allocates a large number of small objects and retains a small fraction of them. It was run on a large machine (128 cores, 256 virtual processors), using 250 GC heaps. It made use of our GC test application GCPerfSim which was run with the following command line:

-tc 125 -tagb 20000 -tlgb 2 -lohar 0 -sohsi 50 -sohSizeRange 96-256 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -pohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time

This means: using 125 allocating threads, allocate 20,000 GB of objects between 96 and 256 bytes, surviving every 50th object, and holding on to 2 GB of live objects.

[Charts: Gen 2 free list space and Gen 2 heap size, baseline (singly linked) vs. doubly linked free list]

The top two charts illustrate how the space available on Gen 2 free lists occasionally dips close to zero with the existing singly linked free list implementation (baseline) - it doesn't do that with the new doubly linked free list.

The next two charts show that this causes the size of the Gen 2 heap to expand for the existing implementation, but that Gen 2 heap size stays flat with the new doubly linked free list.

To be sure, there is a price to pay - adding or removing items is slightly more expensive with the doubly linked list. Our traces indicate that allocate_in_older_generation (which removes items from the free list) becomes about 25% more expensive in gen 1 GCs, but that function is itself only 1% of plan_phase for gen 1 GCs. The other side of the coin is that background_sweep (which adds items to the free list) becomes about 7.4% more expensive. Background GC doesn't take much time itself, so the overall effect on GC CPU cost of this change is very small.

The problem of gen 2 heap growth caused by gen 1 GCs during background mark&sweep is much less common for real world applications that do other things besides allocating, but avoiding unnecessary gen 2 heap growth in the first place also avoids unnecessary gen 2 compacting GCs which may take multiple seconds of collection time with large heaps.

@Maoni0 Maoni0 (Member) left a comment


LGTM, and thanks for the perf data!

@PeterSolMS PeterSolMS merged commit 86f14a2 into dotnet:master Nov 4, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020