Doubly linked freelist fixes #43636

Merged: 10 commits into dotnet:master on Nov 4, 2020

Conversation

PeterSolMS
Contributor

Fix bugs in the doubly linked free list logic and enable it.

Details:

  • #define DOUBLY_LINKED_FL to enable
  • Fixed an issue where the MAKE_FREE_OBJ_IN_COMPACT bit was accidentally set in the plug_and_gap structure that gets put before a pinned plug - this overwrites any object that used to be there (after it has been saved elsewhere). Added logic to detect the situation and set MAKE_FREE_OBJ_IN_COMPACT in the saved object instead; to make this work, added a new field "saved_pinned_plug_index" to remember which pinned plug to look at.
  • Changed several comparisons because an object <= min_free_item_no_prev cannot be on the free list.
  • Fixed an issue in should_set_bgc_mark_bit where an assert fired: current_sweep_seg was null because the background GC thread was about to sweep but hadn't set current_sweep_seg just yet. The fix is to look at current_bgc_state instead; if it is bgc_sweep_soh, we know sweeping has not started yet but is about to, so we should return TRUE.

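For readers unfamiliar with the idea, here is a minimal sketch (not from the PR; the type and member names are hypothetical, not the actual gc.cpp identifiers) of what a doubly linked free list buys: the extra back link lets an item be unlinked in O(1) from anywhere in the list, which is what makes incremental maintenance of the gen 2 free lists practical.

```cpp
#include <cstddef>

// A free object carrying both a forward and a backward link.
struct FreeItem
{
    FreeItem* next;
    FreeItem* prev;
    size_t    size;
};

struct FreeList
{
    FreeItem* head = nullptr;

    void push_front (FreeItem* item)
    {
        item->prev = nullptr;
        item->next = head;
        if (head != nullptr)
            head->prev = item;
        head = item;
    }

    // O(1) removal from anywhere in the list - this is what the back link buys;
    // with a singly linked list the predecessor would have to be found first,
    // i.e. the list would have to be walked or rebuilt.
    void unlink (FreeItem* item)
    {
        if (item->prev != nullptr)
            item->prev->next = item->next;
        else
            head = item->next;
        if (item->next != nullptr)
            item->next->prev = item->prev;
    }
};
```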
@ghost

ghost commented Oct 20, 2020

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

- set current_sweep_seg and current_sweep_pos early - this allows us to go back to the original code in should_set_bgc_mark_bit
- in adjust_limit, fix the case of a 0x18 byte plug in front of a pinned plug - for this we need a range test to catch the case, plus add this offset to the beginning of saved_pre_plug_reloc
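As a rough illustration of the range test mentioned above (the function and parameter names below are made up; this is not the actual adjust_limit code): with a plug as small as 0x18 bytes in front of a pinned plug, the allocation limit no longer lands exactly at the start of the saved pre-plug area, so an equality check has to become a range check, and the resulting offset is then applied to the start of the saved data (saved_pre_plug_reloc in the description).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch only: returns true if 'limit' falls anywhere inside the
// saved pre-plug area, and reports the offset that must then be added to the
// start of the saved data.
inline bool limit_in_saved_pre_plug (uint8_t* limit,
                                     uint8_t* saved_area_start,
                                     size_t   saved_area_size,
                                     size_t*  offset_into_saved_area)
{
    if ((limit >= saved_area_start) && (limit < (saved_area_start + saved_area_size)))
    {
        *offset_into_saved_area = (size_t)(limit - saved_area_start);
        return true;
    }
    return false;
}
```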
@mangod9
Member

mangod9 commented Oct 22, 2020

Assume we have done analysis showing benefits of enabling this? Would be good to quantify the perf benefits for this.

@PeterSolMS
Contributor Author

@mangod9 I am doing this analysis now.

What we did know is that there are scenarios where the old scheme caused unnecessary heap growth: because we deleted the free lists in gen 2 at the start of a background sweep, a gen 1 GC occurring at the wrong time may fail to find free space to compact items into.

The new scheme updates free lists incrementally, so should not suffer from this problem.
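A sketch of what "updates free lists incrementally" means in practice, under simplified assumptions (the names and shapes below are invented, not gc.cpp code): free space found while sweeping becomes visible on the list immediately, rather than only after the whole sweep completes, so a gen 1 GC that interrupts the sweep still finds usable gen 2 free space.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <utility>

// Simplified stand-in for the gen 2 free list: (start, size) pairs.
using FreeSpaceList = std::list<std::pair<uint8_t*, size_t>>;

// Walk a segment object by object; every dead object is threaded onto the free
// list right away, so other GCs see it immediately instead of finding an empty
// list that is only repopulated when the sweep finishes.
void sweep_segment (uint8_t* start, uint8_t* end,
                    size_t (*object_size) (uint8_t*),
                    bool (*is_marked) (uint8_t*),
                    FreeSpaceList& gen2_free_list)
{
    uint8_t* p = start;
    while (p < end)
    {
        size_t size = object_size (p);
        if (!is_marked (p))
            gen2_free_list.emplace_back (p, size);
        p += size;
    }
}
```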

@mangod9
Member

mangod9 commented Oct 22, 2020

ok, thanks for the info. Would be great to see which metrics are improved as part of this change.

@PeterSolMS
Contributor Author

I did some experiments today and found that indeed maximum gen 2 size is substantially lower with the new scheme - this is with a GCPerfSim scenario that keeps about 2 GB of objects live.

I will do some more in depth analysis of the gathered traces to make sure this effect is indeed due to the change.

The idea is that the object pointed at can only be at the very start of a saved plug_and_gap, or one pointer size further, because the object must be at least the minimum size.
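A hypothetical helper spelling that check out (not the actual gc.cpp code; the name is made up), restating the reasoning above: an object overlapping the saved plug_and_gap area can only start at offset 0 or one pointer size into it, because the minimum object size rules out any later starting offset.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative only: given a saved object address and the start of the saved
// plug_and_gap area, the offset can only be 0 or sizeof(pointer).
inline bool valid_offset_into_saved_plug_and_gap (uint8_t* obj, uint8_t* saved_area_start)
{
    size_t offset = (size_t)(obj - saved_area_start);
    return (offset == 0) || (offset == sizeof (uint8_t*));
}
```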
@PeterSolMS
Contributor Author

Here are some charts from a perf experiment with and without this change.

The perf experiment allocates a large number of small objects and retains a small fraction of them. It was run on a large machine (128 cores, 256 virtual processors), using 250 GC heaps. It made use of our GC test application GCPerfSim which was run with the following command line:

-tc 125 -tagb 20000 -tlgb 2 -lohar 0 -sohsi 50 -sohSizeRange 96-256 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -pohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time

This means: using 125 allocating threads, allocate 20,000 GB of objects between 96 and 256 bytes, surviving every 50th object, and holding on to 2 GB of live objects.

[Charts: Gen 2 free list space and Gen 2 heap size, baseline (singly linked) vs. doubly linked free list]

The top two charts illustrate how the space available on Gen 2 free lists occasionally dips close to zero with the existing singly linked free list implementation (baseline) - it doesn't do that with the new doubly linked free list.

The next two charts show that this causes the size of the Gen 2 heap to expand for the existing implementation, but that Gen 2 heap size stays flat with the new doubly linked free list.

To be sure, there is a price to pay - adding or removing items is slightly more expensive with the doubly linked list. Our traces indicate that allocate_in_older_generation (which removes items from the free list) becomes about 25% more expensive in gen 1 GCs, but that function is itself only 1% of plan_phase for gen 1 GCs. The other side of the coin is that background_sweep (which adds items to the free list) becomes about 7.4% more expensive. Background GC doesn't take much time itself, so the overall effect on GC CPU cost of this change is very small.

The problem of gen 2 heap growth caused by gen 1 GCs during background mark&sweep is much less common for real world applications that do other things besides allocating, but avoiding unnecessary gen 2 heap growth in the first place also avoids unnecessary gen 2 compacting GCs which may take multiple seconds of collection time with large heaps.

@Maoni0 Maoni0 (Member) left a comment


LGTM, and thanks for the perf data!

@PeterSolMS PeterSolMS merged commit 86f14a2 into dotnet:master Nov 4, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020