Feature: dynamic expansion for generic dictionaries #26262

fadimounir · 2019-08-19T20:31:32Z

These changes introduce dynamic size expansion for generic dictionary layouts when we run out of slots.
The original implementation allowed for expansion, but it used a linked-list structure, which made it
impossible to use fast lookup slots once the slots in the first bucket ran out.

The new implementation allows fast lookup slots to be used for all generic lookups.

This also removes the constraint we had on R2R, where we disabled the use of fast slots altogether.
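
To make the contrast concrete, here is a minimal sketch of the two layouts described above. It is only an illustration under assumptions: Bucket, FlatDictionary, LookupChained and the bucket size of 4 are hypothetical names and values, not the coreclr data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model for illustration; these are not the coreclr types.

// Old scheme: fixed-size buckets chained in a linked list.
struct Bucket {
    static constexpr std::size_t kSlots = 4;  // assumed bucket size
    void* slots[kSlots] = {};
    std::unique_ptr<Bucket> next;
};

// Only slots in the first bucket sit at a fixed offset from the dictionary
// pointer; anything beyond it requires walking the chain.
inline void* LookupChained(const Bucket& first, std::size_t index) {
    const Bucket* bucket = &first;
    while (index >= Bucket::kSlots) {
        bucket = bucket->next.get();
        if (bucket == nullptr) return nullptr;  // index is past the last bucket
        index -= Bucket::kSlots;
    }
    return bucket->slots[index];
}

// New scheme: one contiguous array that grows on demand, so every slot is
// reachable with a single indexed load.
struct FlatDictionary {
    std::vector<void*> slots;

    void Set(std::size_t index, void* value) {
        if (index >= slots.size()) slots.resize(index + 1, nullptr);
        assert(index < slots.size());
        slots[index] = value;
    }

    void* Lookup(std::size_t index) const {
        return index < slots.size() ? slots[index] : nullptr;
    }
};
```

In the chained model, a lookup cost depends on how many buckets precede the slot, while the flat model keeps every slot at a computable offset, which is the property a fast lookup needs.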

src/inc/corinfo.h

src/vm/genericdict.h

jkotas · 2019-08-20T02:07:04Z

Do you have any performance numbers for this?

fadimounir · 2019-08-21T17:52:10Z

@jkotas I still don't have perf numbers. I'm trying to figure out how perf jobs are executed nowadays; the old links no longer work.

fadimounir · 2019-08-21T18:39:47Z

cc @billwert @brianrob
We can dogfood the new perf jobs using this PR once the infra is ready :)

AndyAyersMS · 2019-08-21T18:41:18Z

Which perf jobs are you trying to run? I think a lot of our old perf infrastructure is in flux and there may be little or no CI support right now.

You should probably clone dotnet/performance and run those tests locally. I think there are both microbenchmarks and some app-level benchmarks.

cc @adamsitnik

billwert · 2019-08-21T18:44:46Z

@AndyAyersMS we have brand new infra right now that @adiaaida is working on. I guided @fadimounir to this.

AndyAyersMS · 2019-08-21T18:47:26Z

@billwert Good. Looking forward to learning more about it.

davidwrighton · 2019-08-22T17:23:39Z

I've looked through the code, and it looks generally acceptable. However, before I sign off, I think we need perf numbers that answer the following questions:

- How much does this change the performance of R2R code? (Probably measure this in a test run where tiered compilation is disabled.)
- How much does this change the performance of code once tiering kicks in?
- How much additional memory usage is this actually costing?

fadimounir · 2019-08-24T03:27:00Z

Performance numbers indicate a slowdown: https://dev.azure.com/dnceng/public/_build/results?buildId=322857&view=ms.vss-test-web.build-test-results-tab

This will need to be investigated further.

fadimounir · 2019-08-27T20:28:09Z

diff2.txt

Here is the diff of the perf run I executed locally on my machine.
This measures performance with tiered compilation enabled, and R2R code for platform/corefx assemblies (typical shipping scenario). It's not possible right now to R2R the benchmark assembly: this requires a huge amount of work, and might not be that useful given that most of the actual code we measure runs in corefx/runtime assemblies, not the actual benchmark code.

I'm not really sure why there are drastic differences (either positive or negative) for some of the benchmarks, which seem to be unrelated to my changes.

Example: System.Memory.Constructors.ArrayAsSpan is 2.07x slower, which is odd because this instantiation shouldn't use dictionaries, which are the core of my changes. Also, on a manual rerun, the numbers were different and slightly in favor of my changes, showing a slight perf win.

Example: System.MathBenchmarks.Double.Asinh, 1.35x slower, but I highly doubt this benchmark uses any generics at all (it uses Math.Asinh).

@AndyAyersMS @billwert Any thoughts?

fadimounir · 2019-08-28T19:52:08Z

Here are some numbers I collected manually, using a separate and more accurate benchmark I wrote:

With Tiered Jitting

Baseline:
- IL: 2.60 seconds
- R2R: 2.38 seconds
With fix:
- IL: 1.89 seconds (27.3% faster)
- R2R: 1.70 seconds (28.6% faster)

Without Tiered Jitting

Baseline:
- IL: 2.26 seconds
- R2R: 2.70 seconds
With fix:
- IL: 1.58 seconds (29.7% faster)
- R2R: 1.73 seconds (36.1% faster)

In terms of memory used by the hashtables of type/method dependencies, the MSIX WPF app has a total of 3104 entries. At 16 bytes per entry (based on a number I got from @davidwrighton), that is 48.5 KB of memory. The app uses about 54.1 MB of memory, so the memory used by these data structures is negligible. I didn't count the memory used by the actual dictionary slot allocations, but it should also be negligible.

I also measured Roslyn (C#) performance when building Roslyn. Here are the numbers I got:
Average Baseline = 2.278 seconds
Average Fix = 2.107 seconds (7.5% faster)
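
The arithmetic behind the memory estimate and the "faster" percentages can be reproduced with a small sketch like the one below. The helper names TableSizeKiB and PercentFaster are hypothetical, and the sketch assumes 1 KB means 1024 bytes, which is what yields 48.5.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for checking the reported figures.

// Size of a table of fixed-size entries, in units of 1024 bytes.
inline double TableSizeKiB(std::size_t entries, std::size_t bytesPerEntry) {
    return static_cast<double>(entries * bytesPerEntry) / 1024.0;
}

// Relative time saved by the fix, as a percentage of the baseline time.
inline double PercentFaster(double baselineSeconds, double fixSeconds) {
    assert(baselineSeconds > 0.0);
    return (baselineSeconds - fixSeconds) / baselineSeconds * 100.0;
}
```

TableSizeKiB(3104, 16) gives 48.5, and PercentFaster(2.278, 2.107) gives roughly 7.5, matching the Roslyn figure above.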

jkotas · 2019-08-28T21:55:33Z

> separate and more accurate benchmark I wrote

Could you please get the benchmark checked in to https://github.com/dotnet/performance ?

billwert · 2019-08-28T22:39:35Z

I'm digging into the noise issues that we're seeing here. It's not blocking this at this point, so I'll get to it next week after I'm OOF.

fadimounir · 2019-08-28T23:12:56Z

> Could you please get the benchmark checked in to https://github.com/dotnet/performance ?

Done (dotnet/performance#836)

davidwrighton

Still a few threading issues I found. The FlushProcessWriteBuffers call is in a slightly wrong spot.

src/vm/genericdict.h

src/vm/genericdict.cpp

fadimounir · 2019-08-30T06:20:38Z

/azp run coreclr-ci

azure-pipelines · 2019-08-30T06:20:51Z

Azure Pipelines successfully started running 1 pipeline(s).

AndyAyersMS · 2019-09-03T17:56:43Z

@fadimounir can you also look at and summarize jit codegen diffs (via jit-diff)? We should see a number of methods with diffs where we're no longer calling back into the runtime as there are enough fast slots to cover all the uses in the method.

Somebody on @dotnet/jit-contrib can help you if you're not familiar with how to do this.

fadimounir · 2019-09-19T00:53:18Z

Added an extra slot in generic dictionaries to store the size of a dictionary. This was needed to fix a race condition between the type loader and the dictionary expansion code.
David and I had a good offline discussion about this idea.

In terms of memory usage, testing with the MSIX catalog WPF app, I could not see any meaningful difference between the baseline and the build with all of the changes in this PR, including the extra dictionary slot.
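
As a rough illustration of why storing the size inside the dictionary helps, the sketch below shows one plausible way a reader could use such a slot. This is an assumption, not the PR's code: SizedDictionary, TryFastLookup, and the choice of slot 0 for the size are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: slot 0 holds the number of entries, and entries
// follow it. The real dictionary layout may place the size slot elsewhere.
struct SizedDictionary {
    std::vector<std::uintptr_t> slots;

    explicit SizedDictionary(std::size_t entryCount) : slots(entryCount + 1, 0) {
        slots[0] = entryCount;
    }

    std::size_t EntryCount() const { return static_cast<std::size_t>(slots[0]); }
};

// Returns true and writes the entry if the index is within this dictionary
// and the entry is populated. Returns false when the caller holds a smaller
// (possibly stale) dictionary and must fall back to a slower path.
inline bool TryFastLookup(const SizedDictionary& dict, std::size_t index,
                          std::uintptr_t* out) {
    assert(out != nullptr);
    if (index >= dict.EntryCount()) return false;  // never read past the end
    const std::uintptr_t value = dict.slots[index + 1];
    if (value == 0) return false;                  // not populated yet
    *out = value;
    return true;
}
```

Because the size travels with the dictionary itself, a thread that still holds a pre-expansion dictionary can detect that an index is out of range instead of reading past its end.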

src/vm/genericdict.cpp

…pansion code

The main problem was that we were publishing InstantiatedMethodDescs before recording them for dictionary expansions, making it possible for other threads to use old dictionary data with expanded slots, and therefore to read incorrect memory locations. Fixes include:
1) Recording newly created InstantiatedMethodDescs for dictionary expansion before publishing them
2) Not adding multiple instances of the same method to the expansion hashtable
3) Using FastInterlockedExchange for dictionary pointer updates
4) Fixes around the "pAltMD == pRet" assert: use GetExistingWrappedMethodDesc instead of GetWrappedMethodDesc
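
The first three fixes above can be sketched in portable C++ as follows. This is a simplified model under assumptions, not the runtime's code: ExpansionRegistry, Method and Dict are hypothetical names, std::atomic::exchange stands in for the interlocked exchange, and a std::unordered_set stands in for the expansion hashtable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for a dictionary and a method that owns one.
struct Dict { std::vector<void*> slots; };
struct Method { std::atomic<Dict*> dict{nullptr}; };

class ExpansionRegistry {
public:
    // Tracks a method for future expansions. Returns false if it was already
    // tracked, so the same method is never recorded twice.
    bool Record(Method* m) {
        assert(m != nullptr);
        std::lock_guard<std::mutex> guard(lock_);
        return tracked_.insert(m).second;
    }

    // Records first, then publishes: once other threads can see the method,
    // an expansion is guaranteed to update its dictionary too.
    void RecordThenPublish(Method* m) {
        Record(m);
        published_.store(m, std::memory_order_release);
    }

    Method* Published() const { return published_.load(std::memory_order_acquire); }

    // Gives every tracked method a larger dictionary, swapping the pointer
    // atomically. Old dictionaries (possibly null) are returned rather than
    // freed, because concurrent readers may still be using them.
    std::vector<Dict*> ExpandAll(std::size_t newSlotCount) {
        std::lock_guard<std::mutex> guard(lock_);
        std::vector<Dict*> retired;
        for (Method* m : tracked_) {
            Dict* old = m->dict.load(std::memory_order_acquire);
            if (old != nullptr && old->slots.size() >= newSlotCount) continue;
            Dict* bigger = new Dict;
            if (old != nullptr) bigger->slots = old->slots;
            bigger->slots.resize(newSlotCount, nullptr);
            retired.push_back(m->dict.exchange(bigger, std::memory_order_acq_rel));
        }
        return retired;
    }

private:
    std::mutex lock_;
    std::unordered_set<Method*> tracked_;
    std::atomic<Method*> published_{nullptr};
};
```

If the order in RecordThenPublish were reversed, an expansion running between the two steps would skip the new method, leaving a visible method with a dictionary that is too small for the expanded layout.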

…s set. This fixes a race condition found with the final level of type loading, which does not use the typeloader lock used by the other load levels. Added some debug-only checks

asm formatting

Note on old dictionaries not getting deallocated

davidwrighton

I'm excited for this. This should be a nice performance win for some of our customers, and I'm glad to see this finally reach a good quality bar.

Documentation/botr/shared-generics.md

Note on thread synchronization

Documentation/botr/shared-generics.md

Feedback from Jan

maryamariyan · 2019-11-06T21:04:17Z

Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:

1. In your coreclr repository clone, create a patch by running git format-patch origin
2. In your runtime repository clone, apply the patch by running git apply --directory src/coreclr <path to the patch created in step 1>

fadimounir requested review from jkotas and davidwrighton August 19, 2019 20:31

jkotas reviewed Aug 20, 2019

src/inc/corinfo.h

jkotas reviewed Aug 20, 2019

src/vm/genericdict.h

fadimounir added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Aug 24, 2019

fadimounir force-pushed the MakeDictLayoutDynamic branch from fd724ba to d54026d Compare August 28, 2019 20:01

fadimounir removed the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Aug 28, 2019

jkotas closed this Aug 28, 2019

jkotas reopened this Aug 28, 2019

davidwrighton suggested changes Aug 29, 2019

src/vm/genericdict.h

src/vm/genericdict.cpp

src/vm/genericdict.cpp

fadimounir force-pushed the MakeDictLayoutDynamic branch from 3a6c486 to 81fe998 Compare September 4, 2019 20:07

fadimounir added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Sep 6, 2019

jkotas reviewed Sep 19, 2019

src/vm/genericdict.cpp

Fadi Hanna added 9 commits November 4, 2019 11:39

CR feedback from David

0e1c6c8

Fixing a race condition between type loader and generic dictionary ex…

c0eb536

…pansion code

Code review feedback changes

90e2669

Add debug slot with pointer to old dictionary

cc30cf1

Move type recording to just before the CLASS_LOAD_EXACTPARENTS flag i…

59ea03e

…s set. This fixes a race condition found with the final level of type loading, which does not use the typeloader lock used by the other load levels. Added some debug-only checks

Diagnostic pointer to dynamic dictionary predecessor.

cca6f1b

Fix merge issues

1028e72

Adding a chapter in the BOTR describing generic dictionaries

78c5b44

fadimounir force-pushed the MakeDictLayoutDynamic branch from 3f8fcbf to 78c5b44 Compare November 4, 2019 19:47

Fadi Hanna added 2 commits November 4, 2019 11:50

Update shared-generics.md

e06136e

asm formatting

Update shared-generics.md

cd7a18f

Note on old dictionaries not getting deallocated

fadimounir requested a review from davidwrighton November 5, 2019 00:28

davidwrighton approved these changes Nov 5, 2019

sdmaclea reviewed Nov 5, 2019

Documentation/botr/shared-generics.md

jkotas reviewed Nov 5, 2019

Documentation/botr/shared-generics.md

jkotas reviewed Nov 5, 2019

Documentation/botr/shared-generics.md

Update shared-generics.md

1e0439d

Note on thread synchronization

jkotas reviewed Nov 5, 2019

Documentation/botr/shared-generics.md

jkotas reviewed Nov 5, 2019

Documentation/botr/shared-generics.md

Update shared-generics.md

11965aa

Feedback from Jan

fadimounir merged commit d840c75 into dotnet:master Nov 6, 2019

jkotas mentioned this pull request Nov 9, 2019

[master] Update dependencies from dotnet/coreclr dotnet/corefx#42443

Merged

stephentoub added a commit that referenced this pull request Nov 9, 2019

Revert "Feature: dynamic expansion for generic dictionaries (#26262)"

853d7de

This reverts commit d840c75.

stephentoub mentioned this pull request Nov 9, 2019

Revert "Feature: dynamic expansion for generic dictionaries" #27786

Merged

stephentoub added a commit that referenced this pull request Nov 9, 2019

Revert "Feature: dynamic expansion for generic dictionaries (#26262)"

ab8467a

This reverts commit d840c75.

alpencolt mentioned this pull request Jan 31, 2020

[ARM/Linux] Various build crashes after #26262 dotnet/runtime#13765

Closed

fadimounir mentioned this pull request Feb 14, 2020

Dynamic generic dictionary expansion feature dotnet/runtime#32270

Merged

jkotas mentioned this pull request Mar 8, 2023

[JIT] Add support to inline the field access of primitive types marked with TLS dotnet/runtime#82973

Merged
