This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Feature: dynamic expansion for generic dictionaries #26262

Merged
merged 21 commits into from
Nov 6, 2019

Conversation

fadimounir

These changes introduce dynamic size expansion for generic dictionary layouts when we run out of slots.
The original implementation also allowed expansion, but through a linked list structure, which made it
impossible to use fast lookup slots once the first bucket ran out of slots.

The new implementation allows fast lookup slots to be used always, for all generic lookups.

This also removes the constraint we had on R2R, where we disabled the usage of fast slots altogether.
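
As a rough illustration of the idea (not the actual CoreCLR code), the sketch below models a dictionary whose slot array grows by allocating a larger array, copying the existing slots, and publishing the new array through an atomic pointer. Every name here (`SlotArray`, `ToyDictionary`, `Lookup`, `Set`, `Expand`) is hypothetical, and the demo assumes a single writer; the real change has to deal with the threading issues discussed later in this conversation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Toy model only: hypothetical names, not the CoreCLR types.
struct SlotArray {
    std::size_t count;               // number of usable slots
    std::unique_ptr<void*[]> slots;  // slot values, null until populated
    explicit SlotArray(std::size_t n) : count(n), slots(new void*[n]()) {}
};

class ToyDictionary {
public:
    explicit ToyDictionary(std::size_t initialSlots) {
        arrays_.push_back(std::make_unique<SlotArray>(initialSlots));
        current_.store(arrays_.back().get(), std::memory_order_release);
    }

    // Fast path: one pointer load plus one indexed load, for any slot index.
    void* Lookup(std::size_t index) const {
        const SlotArray* arr = current_.load(std::memory_order_acquire);
        return index < arr->count ? arr->slots[index] : nullptr;
    }

    // Single-writer assumption: grows the array first if the index is out of range.
    void Set(std::size_t index, void* value) {
        if (index >= SlotCount()) Expand(index + 1);
        current_.load(std::memory_order_acquire)->slots[index] = value;
    }

    std::size_t SlotCount() const {
        return current_.load(std::memory_order_acquire)->count;
    }

private:
    // Allocate a larger array, copy the old slots, then publish the new pointer.
    void Expand(std::size_t newCount) {
        const SlotArray* old = current_.load(std::memory_order_acquire);
        auto bigger = std::make_unique<SlotArray>(newCount);
        std::memcpy(bigger->slots.get(), old->slots.get(), old->count * sizeof(void*));
        SlotArray* raw = bigger.get();
        arrays_.push_back(std::move(bigger));  // old arrays stay alive for late readers
        current_.store(raw, std::memory_order_release);
    }

    std::atomic<SlotArray*> current_{nullptr};
    std::vector<std::unique_ptr<SlotArray>> arrays_;
};
```

In contrast to a linked list of buckets, every lookup in this model costs the same two loads regardless of how many times the array has grown. The trade-off in this toy is that superseded arrays are kept alive rather than freed.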

@jkotas
Member

jkotas commented Aug 20, 2019

Do you have any performance numbers for this?

@fadimounir
Author

@jkotas I still don't have perf numbers. I'm trying to figure out how perf jobs are executed nowadays. The old links no longer work.

@fadimounir
Author

cc @billwert @brianrob
We can dogfood the new perf jobs using this PR once the infra is ready :)

@AndyAyersMS
Member

Which perf jobs are you trying to run? I think a lot of our old perf infrastructure is in flux and there may be little or no CI support right now.

You should probably clone dotnet/performance and run those tests locally. I think there are both microbenchmarks and some app-level benchmarks.

cc @adamsitnik

@billwert
Member

@AndyAyersMS we have brand new infra right now that @adiaaida is working on. I guided @fadimounir to this.

@AndyAyersMS
Member

@billwert Good. Looking forward to learning more about it.

@davidwrighton
Member

I've looked through the code, and it looks generally acceptable. I think we need to have perf numbers that answer the following questions before I sign off though.

  1. How much does this change the performance of R2R code? (Probably measure this in a test run where tiered compilation is disabled)
  2. How much does this change the performance of code once tiering kicks in?
  3. How much additional memory usage is this actually costing?

@fadimounir
Author

Performance numbers indicate a slowdown: https://dev.azure.com/dnceng/public/_build/results?buildId=322857&view=ms.vss-test-web.build-test-results-tab

This will need to be investigated further.

@fadimounir fadimounir added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Aug 24, 2019
@fadimounir
Author

diff2.txt

Here is the diff of the perf run I executed locally on my machine.
This measures performance with tiered compilation enabled, and R2R code for platform/corefx assemblies (typical shipping scenario). It's not possible right now to R2R the benchmark assembly: this requires a huge amount of work, and might not be that useful given that most of the actual code we measure runs in corefx/runtime assemblies, not the actual benchmark code.

I'm not really sure why there are drastic differences (either positive or negative) for some of the benchmarks, which seem to be unrelated to my changes.

Example: System.Memory.Constructors.ArrayAsSpan, 2.07x slower, but this is odd because this instantiation shouldn't use dictionaries, which are the core of my changes. Also, on a manual rerun the numbers were different and slightly in favor of my changes, showing a slight perf win.

Example: System.MathBenchmarks.Double.Asinh, 1.35x slower, but I highly doubt this benchmark uses any generics at all (it uses Math.Asinh).

@AndyAyersMS @billwert Any thoughts?

@fadimounir
Author

fadimounir commented Aug 28, 2019

Here are some numbers I collected manually, using a separate and more accurate benchmark I wrote:

With Tiered Jitting

  • Baseline:
    • IL: 2.60 seconds
    • R2R: 2.38 seconds
  • With fix:
    • IL: 1.89 seconds (27.3% faster)
    • R2R: 1.70 seconds (28.6% faster)

Without Tiered Jitting

  • Baseline:
    • IL: 2.26 seconds
    • R2R: 2.70 seconds
  • With fix:
    • IL: 1.58 seconds (29.7% faster)
    • R2R: 1.73 seconds (36.1% faster)

In terms of memory used by the hashtables of type/method dependencies, for the msix WPF app there is a total of 3104 entries. At 16 bytes per entry (based on a number I got from @davidwrighton), that's 48.5 KB of memory. The app uses about 54.1 MB of memory, so the memory used by the data structures is negligible. I didn't count the memory used by the actual dictionary slot allocations, but it should also be negligible.
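
The arithmetic behind the 48.5 KB figure can be checked with a short sketch. It uses only the numbers quoted above and assumes binary units (1 KB = 1024 bytes, 1 MB = 1024 KB); the constant and function names are made up for this example.

```cpp
#include <cassert>

// Figures quoted in the comment above.
constexpr double kEntries = 3104;        // hashtable entries in the WPF app
constexpr double kBytesPerEntry = 16;    // per-entry cost quoted above
constexpr double kAppMemoryMB = 54.1;    // approximate total app memory

// Assumption: binary units (1 KB = 1024 B, 1 MB = 1024 * 1024 B).
constexpr double TrackingBytes() { return kEntries * kBytesPerEntry; }
constexpr double TrackingKB() { return TrackingBytes() / 1024.0; }
constexpr double ShareOfAppPercent() {
    return TrackingBytes() / (kAppMemoryMB * 1024.0 * 1024.0) * 100.0;
}
```

Under these assumptions `TrackingKB()` evaluates to 48.5 and `ShareOfAppPercent()` to a little under 0.1, which is consistent with calling the overhead negligible.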

I also measured the C# roslyn performance, building roslyn. Here are the numbers I got:
Average Baseline = 2.278 seconds
Average Fix = 2.107 seconds (7.5% faster)

@fadimounir fadimounir force-pushed the MakeDictLayoutDynamic branch from fd724ba to d54026d Compare August 28, 2019 20:01
@fadimounir fadimounir removed the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Aug 28, 2019
@jkotas
Member

jkotas commented Aug 28, 2019

> separate and more accurate benchmark I wrote

Could you please get the benchmark checked in to https://github.com/dotnet/performance ?

@jkotas jkotas closed this Aug 28, 2019
@jkotas jkotas reopened this Aug 28, 2019
@billwert
Member

I'm digging into the noise issues that we're seeing here. It's not blocking this at this point, so I'll get to it next week after I'm OOF.

@fadimounir
Author

fadimounir commented Aug 28, 2019

> Could you please get the benchmark checked in to https://github.com/dotnet/performance ?

Done (dotnet/performance#836)

Member

@davidwrighton davidwrighton left a comment

Still a few threading issues I found. The FlushProcessWriteBuffers call is in a slightly wrong spot.

@fadimounir
Author

/azp run coreclr-ci

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Member

@fadimounir can you also look at and summarize jit codegen diffs (via jit-diff)? We should see a number of methods with diffs where we're no longer calling back into the runtime as there are enough fast slots to cover all the uses in the method.

Somebody on @dotnet/jit-contrib can help you if you're not familiar with how to do this.

@fadimounir fadimounir force-pushed the MakeDictLayoutDynamic branch from 3a6c486 to 81fe998 Compare September 4, 2019 20:07
@fadimounir fadimounir added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Sep 6, 2019
@fadimounir
Author

Added an extra slot in generic dictionaries to store the size of a dictionary. This was needed to fix a race condition between the type loader and the dictionary expansion code.
David and I had a good offline discussion about this idea.

In terms of memory usage, testing with the MSIX catalog WPF app, I could not see any meaningful difference between the baseline and a build with all of the changes in this PR, including the extra dictionary slot.
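
To illustrate why a size slot helps, here is a minimal sketch under a made-up layout: slot 0 stores the total slot count and the remaining slots hold lookup results. The position of the size slot and the names `MakeDictionary`, `HasSlot`, and `TryFastLookup` are assumptions for this example only; the real CoreCLR layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Slot = std::uintptr_t;

// Hypothetical layout: dict[0] holds the total number of slots (including
// itself), dict[1..] hold lookup results. Requires totalSlots >= 1.
std::vector<Slot> MakeDictionary(std::size_t totalSlots) {
    assert(totalSlots >= 1);
    std::vector<Slot> dict(totalSlots, 0);
    dict[0] = static_cast<Slot>(totalSlots);  // self-describing size
    return dict;
}

// Given only a raw pointer, the size slot is the only way to tell whether
// this particular copy of the dictionary was allocated with slotIndex.
bool HasSlot(const Slot* dict, std::size_t slotIndex) {
    return slotIndex < dict[0];
}

// Returns the slot value, or 0 when the caller holds a pre-expansion copy
// and has to fall back to a slower path.
Slot TryFastLookup(const Slot* dict, std::size_t slotIndex) {
    return HasSlot(dict, slotIndex) ? dict[slotIndex] : 0;
}
```

A reader that still holds a pointer to an older, smaller dictionary can detect that from the dictionary itself instead of reading past its end.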

Fadi Hanna added 9 commits November 4, 2019 11:39
The main problem was that we were publishing InstantiatedMethodDescs before recording them for dictionary expansions, making it possible for other threads to use old dictionary data with expanded slots, and therefore to read incorrect memory locations.

Fixes include:
1) Recording newly created InstantiatedMethodDescs for dictionary expansion before publishing them
2) Not adding multiple instances of the same method to the expansion hashtable
3) Using FastInterlockedExchange for dictionary pointer updates
4) Fixes around the "pAltMD == pRet" assert: use GetExistingWrappedMethodDesc instead of GetWrappedMethodDesc
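
The ordering in fixes 1 to 3 can be sketched in a toy model. This is not the CoreCLR code: `ToyMethod`, `ExpansionRegistry`, and `RecordThenPublish` are hypothetical names, a `std::unordered_set` stands in for the expansion hashtable, and `std::atomic::exchange` stands in for the interlocked pointer update.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_set>

// Stand-in for an instantiated method that owns a dictionary pointer.
struct ToyMethod {
    std::atomic<int*> dict{nullptr};
};

class ExpansionRegistry {
public:
    // Returns false if the method was already recorded (no duplicate entries).
    bool Record(ToyMethod* m) {
        std::lock_guard<std::mutex> lock(mu_);
        return tracked_.insert(m).second;
    }

    // Points every recorded method at newDict with an atomic exchange and
    // returns how many methods were updated. (A real expansion would build
    // a separate new dictionary per method; the toy shares one.)
    std::size_t ExpandAll(int* newDict) {
        std::lock_guard<std::mutex> lock(mu_);
        for (ToyMethod* m : tracked_) {
            m->dict.exchange(newDict, std::memory_order_acq_rel);
        }
        return tracked_.size();
    }

private:
    std::mutex mu_;
    std::unordered_set<ToyMethod*> tracked_;
};

// Record first, publish second: once other threads can find the method,
// a later expansion is guaranteed to update its dictionary pointer too.
void RecordThenPublish(ExpansionRegistry& reg,
                       std::atomic<ToyMethod*>& published,
                       ToyMethod* m) {
    reg.Record(m);
    published.store(m, std::memory_order_release);
}
```

If the two statements in `RecordThenPublish` were swapped, another thread could pick up the method through `published` while an expansion that started in between skips it, which is the kind of window the commit message describes.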
…s set.

This fixes a race condition found with the final level of type loading, which does not use the typeloader lock used by the other load levels.
Added some debug-only checks
@fadimounir fadimounir force-pushed the MakeDictLayoutDynamic branch from 3f8fcbf to 78c5b44 Compare November 4, 2019 19:47
Fadi Hanna added 2 commits November 4, 2019 11:50
asm formatting
Note on old dictionaries not getting deallocated
Member

@davidwrighton davidwrighton left a comment

I'm excited for this. This should be a nice performance win for some of our customers, and I'm glad to see this finally reach a good quality bar.

Note on thread synchronization
Feedback from Jan
@maryamariyan
Member

Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:

  1. In your coreclr repository clone, create patch by running git format-patch origin
  2. In your runtime repository clone, apply the patch by running git apply --directory src/coreclr <path to the patch created in step 1>
