Feature: dynamic expansion for generic dictionaries #26262
Conversation
Do you have any performance numbers for this? |
@jkotas I still don't have perf numbers. I'm trying to figure out how perf jobs are executed nowadays. The old links no longer work |
Which perf jobs are you trying to run? I think a lot of our old perf infrastructure is in flux and there may be little or no CI support right now. You should probably clone dotnet/performance and run those tests locally. I think there are both microbenchmarks and some app-level benchmarks. cc @adamsitnik |
@AndyAyersMS we have brand new infra right now that @adiaaida is working on. I guided @fadimounir to this. |
@billwert Good. Looking forward to learning more about it. |
I've looked through the code, and it looks generally acceptable. I think we need to have perf numbers that answer the following questions before I sign off though.
|
Performance numbers indicate a slowdown: https://dev.azure.com/dnceng/public/_build/results?buildId=322857&view=ms.vss-test-web.build-test-results-tab This will need to be investigated further. |
Here is the diff of the perf run I executed locally on my machine. I'm not really sure why there are drastic differences (either positive or negative) for some of the benchmarks, which seem to be unrelated to my changes. Example: System.Memory.Constructors.ArrayAsSpan, 2.07x slower, but this is weird because this instantiation shouldn't use dictionaries, which is the core of my changes. Also, on a manual rerun, the numbers were different and slightly in favor of my changes, showing a slight perf win. Example: System.MathBenchmarks.Double.Asinh, 1.35x slower, but I highly doubt this benchmark uses any generics at all (it uses Math.Asinh). @AndyAyersMS @billwert Any thoughts? |
Here are some numbers I collected manually, using a separate and more accurate benchmark I wrote: With Tiered Jitting
Without Tiered Jitting
In terms of memory used by the hashtables of type/method dependencies, for the MSIX catalog WPF app there is a total of 3104 entries. At 16 bytes per entry (based on a number I got from @davidwrighton), that's about 48.5 KB of memory. The app uses about 54.1 MB of memory, so the memory used by these data structures is negligible. I didn't count the memory used by the actual dictionary slot allocations, but it should also be negligible. I also measured C# compiler performance by building Roslyn. Here are the numbers I got: |
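For context, here is a quick back-of-the-envelope check of those figures. This is only a sketch using the numbers quoted above (the names are made up for illustration):

```csharp
using System;

// Back-of-the-envelope check of the figures quoted above.
const int entries = 3104;          // type/method dependency entries (MSIX catalog WPF app)
const int bytesPerEntry = 16;      // per-entry size quoted above
const double appMemoryMB = 54.1;   // total app memory quoted above

double overheadKB = entries * bytesPerEntry / 1024.0;               // ~48.5 KB
double overheadPct = overheadKB / (appMemoryMB * 1024) * 100;       // ~0.09%

Console.WriteLine($"{overheadKB:F1} KB ({overheadPct:F2}% of app memory)");
```

That works out to well under 0.1% of the app's memory, consistent with the "negligible" claim above.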
Force-pushed from fd724ba to d54026d
Could you please get the benchmark checked in to https://github.com/dotnet/performance? |
I'm digging into the noise issues that we're seeing here. It's not blocking this at this point, so I'll get to it next week after I'm OOF. |
Done (dotnet/performance#836) |
I found a few remaining threading issues. The FlushProcessWriteBuffers call is in a slightly wrong spot.
/azp run coreclr-ci |
Azure Pipelines successfully started running 1 pipeline(s). |
@fadimounir can you also look at and summarize jit codegen diffs (via jit-diff)? We should see a number of methods with diffs where we're no longer calling back into the runtime as there are enough fast slots to cover all the uses in the method. Somebody on @dotnet/jit-contrib can help you if you're not familiar with how to do this. |
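As a hedged illustration of what those diffs should look like (this example is not taken from the actual jit-diff output), the methods expected to change are shared generic methods with more distinct generic lookups than the old fixed fast-slot budget: previously the extra lookups fell back to a runtime helper call, whereas with expandable dictionaries every lookup can be served from a fast slot.

```csharp
using System;
using System.Collections.Generic;

public static class LookupHeavy
{
    // Compiled as shared code when T is a reference type; every distinct
    // generic lookup below consumes a generic dictionary slot. Methods like
    // this, with more lookups than the old fixed fast-slot budget, are the
    // kind expected to show jit-diff changes (fewer runtime helper calls).
    public static Type[] DescribeAll<T>() where T : class => new[]
    {
        typeof(T),
        typeof(T[]),
        typeof(List<T>),
        typeof(Dictionary<string, T>),
        typeof(IEnumerable<KeyValuePair<T, T[]>>),
        typeof(Func<T, List<T>>),
    };
}
```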
Force-pushed from 3a6c486 to 81fe998
Added an extra slot in generic dictionaries to store the size of the dictionary. This was needed to fix a race condition between the type loader and the dictionary expansion code. In terms of memory usage, testing with the MSIX catalog WPF app, I could not see any meaningful difference between the baseline and a build with all of the changes in this PR, including the extra dictionary slot. |
The main problem was that we were publishing InstantiatedMethodDescs before recording them for dictionary expansion, making it possible for other threads to use old dictionary data with expanded slots and therefore read incorrect memory locations. Fixes include:
1) Recording newly created InstantiatedMethodDescs for dictionary expansion before publishing them
2) Not adding multiple instances of the same method to the expansion hashtable
3) Using FastInterlockedExchange for dictionary pointer updates
4) Fixes around the "pAltMD == pRet" assert: use GetExistingWrappedMethodDesc instead of GetWrappedMethodDesc
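The ordering and publication ideas behind points 1-3 can be sketched in managed terms. This is a conceptual model only, not the actual fix (which lives in the native runtime), and every name here is invented for illustration:

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Conceptual model: (a) register an instantiation for future dictionary
// expansion *before* making it visible to other threads, (b) avoid adding
// the same method twice, and (c) swap the dictionary pointer atomically so
// readers see either the old or the new dictionary, never a torn value.
sealed class InstantiationTable<TKey, TMethod> where TMethod : class
{
    private readonly ConcurrentDictionary<TKey, TMethod> _published = new();
    private readonly ConcurrentDictionary<TMethod, bool> _expansionSet = new();

    public TMethod GetOrPublish(TKey key, TMethod candidate)
    {
        // Record for expansion first (TryAdd also dedupes repeated adds)...
        _expansionSet.TryAdd(candidate, true);
        // ...then publish. Any thread that finds the method here is
        // guaranteed it is already known to the expansion machinery.
        return _published.GetOrAdd(key, candidate);
    }

    // Publish a grown slot array with an atomic pointer exchange.
    public static void PublishSlots(ref object[] location, object[] expanded) =>
        Interlocked.Exchange(ref location, expanded);
}
```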
…s set. This fixes a race condition found with the final level of type loading, which does not use the type loader lock that the other load levels use. Added some debug-only checks.
Force-pushed from 3f8fcbf to 78c5b44
asm formatting
Note on old dictionaries not getting deallocated
I'm excited for this. This should be a nice performance win for some of our customers, and I'm glad to see this finally reach a good quality bar.
Note on thread synchronization
Feedback from Jan
Thank you for your contribution. As announced in dotnet/coreclr#27549, this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:
|
This reverts commit d840c75.
These changes introduce dynamic size expansion for generic dictionary layouts when we run out of slots. The original implementation allowed for expansion, but used a linked-list structure, which made it impossible to use fast lookup slots once we ran out of slots in the first bucket. This new implementation always allows the use of fast lookup slots, for all generic lookups. It also removes the constraint we had on R2R, where we disabled the use of fast slots altogether.
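A hedged sketch of the expansion idea in C# terms follows. This is conceptual only: the runtime implementation is native code, the exact position of the size slot is an assumption, and the names are invented. The key points it illustrates are that the dictionary stays a single flat array (so every slot remains a constant-index "fast" lookup), that the current size is kept in a slot so racing code knows how many slots are valid, and that the old array is never freed because concurrent readers may still hold a pointer to it (the note above about old dictionaries not getting deallocated).

```csharp
using System;
using System.Threading;

// Conceptual sketch only: models a flat generic dictionary whose slot 0
// holds the current size (the real layout may place the size elsewhere).
static class GenericDictionaryModel
{
    // Grow the dictionary to hold at least `requiredSlots` payload slots.
    // Returns the array currently published at `location`.
    public static IntPtr[] EnsureCapacity(ref IntPtr[] location, int requiredSlots)
    {
        IntPtr[] current = Volatile.Read(ref location);
        int currentSlots = (int)current[0];            // size lives in slot 0 in this model
        if (currentSlots >= requiredSlots)
            return current;

        // Allocate a larger flat array and copy the existing slots, so every
        // lookup remains a direct indexed read (a "fast slot").
        int newSlots = Math.Max(requiredSlots, currentSlots * 2);
        var expanded = new IntPtr[newSlots + 1];
        Array.Copy(current, expanded, current.Length);
        expanded[0] = (IntPtr)newSlots;

        // Publish atomically. The old array is deliberately not reclaimed:
        // threads that already read `location` may still be using it.
        Interlocked.CompareExchange(ref location, expanded, current);
        return Volatile.Read(ref location);
    }
}
```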