Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5x performance regression from 3.1 to 5.0 due to generic dictionary CSE #40298

Closed
jkotas opened this issue Aug 4, 2020 · 8 comments · Fixed by #40355
Closed

5x performance regression from 3.1 to 5.0 due to generic dictionary CSE #40298

jkotas opened this issue Aug 4, 2020 · 8 comments · Fixed by #40355
Assignees
Labels
area-VM-coreclr tenet-performance Performance related issue
Milestone

Comments

@jkotas
Copy link
Member

jkotas commented Aug 4, 2020

See repro at #38660 (comment)

This is regression introduced by the growable generic dictionary feature (#32270). The lazy initialization of the generic dictionary can reallocate it and the JITed code may be running with the stale copy of the dictionary that is never going to be initialized.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Aug 4, 2020
@ghost
Copy link

ghost commented Aug 4, 2020

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

@jkotas
Copy link
Member Author

jkotas commented Aug 4, 2020

We may need to make sure that the stale dictionary copies always get initialized as well to fix this.

@jkotas jkotas added this to the 5.0.0 milestone Aug 4, 2020
@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Aug 4, 2020
@trylek
Copy link
Member

trylek commented Aug 4, 2020

I have made a first pass over Sergey's and Fadi's changes and over Fadi's description of the dictionary expansion algorithm. @jkotas, am I right to understand your suggestion so that, whenever we expand a given dictionary, we should retain something like a pointer to its previous (smaller) version so that, upon subsequent slot resolution we should back-propagate the resolved function pointer to all earlier versions of the dictionary?

@jkotas
Copy link
Member Author

jkotas commented Aug 4, 2020

Yes, I think that's the best way to fix it.

@mangod9
Copy link
Member

mangod9 commented Aug 4, 2020

I am guessing we need to add a perf benchmark here to ensure this doesnt regress again, unless System.Collections already has one for this scenario?

@jkotas
Copy link
Member Author

jkotas commented Aug 4, 2020

System.Collections already has one for this scenario?

I am pretty sure that it does not. This is startup regression. All our feature-specific benchmarks are testing steady state throughput.

Similar regression can be be hit by any generic method with high-enough complexity. The pattern that triggered it in the original repro was use of Linq that is known to produce complex generics-heavy code.

@trylek
Copy link
Member

trylek commented Aug 5, 2020

@jkotas - I have stood up an initial implementation of the linked lists as discussed above, no visible perf change. I subsequently added some counters and a ring buffer only to find out that, during the repro test mentioned on top of this issue, we keep calling PopulateEntry for the same generic method dictionary slot about 11M times:

m_pszDebugMethodName: Test.TestDict[System.String](System.Collections.Generic.IDictionary`2<System.String,Boolean>, System.Func`2<Int32,System.String>)

We do this on the following call stack:

>	coreclr.dll!JIT_GenericHandleWorker(MethodDesc * pMD, MethodTable * pMT, void * signature, unsigned long dictionaryIndexAndSlot, Module * pModule) Line 3322	C++
 	coreclr.dll!JIT_GenericHandle_Framed(CORINFO_CLASS_STRUCT_ * classHnd, CORINFO_METHOD_STRUCT_ * methodHnd, void * signature, unsigned long dictionaryIndexAndSlot, CORINFO_MODULE_STRUCT_ * moduleHnd) Line 3350	C++
 	coreclr.dll!JIT_GenericHandleMethod(CORINFO_METHOD_STRUCT_ * methodHnd, void * signature) Line 3379	C++
 	00007ff7c781a8c9()	Unknown
 	00007ff7c7819087()	Unknown
 	coreclr.dll!CallDescrWorkerInternal() Line 100	Unknown
 	coreclr.dll!MethodDescCallSite::CallTargetWorker(const unsigned __int64 * pArguments, unsigned __int64 * pReturnValue, int cbReturnValue) Line 552	C++
 	[Inline Frame] coreclr.dll!MethodDescCallSite::Call_RetArgSlot(const unsigned __int64 *) Line 458	C++
 	coreclr.dll!RunMainInternal(Param * pParam) Line 1455	C++

Even though we update the generic dictionary slot in PopulateEntry, we always end in the method again. As you're much more familiar with this logic than I am, I'd be grateful for any suggestions how to continue the investigation to get to the bottom of this - where should we pick up the dictionary slot value instead of trying to resolve it again? At one point I was speculating whether tiered compilation may not be affecting this in some manner but I received exactly the same results after turning it off.

Thanks a lot

Tomas

@jkotas
Copy link
Member Author

jkotas commented Aug 5, 2020

Could you please share your current delta?

This is the key piece of code from the TestDict method:

rcx points to the old dictionary here. [rcx+38h] stays null and we keep calling the JIT_GenericHandleMethod helper. This old
dictionary needs to be in the link list and that's how [rcx+38h] will get updated to be non-null.

00007ffc`b9f37ebc 4c8b5938        mov     r11,qword ptr [rcx+38h]
00007ffc`b9f37ec0 4d85db          test    r11,r11
00007ffc`b9f37ec3 7402            je      repro!Test.TestDict[[System.__Canon, System.Private.CoreLib]](System.Collections.Generic.IDictionary`2<System.__Canon,Boolean>, System.Func`2<Int32,System.__Canon>)+0x237 (00007ffc`b9f37ec7)
00007ffc`b9f37ec5 eb15            jmp     repro!Test.TestDict[[System.__Canon, System.Private.CoreLib]](System.Collections.Generic.IDictionary`2<System.__Canon,Boolean>, System.Func`2<Int32,System.__Canon>)+0x24c (00007ffc`b9f37edc)
00007ffc`b9f37ec7 488bcf          mov     rcx,rdi
00007ffc`b9f37eca 48ba587002bafc7f0000 mov rdx,7FFCBA027058h
00007ffc`b9f37ed4 e8f750a15f      call    coreclr!JIT_GenericHandleMethod (00007ffd`1994cfd0)

@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-VM-coreclr tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants