Patch vtable slots and similar when tiering is enabled #21292
Conversation
Do you have any numbers for the costs (memory, time) of this extra bookkeeping?
For perf, here is what I have so far. I'm in the process of gathering data for some more tests, and more samples for some cases to weed out error; I'll update inline once I have more data. Startup perf is slightly but measurably lower in some JitBench tests, and the same as before in TechEmpower ASP.NET tests. Steady-state perf is better in several TechEmpower ASP.NET tests. For memory, I took a count of the number of methods eligible for vtable slot backpatching, and in JitBench tests it's approximately 1/3 of the methods eligible for tiering. Typically, there are no slots recorded for a method eligible for vtable slot backpatching. In the same tests, in the cases where a slot is recorded, there are typically only one or two. From that high-level info I figured the memory impact would be minimal and would scale proportionally with the number of virtual methods. I can gather more specific data if you would like, let me know.

[Perf tables: JitBench - Before, JitBench - After, TechEmpower ASP.NET]
Most of the slight startup time degradation, I believe, is from finding and dealing with …
Force-pushed from 799d319 to c294ed4.
Updated commit description
Some concrete data on memory impact is below. I have not accounted for unused space in arrays/hashtables and with the numbers below it looks like the amount of unused space would be small.
The number of slots recorded per called tiered vtable method is low. The `MethodDescVirtualInfo` itself appears to be more significant by byte count than the slots recorded in it. There may be possibilities to tune for 1 and 2 slots without as much overhead. From the data below, though, the overhead looks fairly low to me.

[Memory data tables: JitBench MusicStore, JitBench AllReady, JitBench Word2Vec, JitBench CscBuildRoslynSource (repeated in same process many times), JitBench CscBuildHelloWorld (repeated in same process many times), and the Windows KestrelSockets scenarios Json, MvcJil, MvcJson, MvcPlaintext, Plaintext, ResponseCachingPlaintextCached, ResponseCachingPlaintextCachedDelete, ResponseCachingPlaintextVaryByCached, StaticFiles]
Could you write up a description of the various stages a vtable entry goes through, starting from the initial state, which points to the prestub, to the end case where it's finally entirely using a tier 2 pointer?
Force-pushed from c294ed4 to 823cf1a.
Updated perf numbers and added count of new …
Why is all this infrastructure labeled with Virtual this and VTable that? I don't see any fundamental reason why this stuff should be tied to virtual functions. It seems to me that this would also be useful for other places where we hold a function pointer that ideally would just point to the target method address. For instance, in R2R code all calls to other methods go through an indirection cell, and this concept of a backpatchable address could be used to improve the performance of those calls for applications that are relatively short-lived.

Also, I'm in the process of building out infrastructure to support making interface dispatch calls to interfaces which only have one implementation. This would involve placing code pointers directly into VSD indirection cells. Interestingly enough, that exactly matches the logic here, except that those cells need to be able to be updated to no longer be target address pointers. It feels like that would be a straightforward addition to this logic, but I'd love to hear your opinion.
Finally, I've been looking at this code for a few days, and I've concluded that the memory management is correct, and to the best of my ability to code review, the implementation is correct, but I'm uneasy with the way that cross-loader-allocator data is handled.
Overall, I believe the current state is acceptable for check-in, but I would like to see the data structure management turn into a generic service of the LoaderAllocator concept in the next few months.
Thanks for the feedback. You're right, the actual storage doesn't need to be specific to virtualness. I'll go ahead and remove association with virtualness for the storage portion. As discussed offline, the LoaderAllocatorSet can be eliminated in favor of using a multi-hash instead, and that could simplify some of the data structures and save some space. I'll look into how easy it would be to make that change as well, so that it would be easy to move to the generic service in the future.
Force-pushed from cad50f4 to 03d38f9.
Rebased, last three commits are new. After discussing further offline, I kept the storage mechanism as it was, and did some cleanup to make it easier to move to a new storage mechanism in the future and to use the same backpatching logic for other purposes.
@dotnet-bot test this please
@davidwrighton, would you be able to make a pass over the latest commits (starting from the one titled "Rename VirtualInfo -> BackpatchInfo")? I'd like to get this into 3.0 preview 2 (I'm told that would be around mid-January) with some bake time to flush things out.
@davidwrighton I see you're on vacation, this can wait, have a good time!
Sorry it's taken me so long to get to this Kount with my December vacation, but I spent a decent chunk of time this weekend to get this cleared away. I don't have the expertise in VSD/MethodTable construction that I suspect David or Jan might have, but from what I can tell it looks quite solid. Most of my comments wound up being cosmetic suggestions around naming, factoring, and commenting.
Force-pushed from 6ee2c0a to fb3fb5f.
```diff
@@ -880,10 +880,10 @@ EXTERN_C PCODE VirtualMethodFixupWorker(TransitionBlock * pTransitionBlock, CORC
     INSTALL_MANAGED_EXCEPTION_DISPATCHER;
     INSTALL_UNWIND_AND_CONTINUE_HANDLER_NO_PROBE;

-    if (pMD->IsTieredVtableMethod())
+    if (pMD->IsVersionableWithVtableSlotBackpatch())
```
It still seems a little odd to me to refer to vtable slots specifically when we actually patch a variety of different slots. However, given that the comments accurately describe all the slots that are handled, I think it's a minor concern and I can be fine with it.
Yea, I went back and forth on that. The VSD slots that are recorded are sort of extensions of vtable slots, in the sense that they copy the value of a vtable slot and so must be backpatched whenever the source vtable slot changes. They are not actually vtable slots but slots that cache vtable slot values, so I thought the naming was close enough. Another reason to include "vtable" or something similar in the name is that the versioning scheme is specific to methods that have a vtable slot, which have different restrictions (like not having a precode or stable entry point, for example) than another versioning scheme that may apply to non-vtable slots or caller slots.
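The caching relationship described above can be sketched in a few lines. All names here (`VtableSlot`, `DispatchStubSketch`, `Backpatch`) are hypothetical simplifications; the real `DispatchStub` is generated machine code, not a C++ struct.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical simplification: an entry point is just an address value.
using PCODE = std::uintptr_t;

// A vtable slot holds the method's current entry point.
struct VtableSlot { PCODE value; };

// A VSD stub caches a copy of a vtable slot's value so dispatch can skip
// the vtable indirection. The cached copy goes stale whenever the source
// slot changes, so it must be recorded and backpatched as well.
struct DispatchStubSketch { PCODE cachedTarget; };

// Backpatching updates the source slot and every recorded cached copy so
// that all call paths reach the same entry point.
inline void Backpatch(VtableSlot &slot,
                      std::vector<DispatchStubSketch *> &cachingStubs,
                      PCODE newEntryPoint)
{
    slot.value = newEntryPoint;
    for (DispatchStubSketch *stub : cachingStubs)
        stub->cachedTarget = newEntryPoint;
}
```

The point of the sketch is only that the stub's copy is derived state: patching the vtable slot alone would leave the stub dispatching to stale code.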
```cpp
MethodDescBackpatchInfo *backpatchInfo =
    mdLoaderAllocator->GetMethodDescBackpatchInfoTracker()->GetOrAddBackpatchInfo_Locked(this);
if (slotLoaderAllocator == mdLoaderAllocator)
```
As above, only a suggestion for the future (I unresolved this to ensure visibility; not sure what happens if I respond to a resolved conversation).
I'm not sure when it would be a common case.
The case I am referring to is module A.dll with method A in it calling module B.dll with method B in it. Assuming neither module is collectible, they should both have AppDomain lifetime. This should make it OK to store A.dll's slot in method B's backpatch info. If our customers don't go crazy with collectible assemblies, this should be true for most cross-module calls.
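The lifetime constraint under discussion can be sketched as follows. All type and function names here are hypothetical stand-ins; the real bookkeeping lives in the runtime's backpatch tracking, and this ignores locking and allocator collection entirely.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using PCODE = std::uintptr_t;

struct LoaderAllocatorSketch;

// Hypothetical stand-in for MethodDesc: a method lives in one allocator.
struct MethodSketch { LoaderAllocatorSketch *allocator; };

struct LoaderAllocatorSketch
{
    // Slots are recorded in the allocator that owns the slot memory, so
    // when a collectible allocator dies, its recorded slots die with it.
    std::map<MethodSketch *, std::vector<PCODE *>> recordedSlots;

    // Allocators (other than the method's own) that hold slots for a
    // method, recorded in the method's allocator so backpatching can
    // find them; a dependent is unregistered when it dies.
    std::map<MethodSketch *, std::set<LoaderAllocatorSketch *>> dependents;
};

// Record a slot in the allocator that owns it; if that differs from the
// method's allocator, also register the dependency with the method.
inline void RecordSlot(MethodSketch *method,
                       LoaderAllocatorSketch *slotAllocator, PCODE *slot)
{
    slotAllocator->recordedSlots[method].push_back(slot);
    if (slotAllocator != method->allocator)
        method->allocator->dependents[method].insert(slotAllocator);
}

// Backpatch slots in the method's own allocator and in each dependent.
inline void BackpatchEntryPointSlots(MethodSketch *method, PCODE newEntryPoint)
{
    LoaderAllocatorSketch *own = method->allocator;
    for (PCODE *slot : own->recordedSlots[method])
        *slot = newEntryPoint;
    for (LoaderAllocatorSketch *dep : own->dependents[method])
        for (PCODE *slot : dep->recordedSlots[method])
            *slot = newEntryPoint;
}
```

The design choice being debated is exactly the `dependents` indirection: if both modules have AppDomain lifetime, the slot could be stored directly in the method's backpatch info without it.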
```cpp
    SetEntryPointToBackpatch_Locked(entryPoint, isPrestubEntryPoint);
}

void MethodDesc::SetCodeEntryPoint(PCODE entryPoint)
```
Although I don't want to rathole on it and I'm fine if you want to check it in as-is, I do think there is some method to the madness (no pun intended?) in the naming, and this new method isn't following the convention as I understand it. Let me explain my understanding of it; then I'm happy to chat about it offline if you want to discuss other ideas in the future, or let it lie if not.
In the context of MethodDesc, the runtime has taken the somewhat generic CS terms 'entry point' and 'native code' and turned them into proper nouns 'EntryPoint' and 'NativeCode' that have specific meanings that are much narrower than the generic CS terms might suggest. Some addresses only meet the criteria of an EntryPoint, others only meet the criteria of NativeCode, and some addresses meet the criteria of both, which allows them to be referred to by either term.
I think of 'EntryPoint's as the values returned by GetMethodEntryPoint()/GetTemporaryEntryPoint()/GetStableEntryPoint() and they have properties:
- It can be called with the managed ABI, as defined by the MethodDesc's signature
- All callable methods have at least one such address
- Invoking it always executes the current version of the method
- It can be stored in the method's slot
I think of 'NativeCode' as the value returned by GetNativeCode() and it has properties:
- It is always the direct address of jitted (or prejitted) code, never the output of any other code generator.
- It is generally the jitted code that implements the default version of the method, however EnC is a special case (which I hope to eliminate in the future) that uses NativeCode as the jitted code for the most recent version of the method.
- NativeCode is the only thing that is permitted to be stored in the NativeCodeSlot, but it does not have to be stored in a NativeCodeSlot if it is possible to compute it from other fields.
Using those criteria, most of the names in the MethodDesc code make sense to me. Some examples:
- When a non-versionable method is jitted, the jitted code qualifies as both an 'EntryPoint' and 'NativeCode'.
- The start address of a Precode qualifies as an EntryPoint but it isn't NativeCode. The target of the Precode might be NativeCode, but it is never that method's EntryPoint.
- For a method implemented by a stub, the target of the Precode is neither an EntryPoint, nor NativeCode.
- For versionable code with JumpStamp, the NativeCode is also the EntryPoint. The jitted code for other versions is neither the EntryPoint nor the NativeCode.
When I look at SetCodeEntryPoint it doesn't seem to be consistent with those patterns (and on further inspection neither was my suggested name PublishNativeCode so I take back that suggestion). By default I assumed 'EntryPoint' in that name carries the same meaning as EntryPoint elsewhere but it doesn't match up because the values in the 2nd and 3rd scopes of the if/else are not EntryPoint values. Right now my best attempt to make it fit my pre-existing understanding is to define 'CodeEntryPoint' as a new term used only by that method which has a meaning that is distinct from both 'NativeCode' and 'EntryPoint'. I hope this makes the challenge a little more apparent.
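The taxonomy above might be modeled in a toy form like this. These structs are purely illustrative assumptions: real precodes are generated stub code with a stable start address, not C++ objects, and the real `GetMethodEntryPoint()` consults far more state.

```cpp
#include <cstdint>

using PCODE = std::uintptr_t;

// Toy precode: a small stub with a stable start address and a mutable
// target. Its start address qualifies as an EntryPoint (stable, storable
// in a slot, always reaches the current version) but never as NativeCode;
// its target may or may not be NativeCode.
struct PrecodeSketch
{
    PCODE target;  // may point at jitted code, the prestub, or a stub body
    PCODE EntryPoint() const { return reinterpret_cast<PCODE>(this); }
};

// Toy method: for a non-versionable method without a precode, the jitted
// code address qualifies as both the EntryPoint and the NativeCode.
struct MethodSketch
{
    PrecodeSketch *precode = nullptr;
    PCODE nativeCode = 0;  // direct address of jitted code, if any

    PCODE GetMethodEntryPoint() const
    {
        // With a precode, the precode start is the EntryPoint even after
        // the target has been patched to point at jitted code.
        return precode ? precode->EntryPoint() : nativeCode;
    }
    PCODE GetNativeCode() const { return nativeCode; }
};
```

In this model the two terms coincide only in the no-precode case, which is the pattern the comment describes for non-versionable jitted methods.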
For a method eligible for tiered compilation and vtable slot backpatching:
- The entry point to the final code is versionable, as for any method eligible for tiering
- It does not have a precode (`HasPrecode()` returns false)
- It does not have a stable entry point (`HasStableEntryPoint()` returns false)
- A call to the method may be:
  - An indirect call through the `MethodTable`'s backpatchable vtable slot
  - A direct call to a backpatchable `FuncPtrStub`, perhaps through a `JumpStub`
  - For interface methods, an indirect call through the virtual stub dispatch (VSD) indirection cell to a backpatchable `DispatchStub` or a `ResolveStub` that refers to a backpatchable `ResolveCacheEntry`
- The purpose is that typical calls to the method have no additional overhead when tiering is enabled

Recording and backpatching slots:
- In order for all vtable slots for the method to be backpatchable:
  - A vtable slot initially points to the `MethodDesc`'s temporary entry point, even when the method is inherited by a derived type (the slot's value is not copied from the parent)
  - The temporary entry point always points to the prestub and is never backpatched, in order to be able to discover new vtable slots through which the method may be called
  - The prestub, as part of `DoBackpatch()`, records any slots that are transitioned from the temporary entry point to the method's at-the-time current, non-prestub entry point
  - Any further changes to the method's entry point cause recorded slots to be backpatched in `BackpatchEntryPointSlots()`
- In order for the `FuncPtrStub` to be backpatchable:
  - After the `FuncPtrStub` is created and exposed, it is patched to point to the method's at-the-time current entry point if necessary
  - Any further changes to the method's entry point cause the `FuncPtrStub` to be backpatched in `BackpatchEntryPointSlots()`
- In order for VSD entities to be backpatchable:
  - A `DispatchStub`'s entry point target is aligned and recorded for backpatching in `BackpatchEntryPointSlots()`
    - The `DispatchStub` was modified on x86 and x64 such that the entry point target is aligned to a pointer to make it backpatchable
  - A `ResolveCacheEntry`'s entry point target is recorded for backpatching in `BackpatchEntryPointSlots()`

Slot lifetime and management of recorded slots:
- A slot is recorded in the `LoaderAllocator` in which the slot is allocated, see `RecordAndBackpatchEntryPointSlot()`
- An inherited slot that has a shorter lifetime than the `MethodDesc`, when recorded, needs to be accessible by the `MethodDesc` for backpatching, so the dependent `LoaderAllocator` with the slot to backpatch is also recorded in the `MethodDesc`'s `LoaderAllocator`, see `MethodDescVirtualInfo::AddDependentLoaderAllocatorsWithSlotsToBackpatch_Locked()`
- At the end of a `LoaderAllocator`'s lifetime, the `LoaderAllocator` is unregistered from dependency `LoaderAllocators`, see `LoaderAllocator::ClearDependencyMethodDescEntryPointSlotsToBackpatchHash()`
- When a `MethodDesc`'s entry point changes, backpatching also includes iterating over recorded dependent `LoaderAllocators` to backpatch the relevant slots recorded there, see `BackpatchEntryPointSlots()`

Synchronization between entry point changes and backpatching slots:
- A global lock is used to ensure that all recorded backpatchable slots corresponding to a `MethodDesc` point to the same entry point, see `DoBackpatch()` and `BackpatchEntryPointSlots()` for examples

Due to startup time perf issues:
- `IsEligibleForTieredCompilation()` is called more frequently with this change and in hotter paths. I chose to use a `MethodDesc` flag to store that information for fast retrieval. The flag is initialized by `DetermineAndSetIsEligibleForTieredCompilation()`.
- Initially, I experimented with allowing a tiered vtable method to have a precode, and allocated a new precode that would also be the stable entry point when a direct call is necessary. That also allows recording a new slot to be optional - in the event of an OOM, the slot may just point to the stable entry point. There are a large number of such methods and the allocations were slowing down startup perf. So, I had to eliminate precodes for tiered vtable methods, and that in turn means that recording slots is necessary for versionability.

@jkotas @davidwrighton @noahfalk @AndyAyersMS
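The synchronization point described above — one global lock covering both slot recording and entry point publication — might be sketched like this. The names are hypothetical and this is a minimal sketch of the invariant, not the runtime's implementation.

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

using PCODE = std::uintptr_t;

// One global lock serializes recording new slots against entry point
// changes, so a newly recorded slot can never be left pointing at a
// stale entry point.
static std::mutex g_backpatchLock;

struct MethodSketch
{
    PCODE currentEntryPoint = 0;
    std::vector<PCODE *> recordedSlots;
};

// Prestub path: record a slot and immediately patch it to the current
// entry point, all under the lock.
inline void RecordAndBackpatchSlot(MethodSketch &md, PCODE *slot)
{
    std::lock_guard<std::mutex> hold(g_backpatchLock);
    md.recordedSlots.push_back(slot);
    *slot = md.currentEntryPoint;
}

// Entry point change: publish the new entry point and rewrite every
// recorded slot, atomically with respect to recording.
inline void BackpatchEntryPointSlots(MethodSketch &md, PCODE newEntryPoint)
{
    std::lock_guard<std::mutex> hold(g_backpatchLock);
    md.currentEntryPoint = newEntryPoint;
    for (PCODE *slot : md.recordedSlots)
        *slot = newEntryPoint;
}
```

Without the shared lock, a slot recorded between reading the entry point and writing the slot could miss a concurrent backpatch and keep dispatching to the old code version.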
…oTracker, to consolidate related functionality
…atching infrastructure
Force-pushed from 1152bba to aa6d7e3.
Rebased for conflicts
@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test
Tizen armel seems to be some infra issue that is unrelated to this change.
Fixes https://github.com/dotnet/coreclr/issues/22103 - There were reports of build failure from dotnet#21292, worked around it for now with a todo
…r#21292) Patch vtable slots and similar when tiering is enabled

For a method eligible for code versioning and vtable slot backpatch:
- It does not have a precode (`HasPrecode()` returns false)
- It does not have a stable entry point (`HasStableEntryPoint()` returns false)
- A call to the method may be:
  - An indirect call through the `MethodTable`'s backpatchable vtable slot
  - A direct call to a backpatchable `FuncPtrStub`, perhaps through a `JumpStub`
  - For interface methods, an indirect call through the virtual stub dispatch (VSD) indirection cell to a backpatchable `DispatchStub` or a `ResolveStub` that refers to a backpatchable `ResolveCacheEntry`
- The purpose is that typical calls to the method have no additional overhead when code versioning is enabled

Recording and backpatching slots:
- In order for all vtable slots for the method to be backpatchable:
  - A vtable slot initially points to the `MethodDesc`'s temporary entry point, even when the method is inherited by a derived type (the slot's value is not copied from the parent)
  - The temporary entry point always points to the prestub and is never backpatched, in order to be able to discover new vtable slots through which the method may be called
  - The prestub, as part of `DoBackpatch()`, records any slots that are transitioned from the temporary entry point to the method's at-the-time current, non-prestub entry point
  - Any further changes to the method's entry point cause recorded slots to be backpatched in `BackpatchEntryPointSlots()`
- In order for the `FuncPtrStub` to be backpatchable:
  - After the `FuncPtrStub` is created and exposed, it is patched to point to the method's at-the-time current entry point if necessary
  - Any further changes to the method's entry point cause the `FuncPtrStub` to be backpatched in `BackpatchEntryPointSlots()`
- In order for VSD entities to be backpatchable:
  - A `DispatchStub`'s entry point target is aligned and recorded for backpatching in `BackpatchEntryPointSlots()`
    - The `DispatchStub` was modified on x86 and x64 such that the entry point target is aligned to a pointer to make it backpatchable
  - A `ResolveCacheEntry`'s entry point target is recorded for backpatching in `BackpatchEntryPointSlots()`

Slot lifetime and management of recorded slots:
- A slot is recorded in the `LoaderAllocator` in which the slot is allocated, see `RecordAndBackpatchEntryPointSlot()`
- An inherited slot that has a shorter lifetime than the `MethodDesc`, when recorded, needs to be accessible by the `MethodDesc` for backpatching, so the dependent `LoaderAllocator` with the slot to backpatch is also recorded in the `MethodDesc`'s `LoaderAllocator`, see `MethodDescBackpatchInfo::AddDependentLoaderAllocator_Locked()`
- At the end of a `LoaderAllocator`'s lifetime, the `LoaderAllocator` is unregistered from dependency `LoaderAllocators`, see `MethodDescBackpatchInfoTracker::ClearDependencyMethodDescEntryPointSlots()`
- When a `MethodDesc`'s entry point changes, backpatching also includes iterating over recorded dependent `LoaderAllocators` to backpatch the relevant slots recorded there, see `BackpatchEntryPointSlots()`

Synchronization between entry point changes and backpatching slots:
- A global lock is used to ensure that all recorded backpatchable slots corresponding to a `MethodDesc` point to the same entry point, see `DoBackpatch()` and `BackpatchEntryPointSlots()` for examples

Due to startup time perf issues:
- `IsEligibleForTieredCompilation()` is called more frequently with this change and in hotter paths. I chose to use a `MethodDesc` flag to store that information for fast retrieval. The flag is initialized by `DetermineAndSetIsEligibleForTieredCompilation()`.
- Initially, I experimented with allowing a method versionable with vtable slot backpatch to have a precode, and allocated a new precode that would also be the stable entry point when a direct call is necessary. That also allows recording a new slot to be optional - in the event of an OOM, the slot may just point to the stable entry point. There are a large number of such methods and the allocations were slowing down startup perf. So, I had to eliminate precodes for methods versionable with vtable slot backpatch, and that in turn means that recording slots is necessary for versionability.

Commit migrated from dotnet/coreclr@37b9d85
Fixes https://github.com/dotnet/coreclr/issues/22103 - There were reports of build failure from dotnet/coreclr#21292, worked around it for now with a todo Commit migrated from dotnet/coreclr@29d442f