Cleanup EH / stackwalk interpreter frame handling#124296
Cleanup EH / stackwalk interpreter frame handling#124296janvorli wants to merge 14 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @BrzVlad, @janvorli, @kg |
There was a problem hiding this comment.
Pull request overview
This PR refactors CoreCLR’s interpreter-related stack walking / EH integration by hiding the InterpreterFrame::DummyCallerIP transition inside StackFrameIterator, removing EH-side special cases, and ensuring interpreter-related transitions always have a valid TransitionBlock (including callee-saved FP regs) via new/updated asm helpers.
Changes:
- Hide the
DummyCallerIPtransition withinStackFrameIteratorand remove corresponding special handling in exception handling. - Extend transition-block prologs to optionally save callee-saved floating-point registers, and add interpreter asm helpers to ensure a real
TransitionBlockexists for funclets / async continuation resume. - Switch
AsyncHelpers_ResumeInterpreterContinuationfrom QCALL to an InternalCall surfaced via FCALL, backed by asm helpers that pass aTransitionBlock*to a C++ worker.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/vm/stackwalk.h | Removes interpreter-specific saved-register fields from StackFrameIterator. |
| src/coreclr/vm/stackwalk.cpp | Skips DummyCallerIP frames in the iterator and adjusts interpreter-frame walking behavior. |
| src/coreclr/vm/frames.h | Reworks FP-register update as a pseudo-virtual dispatch with a new (pRD, targetSP) shape. |
| src/coreclr/vm/frames.cpp | Implements the new FP-register dispatch and updates interpreter/inlined-call-frame handling. |
| src/coreclr/vm/exceptionhandling.cpp | Removes EH special casing for DummyCallerIP and interpreter native-transition logic. |
| src/coreclr/vm/eetwain.cpp | Routes interpreter EH funclet invocation through asm helper + worker w/ TransitionBlock*. |
| src/coreclr/vm/interpexec.cpp | Introduces AsyncHelpers_ResumeInterpreterContinuationWorker(..., TransitionBlock*) and updates frame setup. |
| src/coreclr/vm/qcallentrypoints.cpp | Removes QCALL entry for AsyncHelpers_ResumeInterpreterContinuation. |
| src/coreclr/vm/ecalllist.h | Adds FCALL mapping for AsyncHelpers_ResumeInterpreterContinuation and AsyncHelpers class entry. |
| src/coreclr/vm/corelib.cpp | Declares the internal-call entrypoint for AsyncHelpers_ResumeInterpreterContinuation. |
| src/coreclr/vm/amd64/cgenamd64.cpp | Adds interpreter-specific callee-saved XMM restore from transition-block-adjacent save area. |
| src/coreclr/vm/amd64/asmhelpers.S | Adds unix amd64 asm helper for AsyncHelpers_ResumeInterpreterContinuation and interpreter funclets. |
| src/coreclr/vm/amd64/AsmHelpers.asm | Adds Windows amd64 asm helpers and uses new prolog option to save callee-saved XMM regs. |
| src/coreclr/vm/amd64/AsmMacros.inc | Extends PROLOG_WITH_TRANSITION_BLOCK to optionally save callee-saved XMM regs. |
| src/coreclr/vm/amd64/VirtualCallStubAMD64.asm | Updates prolog invocation for new macro parameter ordering. |
| src/coreclr/vm/amd64/ExternalMethodFixupThunk.asm | Updates prolog invocation for new macro parameter ordering. |
| src/coreclr/vm/arm/stubs.cpp | Adds interpreter FP restore from transition-block-adjacent save area. |
| src/coreclr/vm/arm/asmhelpers.S | Uses new prolog option and adds interpreter funclet + async continuation asm helpers. |
| src/coreclr/vm/arm64/stubs.cpp | Adds interpreter FP restore from transition-block-adjacent save area. |
| src/coreclr/vm/arm64/asmmacros.h | Extends transition-block prolog to optionally save callee-saved FP regs. |
| src/coreclr/vm/arm64/asmhelpers.asm | Uses new prolog option and adds interpreter funclet + async continuation asm helpers. |
| src/coreclr/vm/arm64/asmhelpers.S | Uses new prolog option and adds interpreter funclet + async continuation asm helpers (unix). |
| src/coreclr/vm/riscv64/stubs.cpp | Adds interpreter FP restore from transition-block-adjacent save area. |
| src/coreclr/vm/riscv64/asmhelpers.S | Uses new prolog option and adds interpreter funclet + async continuation asm helpers. |
| src/coreclr/pal/inc/unixasmmacrosarm.inc | Extends unix ARM prolog to optionally save callee-saved FP regs. |
| src/coreclr/pal/inc/unixasmmacrosarm64.inc | Extends unix ARM64 prolog to optionally save callee-saved FP regs. |
| src/coreclr/pal/inc/unixasmmacrosriscv64.inc | Extends unix RISC-V prolog to optionally save callee-saved FP regs. |
| src/coreclr/System.Private.CoreLib/src/System/Runtime/ExceptionServices/AsmOffsets.cs | Removes FEATURE_INTERPRETER size/offset variants for StackFrameIterator. |
| src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs | Switches from QCALL LibraryImport to [MethodImpl(InternalCall)] for interpreter continuation resume. |
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/coreclr/vm/interpexec.cpp
Outdated
| } | ||
|
|
||
| #ifdef TARGET_WASM | ||
| extern "C" ContinuationObject* AsyncHelpers_ResumeInterpreterContinuation(ContinuationObject* cont, uint8_t* resultStorage) |
There was a problem hiding this comment.
Use FCIMPL macro here to make it clear that it is managed calling convention?
| #if FEATURE_INTERPRETER | ||
| [LibraryImport(RuntimeHelpers.QCall, EntryPoint = "AsyncHelpers_ResumeInterpreterContinuation")] | ||
| [MethodImpl(MethodImplOptions.InternalCall)] | ||
| [StackTraceHidden] |
There was a problem hiding this comment.
| [StackTraceHidden] |
Not needed anymore?
There was a problem hiding this comment.
It had this attribute before my change, so I assume it is needed.
There was a problem hiding this comment.
It was PInvoke before your change. PInvokes get a named frame that is seen by the stackwalker.
It is FCall now. FCalls do not get a named frame that is seen by the stackwalker, so there should be nothing to hide.
| NESTED_ENTRY InterpreterStub, _TEXT | ||
|
|
||
| PROLOG_WITH_TRANSITION_BLOCK | ||
| PROLOG_WITH_TRANSITION_BLOCK 0, <PushCalleeSavedFloatRegs> |
There was a problem hiding this comment.
My understanding is that, instead of using libunwind to obtain the set of callee saved registers at this interpreter entry location, we store them explicitly each time into the transition block. Trying to understand the benefits, do we have a strong motivation to do this ? Given we would make interpreter entry even more expensive.
There was a problem hiding this comment.
I've tried to describe the motivations in the PR description. But let me share more details:
- There is an ongoing effort to get rid of libunwind dependency, @am11 has been making changes towards that goal. If we don't capture callee saved floats in the transition block, then it seems the only other way to restore them properly is to unwind from a known native context.
- Debugger single stepping machinery for interpreted code needs to start on interpreter context. But it unwinds the stack from there in some cases and if it leaves the first interpreted frame under an
InterpreterFrame, it needs to move to the managed or native context of the caller of that interpreted frame. In non-debugger scenarios, it uses the saved SP, FP, IP and the first argument register to restore the context to the native context inside of theInterpExecMethodand then unwinds it until it finds the caller mentioned above. Saving of these registers in the context happens in the stack frame iterator when it hits theInterpreterFramefor the first time before it overwrites them by the interpreter context. But that doesn't happen in the debugger single stepping scenario when we start the stack frame iterator in the middle of the interpreted code. I guess we can solve this in a different way if we had to.
The main motivation was the first one, the second one just moving the needle more in the motivation.
I am not worried about adding extra cost at the exceptionally called funclet entry. For AOT to interpreter transitions, my hope is that pushing the extra floating point callee saved registers won't have visible perf effect given all the other machinery and cost of interpreted code execution. Since you now have made some perf benchmarking, it seems we have a baseline to see if it has any visible effect or not.
A minor extra motivation is that it makes the stack frame iterator code cleaner, but that was more of a nice outcome than anything else.
There was a problem hiding this comment.
Thanks for the explanation. I was wondering mainly about the main motivation behind getting rid of libunwind. (And whether we want to do this at the expense of minor potential regressions)
There was a problem hiding this comment.
libunwind’s unw_step interprets DWARF CFI data to walk stack frames. That interpretation step is inherently more expensive than adjusting our own prologue/epilogue metadata and performing direct register updates for predictable, deterministic transitions, where a lightweight unwinding strategy is sufficient.
From a maintenance perspective, porting libunwind to new platforms is typically non-trivial and requires us to carry and service the external codebase (src/native/external/libunwind). Reducing this dependency would simplify our platform surface area.
Dependency footprint in terms of PAL_VirtualUnwind (which is the entrypoint to unw_step) call sites:
$ git grep 'PAL_VirtualUnwind' dotnet/release/8.0 -- src/coreclr | wc -l
41
$ git grep 'PAL_VirtualUnwind' dotnet/release/9.0 -- src/coreclr | wc -l
42
$ git grep 'PAL_VirtualUnwind' dotnet/release/10.0 -- src/coreclr | wc -l
25
$ git grep 'PAL_VirtualUnwind' dotnet/main -- src/coreclr | wc -l
25We may not be able to eliminate all usages, but we can continue reducing the dependency and evaluate alternative strategies for the rest, however feasible.
Runtimes that control their code generation (go, java..) typically also control their unwinding strategy and do not rely on general OS unwinding for managed frames. At native–managed boundaries, behavior is usually constrained, and cross-boundary unwinding is either disallowed or treated as a fatal condition.
There was a problem hiding this comment.
Even if unwinding via libunwind is slower, it seems we would want to prioritize normal execution path, rather than the exceptional flow. If we have the option of using it on some platforms, to me it seems like we should keep this option open. Maybe have the possibility of using it to populate the full register context only for a subset of frame types, like the interpreter transitions. Of course, these are only some thoughts from a high level perspective, this might not necessarily be feasible for what it's worth.
There was a problem hiding this comment.
There is an ongoing effort to get rid of libunwind dependency, @am11 has been making changes towards that goal. If we don't capture callee saved floats in the transition block, then it seems the only other way to restore them properly is to unwind from a known native context.
The other pattern that can be used to get rid of libunwind dependencies is try/catch the C++ exception in C++ and return non-exceptionally to the caller. #123482 is proposing that we do that for QCalls. You pay the cost of C++ try/catch and error return check vs. the cost of arch-specific asm helpers that must save a bunch of registers. (I understand that this alternative may not be suitable in this case.)
For AOT to interpreter transitions, my hope is that pushing the extra floating point callee saved registers won't have visible perf effect
Some perf numbers would be nice.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
src/coreclr/vm/frames.cpp:692
- InlinedCallFrame::UpdateFloatingPointRegisters_Impl takes targetSP but doesn't use it, which can trigger unused-parameter warnings (often treated as errors in this repo). Consider either omitting the parameter name, adding UNREFERENCED_PARAMETER(targetSP), or using targetSP as part of the unwind logic if intended.
void InlinedCallFrame::UpdateFloatingPointRegisters_Impl(const PREGDISPLAY pRD, TADDR targetSP)
{
#ifdef FEATURE_INTERPRETER
if (IsInInterpreter())
{
InterpreterFrame *pInterpreterFrame = (InterpreterFrame *)m_Next;
pInterpreterFrame->UpdateFloatingPointRegisters(pRD, pInterpreterFrame->GetInterpExecMethodSP());
pInterpreterFrame->SetContextToInterpMethodContextFrame(pRD->pCurrentContext);
return;
}
#endif // FEATURE_INTERPRETER
{
_ASSERTE(!ExecutionManager::IsManagedCode(::GetIP(pRD->pCurrentContext)));
while (!ExecutionManager::IsManagedCode(::GetIP(pRD->pCurrentContext)))
{
#ifdef TARGET_UNIX
PAL_VirtualUnwind(pRD->pCurrentContext, NULL);
#else
Thread::VirtualUnwindCallFrame(pRD);
#endif
}
}
}
src/coreclr/vm/interpexec.cpp:1030
- AsyncHelpers_ResumeInterpreterContinuationWorker uses contRef (OBJECTREF/CONTINUATIONREF) across GC-triggering operations (e.g., GetResumeInfo(), InterpExecMethod) without a GC protection scope. This can introduce a GC hole if a GC occurs before the ref is anchored in a reported root. Wrap contRef in GCPROTECT_BEGIN/END (or equivalent) for the duration it must remain live.
CONTINUATIONREF contRef = (CONTINUATIONREF)ObjectToOBJECTREF(cont);
NULL_CHECK(contRef);
// We are working with an interpreter async continuation, move things around to get the InterpAsyncSuspendData
size_t resumeDataOffset = offsetof(InterpAsyncSuspendData, resumeInfo);
InterpAsyncSuspendData* pSuspendData = (InterpAsyncSuspendData*)(void*)(((uint8_t*)contRef->GetResumeInfo()) - resumeDataOffset);
_ASSERTE(&pSuspendData->resumeInfo.Resume == contRef->GetResumeInfo());
|
Update: I am working on fixing a problem that copilot has identified - this change causes the InterpreterFrame to not to be ever reported by the stack frame iterator, which leads to a GC hole as the InterpreterFrame contains the continuation object. While I could protect the continuation member explicitly at interpreter call sites or even move it out of the frame, I'd prefer to still report the interpreter frames as any other frame and not introducing this special case. I am working on such update to this PR. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/coreclr/vm/interpexec.cpp:1031
- contRef is created from the raw managed pointer argument and then used across GC-triggering operations (e.g., GetOrCreateInterpThreadContext, InterpExecMethod, memmoveGCRefs). Unlike the prior QCall ObjectHandleOnStack, this local OBJECTREF isn’t explicitly protected, which can become a GC hole if a collection moves the object. Please protect contRef (and any subsequently assigned continuation) with the usual GCPROTECT pattern or another GC-safe handle/rooting mechanism for the duration it is used.
frames(pTransitionBlock);
CONTINUATIONREF contRef = (CONTINUATIONREF)ObjectToOBJECTREF(cont);
NULL_CHECK(contRef);
// We are working with an interpreter async continuation, move things around to get the InterpAsyncSuspendData
size_t resumeDataOffset = offsetof(InterpAsyncSuspendData, resumeInfo);
InterpAsyncSuspendData* pSuspendData = (InterpAsyncSuspendData*)(void*)(((uint8_t*)contRef->GetResumeInfo()) - resumeDataOffset);
_ASSERTE(&pSuspendData->resumeInfo.Resume == contRef->GetResumeInfo());
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Show resolved
Hide resolved
|
I've fixed the problem with InterpreterFrame reporting. It is now not being skipped anymore. |
Currently, the transition from the topmost interpreted frame belonging to
an
InterpreterFrameto the managed / native caller is not hidden inthe stack frame iterator, but exposed internally. The iterator returns
a frame with instruction pointer set to a dummy value, the
InterpreterFrame::DummyCallerIP. The EH code then handles thatin a special way in the
SfiNextWorkerto detect transitions to nativecalling code.
This PR changes that so that this is completely hidden in the stack frame
iterator and all the special handling in the EH code is removed.
Another change this PR makes is to get rid of the saved SP, IP and FP
when transitioning to interpreted frames and from them. That was
complicating implementation of debugger single stepping. The way
to achieve that is to always have transition frame for interpreter invocation
and to extend that transition frame by callee saved floating point registers.
This also removes the need to use virtual unwinding to restore the callee
saved floating point register values when moving out of interpreted
frames under an
InterpreterFrame. This is in line with another effortthat is trying to reduce and possibly remove usage of libunwind.
I had to change the
AsyncHelpers_ResumeInterpreterContinuationfrom a QCALL to an asm helper to let it also have valid transition block.
Also the interpreted EH funclets are now called through an asm helper
for the same reason. Before that, both of these cases used NULL transition
block and needed to handle it in a special way.