Skip to content

Cleanup EH / stackwalk interpreter frame handling#124296

Open
janvorli wants to merge 14 commits intodotnet:mainfrom
janvorli:cleanup-eh-interpreter-frame-handling
Open

Cleanup EH / stackwalk interpreter frame handling#124296
janvorli wants to merge 14 commits intodotnet:mainfrom
janvorli:cleanup-eh-interpreter-frame-handling

Conversation

@janvorli
Copy link
Member

Currently, the transition from the topmost interpreted frame belonging to
an InterpreterFrame to the managed / native caller is not hidden in
the stack frame iterator, but exposed internally. The iterator returns
a frame with instruction pointer set to a dummy value, the
InterpreterFrame::DummyCallerIP. The EH code then handles that
in a special way in the SfiNextWorker to detect transitions to native
calling code.

This PR changes that so that this is completely hidden in the stack frame
iterator and all the special handling in the EH code is removed.

Another change this PR makes is to get rid of the saved SP, IP and FP
when transitioning to interpreted frames and from them. That was
complicating implementation of debugger single stepping. The way
to achieve that is to always have transition frame for interpreter invocation
and to extend that transition frame by callee saved floating point registers.
This also removes the need to use virtual unwinding to restore the callee
saved floating point register values when moving out of interpreted
frames under an InterpreterFrame. This is in line with another effort
that is trying to reduce and possibly remove usage of libunwind.
I had to change the AsyncHelpers_ResumeInterpreterContinuation
from a QCALL to an asm helper to let it also have valid transition block.
Also the interpreted EH funclets are now called through an asm helper
for the same reason. Before that, both of these cases used NULL transition
block and needed to handle it in a special way.

@janvorli janvorli added this to the 11.0.0 milestone Feb 11, 2026
@janvorli janvorli self-assigned this Feb 11, 2026
Copilot AI review requested due to automatic review settings February 11, 2026 20:50
@janvorli janvorli requested review from BrzVlad and kg as code owners February 11, 2026 20:50
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors CoreCLR’s interpreter-related stack walking / EH integration by hiding the InterpreterFrame::DummyCallerIP transition inside StackFrameIterator, removing EH-side special cases, and ensuring interpreter-related transitions always have a valid TransitionBlock (including callee-saved FP regs) via new/updated asm helpers.

Changes:

  • Hide the DummyCallerIP transition within StackFrameIterator and remove corresponding special handling in exception handling.
  • Extend transition-block prologs to optionally save callee-saved floating-point registers, and add interpreter asm helpers to ensure a real TransitionBlock exists for funclets / async continuation resume.
  • Switch AsyncHelpers_ResumeInterpreterContinuation from QCALL to an InternalCall surfaced via FCALL, backed by asm helpers that pass a TransitionBlock* to a C++ worker.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
src/coreclr/vm/stackwalk.h Removes interpreter-specific saved-register fields from StackFrameIterator.
src/coreclr/vm/stackwalk.cpp Skips DummyCallerIP frames in the iterator and adjusts interpreter-frame walking behavior.
src/coreclr/vm/frames.h Reworks FP-register update as a pseudo-virtual dispatch with a new (pRD, targetSP) shape.
src/coreclr/vm/frames.cpp Implements the new FP-register dispatch and updates interpreter/inlined-call-frame handling.
src/coreclr/vm/exceptionhandling.cpp Removes EH special casing for DummyCallerIP and interpreter native-transition logic.
src/coreclr/vm/eetwain.cpp Routes interpreter EH funclet invocation through asm helper + worker w/ TransitionBlock*.
src/coreclr/vm/interpexec.cpp Introduces AsyncHelpers_ResumeInterpreterContinuationWorker(..., TransitionBlock*) and updates frame setup.
src/coreclr/vm/qcallentrypoints.cpp Removes QCALL entry for AsyncHelpers_ResumeInterpreterContinuation.
src/coreclr/vm/ecalllist.h Adds FCALL mapping for AsyncHelpers_ResumeInterpreterContinuation and AsyncHelpers class entry.
src/coreclr/vm/corelib.cpp Declares the internal-call entrypoint for AsyncHelpers_ResumeInterpreterContinuation.
src/coreclr/vm/amd64/cgenamd64.cpp Adds interpreter-specific callee-saved XMM restore from transition-block-adjacent save area.
src/coreclr/vm/amd64/asmhelpers.S Adds unix amd64 asm helper for AsyncHelpers_ResumeInterpreterContinuation and interpreter funclets.
src/coreclr/vm/amd64/AsmHelpers.asm Adds Windows amd64 asm helpers and uses new prolog option to save callee-saved XMM regs.
src/coreclr/vm/amd64/AsmMacros.inc Extends PROLOG_WITH_TRANSITION_BLOCK to optionally save callee-saved XMM regs.
src/coreclr/vm/amd64/VirtualCallStubAMD64.asm Updates prolog invocation for new macro parameter ordering.
src/coreclr/vm/amd64/ExternalMethodFixupThunk.asm Updates prolog invocation for new macro parameter ordering.
src/coreclr/vm/arm/stubs.cpp Adds interpreter FP restore from transition-block-adjacent save area.
src/coreclr/vm/arm/asmhelpers.S Uses new prolog option and adds interpreter funclet + async continuation asm helpers.
src/coreclr/vm/arm64/stubs.cpp Adds interpreter FP restore from transition-block-adjacent save area.
src/coreclr/vm/arm64/asmmacros.h Extends transition-block prolog to optionally save callee-saved FP regs.
src/coreclr/vm/arm64/asmhelpers.asm Uses new prolog option and adds interpreter funclet + async continuation asm helpers.
src/coreclr/vm/arm64/asmhelpers.S Uses new prolog option and adds interpreter funclet + async continuation asm helpers (unix).
src/coreclr/vm/riscv64/stubs.cpp Adds interpreter FP restore from transition-block-adjacent save area.
src/coreclr/vm/riscv64/asmhelpers.S Uses new prolog option and adds interpreter funclet + async continuation asm helpers.
src/coreclr/pal/inc/unixasmmacrosarm.inc Extends unix ARM prolog to optionally save callee-saved FP regs.
src/coreclr/pal/inc/unixasmmacrosarm64.inc Extends unix ARM64 prolog to optionally save callee-saved FP regs.
src/coreclr/pal/inc/unixasmmacrosriscv64.inc Extends unix RISC-V prolog to optionally save callee-saved FP regs.
src/coreclr/System.Private.CoreLib/src/System/Runtime/ExceptionServices/AsmOffsets.cs Removes FEATURE_INTERPRETER size/offset variants for StackFrameIterator.
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs Switches from QCALL LibraryImport to [MethodImpl(InternalCall)] for interpreter continuation resume.

Copilot AI review requested due to automatic review settings February 12, 2026 00:02
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.

}

#ifdef TARGET_WASM
extern "C" ContinuationObject* AsyncHelpers_ResumeInterpreterContinuation(ContinuationObject* cont, uint8_t* resultStorage)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use FCIMPL macro here to make it clear that it is managed calling convention?

#if FEATURE_INTERPRETER
[LibraryImport(RuntimeHelpers.QCall, EntryPoint = "AsyncHelpers_ResumeInterpreterContinuation")]
[MethodImpl(MethodImplOptions.InternalCall)]
[StackTraceHidden]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[StackTraceHidden]

Not needed anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It had this attribute before my change, so I assume it is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was PInvoke before your change. PInvokes get a named frame that is seen by the stackwalker.

It is FCall now. FCalls do not get a named frame that is seen by the stackwalker, so there should be nothing to hide.

NESTED_ENTRY InterpreterStub, _TEXT

PROLOG_WITH_TRANSITION_BLOCK
PROLOG_WITH_TRANSITION_BLOCK 0, <PushCalleeSavedFloatRegs>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that, instead of using libunwind to obtain the set of callee saved registers at this interpreter entry location, we store them explicitly each time into the transition block. Trying to understand the benefits, do we have a strong motivation to do this ? Given we would make interpreter entry even more expensive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to describe the motivations in the PR description. But let me share more details:

  • There is an ongoing effort to get rid of libunwind dependency, @am11 has been making changes towards that goal. If we don't capture callee saved floats in the transition block, then it seems the only other way to restore them properly is to unwind from a known native context.
  • Debugger single stepping machinery for interpreted code needs to start on interpreter context. But it unwinds the stack from there in some cases and if it leaves the first interpreted frame under an InterpreterFrame, it needs to move to the managed or native context of the caller of that interpreted frame. In non-debugger scenarios, it uses the saved SP, FP, IP and the first argument register to restore the context to the native context inside of the InterpExecMethod and then unwinds it until it finds the caller mentioned above. Saving of these registers in the context happens in the stack frame iterator when it hits the InterpreterFrame for the first time before it overwrites them by the interpreter context. But that doesn't happen in the debugger single stepping scenario when we start the stack frame iterator in the middle of the interpreted code. I guess we can solve this in a different way if we had to.

The main motivation was the first one, the second one just moving the needle more in the motivation.
I am not worried about adding extra cost at the exceptionally called funclet entry. For AOT to interpreter transitions, my hope is that pushing the extra floating point callee saved registers won't have visible perf effect given all the other machinery and cost of interpreted code execution. Since you now have made some perf benchmarking, it seems we have a baseline to see if it has any visible effect or not.

A minor extra motivation is that it makes the stack frame iterator code cleaner, but that was more of a nice outcome than anything else.

Copy link
Member

@BrzVlad BrzVlad Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I was wondering mainly about the main motivation behind getting rid of libunwind. (And whether we want to do this at the expense of minor potential regressions)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libunwind’s unw_step interprets DWARF CFI data to walk stack frames. That interpretation step is inherently more expensive than adjusting our own prologue/epilogue metadata and performing direct register updates for predictable, deterministic transitions, where a lightweight unwinding strategy is sufficient.

From a maintenance perspective, porting libunwind to new platforms is typically non-trivial and requires us to carry and service the external codebase (src/native/external/libunwind). Reducing this dependency would simplify our platform surface area.

Dependency footprint in terms of PAL_VirtualUnwind (which is the entrypoint to unw_step) call sites:

$ git grep 'PAL_VirtualUnwind' dotnet/release/8.0 -- src/coreclr | wc -l
      41
$ git grep 'PAL_VirtualUnwind' dotnet/release/9.0 -- src/coreclr | wc -l
      42
$ git grep 'PAL_VirtualUnwind' dotnet/release/10.0 -- src/coreclr | wc -l
      25
$ git grep 'PAL_VirtualUnwind' dotnet/main -- src/coreclr | wc -l
      25

We may not be able to eliminate all usages, but we can continue reducing the dependency and evaluate alternative strategies for the rest, however feasible.

Runtimes that control their code generation (go, java..) typically also control their unwinding strategy and do not rely on general OS unwinding for managed frames. At native–managed boundaries, behavior is usually constrained, and cross-boundary unwinding is either disallowed or treated as a fatal condition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if unwinding via libunwind is slower, it seems we would want to prioritize normal execution path, rather than the exceptional flow. If we have the option of using it on some platforms, to me it seems like we should keep this option open. Maybe have the possibility of using it to populate the full register context only for a subset of frame types, like the interpreter transitions. Of course, these are only some thoughts from a high level perspective, this might not necessarily be feasible for what it's worth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an ongoing effort to get rid of libunwind dependency, @am11 has been making changes towards that goal. If we don't capture callee saved floats in the transition block, then it seems the only other way to restore them properly is to unwind from a known native context.

The other pattern that can be used to get rid of libunwind dependencies is try/catch the C++ exception in C++ and return non-exceptionally to the caller. #123482 is proposing that we do that for QCalls. You pay the cost of C++ try/catch and error return check vs. the cost of arch-specific asm helpers that must save a bunch of registers. (I understand that this alternative may not be suitable in this case.)

For AOT to interpreter transitions, my hope is that pushing the extra floating point callee saved registers won't have visible perf effect

Some perf numbers would be nice.

Copilot AI review requested due to automatic review settings February 12, 2026 12:18
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

src/coreclr/vm/frames.cpp:692

  • InlinedCallFrame::UpdateFloatingPointRegisters_Impl takes targetSP but doesn't use it, which can trigger unused-parameter warnings (often treated as errors in this repo). Consider either omitting the parameter name, adding UNREFERENCED_PARAMETER(targetSP), or using targetSP as part of the unwind logic if intended.
void InlinedCallFrame::UpdateFloatingPointRegisters_Impl(const PREGDISPLAY pRD, TADDR targetSP)
{
#ifdef FEATURE_INTERPRETER
    if (IsInInterpreter())
    {
        InterpreterFrame *pInterpreterFrame = (InterpreterFrame *)m_Next;
        pInterpreterFrame->UpdateFloatingPointRegisters(pRD, pInterpreterFrame->GetInterpExecMethodSP());
        pInterpreterFrame->SetContextToInterpMethodContextFrame(pRD->pCurrentContext);
        return;
    }
#endif // FEATURE_INTERPRETER
    {
        _ASSERTE(!ExecutionManager::IsManagedCode(::GetIP(pRD->pCurrentContext)));
        while (!ExecutionManager::IsManagedCode(::GetIP(pRD->pCurrentContext)))
        {
    #ifdef TARGET_UNIX
            PAL_VirtualUnwind(pRD->pCurrentContext, NULL);
    #else
            Thread::VirtualUnwindCallFrame(pRD);
    #endif
        }
    }
}

src/coreclr/vm/interpexec.cpp:1030

  • AsyncHelpers_ResumeInterpreterContinuationWorker uses contRef (OBJECTREF/CONTINUATIONREF) across GC-triggering operations (e.g., GetResumeInfo(), InterpExecMethod) without a GC protection scope. This can introduce a GC hole if a GC occurs before the ref is anchored in a reported root. Wrap contRef in GCPROTECT_BEGIN/END (or equivalent) for the duration it must remain live.
    CONTINUATIONREF contRef = (CONTINUATIONREF)ObjectToOBJECTREF(cont);
    NULL_CHECK(contRef);

    // We are working with an interpreter async continuation, move things around to get the InterpAsyncSuspendData
    size_t resumeDataOffset = offsetof(InterpAsyncSuspendData, resumeInfo);
    InterpAsyncSuspendData* pSuspendData = (InterpAsyncSuspendData*)(void*)(((uint8_t*)contRef->GetResumeInfo()) - resumeDataOffset);
    _ASSERTE(&pSuspendData->resumeInfo.Resume == contRef->GetResumeInfo());

@janvorli
Copy link
Member Author

Update: I am working on fixing a problem that copilot has identified - this change causes the InterpreterFrame to not to be ever reported by the stack frame iterator, which leads to a GC hole as the InterpreterFrame contains the continuation object. While I could protect the continuation member explicitly at interpreter call sites or even move it out of the frame, I'd prefer to still report the interpreter frames as any other frame and not introducing this special case. I am working on such update to this PR.

Copilot AI review requested due to automatic review settings February 13, 2026 20:14
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/coreclr/vm/interpexec.cpp:1031

  • contRef is created from the raw managed pointer argument and then used across GC-triggering operations (e.g., GetOrCreateInterpThreadContext, InterpExecMethod, memmoveGCRefs). Unlike the prior QCall ObjectHandleOnStack, this local OBJECTREF isn’t explicitly protected, which can become a GC hole if a collection moves the object. Please protect contRef (and any subsequently assigned continuation) with the usual GCPROTECT pattern or another GC-safe handle/rooting mechanism for the duration it is used.
    frames(pTransitionBlock);

    CONTINUATIONREF contRef = (CONTINUATIONREF)ObjectToOBJECTREF(cont);
    NULL_CHECK(contRef);

    // We are working with an interpreter async continuation, move things around to get the InterpAsyncSuspendData
    size_t resumeDataOffset = offsetof(InterpAsyncSuspendData, resumeInfo);
    InterpAsyncSuspendData* pSuspendData = (InterpAsyncSuspendData*)(void*)(((uint8_t*)contRef->GetResumeInfo()) - resumeDataOffset);
    _ASSERTE(&pSuspendData->resumeInfo.Resume == contRef->GetResumeInfo());

@janvorli
Copy link
Member Author

I've fixed the problem with InterpreterFrame reporting. It is now not being skipped anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants