-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Change unhandled exception detection #117620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change unhandled exception detection #117620
Conversation
Currently, the code detecting whether an exception should be propagated into native frames and not reported as unhandled relies on presence of the DebuggerU2MFrame with "CatchAllExceptions" set. Recent fixes due to this marking missing causing problems revealed that it would be actually better to do the detection in an inverse manner. That means always propagate the exception to native frames unless there is an explicit marker indicating that the exception needs to be reported as unhandled. Such a marker should only mark the main entry point for an application and a managed thread entry. All other places should just let the exception flow into the native code. This change fixes it by creating a new kind of explicit frame to mark the code paths that should report unhandled exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR changes the approach for detecting unhandled exceptions in the CoreCLR runtime. Instead of relying on the presence of DebuggerU2MCatchHandlerFrame
with "CatchAllExceptions" to determine exception propagation behavior, it introduces a new explicit marker frame that indicates when exceptions should be reported as unhandled.
Key changes include:
- Introduction of
UnhandledExceptionMarkerFrame
for explicit unhandled exception marking - Removal of the
CatchesAllExceptions
functionality fromDebuggerU2MCatchHandlerFrame
- Updated exception handling logic to use the new marker frame approach
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/coreclr/vm/FrameTypes.h | Adds new frame type UnhandledExceptionMarkerFrame |
src/coreclr/vm/frames.h | Defines the new UnhandledExceptionMarkerFrame class and removes CatchesAllExceptions from DebuggerU2MCatchHandlerFrame |
src/coreclr/vm/threads.cpp | Updates managed thread dispatch to use new marker frame |
src/coreclr/vm/corhost.cpp | Adds unhandled exception marker frame to assembly execution |
src/coreclr/vm/exceptionhandling.cpp | Updates exception handling logic to check for new marker frame |
src/coreclr/vm/jitinterface.cpp | Removes debugger catch handler frame usage |
src/coreclr/vm/interoplibinterface_comwrappers.cpp | Removes debugger catch handler frame usage |
src/coreclr/vm/dispatchinfo.cpp | Updates constructor call for simplified DebuggerU2MCatchHandlerFrame |
{ | ||
LIMITED_METHOD_DAC_CONTRACT; | ||
return m_catchesAllExceptions; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The UnhandledExceptionMarkerFrame class is missing the GetTransitionType() method override that is present in DebuggerU2MCatchHandlerFrame. This inconsistency may cause issues with frame type identification during stack walks.
Copilot uses AI. Check for mistakes.
And finalizer thread? |
Ah, yes, that one needs it too. |
Actually, not. The finalizer thread uses ManagedThreadBase_DispatchOuter that already has that treatment. |
/azp run runtime-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
bool isNotHandledByRuntime = | ||
(pFrame == FRAME_TOP) || | ||
(IsTopmostDebuggerU2MCatchHandlerFrame(pFrame) && !((DebuggerU2MCatchHandlerFrame*)pFrame)->CatchesAllExceptions()) | ||
bool reportUnhandledException = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment about DebuggerU2MCatchHandlerFrame above needs updating
FRAME_TYPE_NAME(DebuggerExitFrame) | ||
FRAME_TYPE_NAME(DebuggerU2MCatchHandlerFrame) | ||
FRAME_TYPE_NAME(ExceptionFilterFrame) | ||
FRAME_TYPE_NAME(UnhandledExceptionMarkerFrame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the new Frame type need to be hooked up into cDac?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, I am wondering whether it would be better to have a managed code at the root of these threads so that the unhandled exception handling for them works the same way as exception handling for external threads. Would it allow us to get rid of this special casing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand how that would work and how it would work the same way as for external threads. For external threads, we assume that the thread will either catch it or the exception will be reported as c++ unhandled. So we let it flow into the native code. But we cannot do that for the unhandled exceptions that we want to report as such, because at the point we would reach some top level catch, the stack frames of the throwing location would be gone. So when we would break into the debugger or have a dump caused by the unhandled exception, there would be no evidence on why it happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is an unhandled external exception thrown on external thread (and the external thread does not have any catch in native code), we will end up reporting it like a regular unhandled managed exception as far as I can tell.
I have tried this: https://gist.github.com/jkotas/2d4f15056768c1b221a78684b4245fec
Console output:
Thread 24600 started.
Unhandled exception. System.Exception: My exception
at Program.ThreadFunction(IntPtr lpParameter) in C:\repro\Program.cs:line 74
Stacktrace in the crash dump is - notice that it includes ThreadFunction
where the exception was thrown originally:
kernel32!WerpReportFault+0xc5 [onecore\windows\feedback\core\faultrep\lib\faultrep.cpp @ 1384]
KERNELBASE!UnhandledExceptionFilter+0x34c [minkernel\kernelbase\xcpt.c @ 701]
ntdll!RtlpThreadExceptionFilter+0x2e [minkernel\ldr\rtlstrt.c @ 958]
ntdll!RtlUserThreadStart$filt$0+0x3f [minkernel\ldr\rtlstrt.c @ 1185]
ntdll!__C_specific_handler+0x93 [minkernel\crts\crtw32\misc\riscchandler.c @ 446]
ntdll!RtlpExecuteHandlerForException+0xf [minkernel\ntos\rtl\amd64\xcptmisc.asm @ 132]
ntdll!RtlDispatchException+0x437 [minkernel\ntos\rtl\amd64\exdsptch.c @ 680]
ntdll!RtlRaiseException+0x206 [minkernel\ntos\rtl\amd64\raise.c @ 240]
KERNELBASE!RaiseException+0x8a [minkernel\kernelbase\xcpt.c @ 954]
coreclr!SfiNext+0x1e5 [D:\a\_work\1\s\src\runtime\src\coreclr\vm\exceptionhandling.cpp @ 4048]
System_Private_CoreLib!System.Runtime.EH.DispatchEx+0x48e [_/src/runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/ExceptionHandling.cs @ 900]
System_Private_CoreLib!System.Runtime.EH.RhThrowEx+0x2d [_/src/runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/ExceptionHandling.cs @ 673]
coreclr!CallDescrWorkerInternal+0x83 [D:\a\_work\1\s\src\runtime\src\coreclr\vm\amd64\CallDescrWorkerAMD64.asm @ 74]
coreclr!DispatchCallSimple+0x66 [D:\a\_work\1\s\src\runtime\src\coreclr\vm\callhelpers.cpp @ 238]
coreclr!DispatchManagedException+0x1f9 [D:\a\_work\1\s\src\runtime\src\coreclr\vm\exceptionhandling.cpp @ 1665]
coreclr!IL_Throw+0x10e [D:\a\_work\1\s\src\runtime\src\coreclr\vm\jithelpers.cpp @ 852]
repro!Program.ThreadFunction+0x6b
repro!ILStubClass.IL_STUB_ReversePInvoke(Int64)+0x87
kernel32!BaseThreadInitThunk+0x17 [clientcore\base\win32\client\thread.c @ 77]
ntdll!RtlUserThreadStart+0x2c [minkernel\ldr\rtlstrt.c @ 1184]
I do not see what we are getting by special casing finalizer and other threads created by the runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to understand why this is so complicated. If it is too complicated for no good reason and it is too risky to simplify for .NET 10, it is fine to postpone the simplification to .NET 11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would not let the exception flow to the native caller on Unix. The idea is that the finalizer and other threads would start with a regular reverse P/Invoke transition (same way as on Native AOT). Unhandled exceptions stop at reverse P/Invoke transition on non-Windows. I believe that we preserve the stacktrace of the original fault when exception goes unhandled at reverse P/Invoke transition on Unix.
In other words, the main reason for this special casing seems to be that these few places do not enter the runtime via a regular reverse P/Invoke transition. They use a custom scheme that we need to compensate for.
I can believe that the diagnostic tests depend on implementation details and changing anything is going to break them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how it would work for the Main method entry. We call the Main via CallDescrWorker. Where would we fit in the reverse pinvoke transition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime C/C++ code would call the reverse PInvoke directly, without CallDescrWorker. The reverse PInvoke would take function pointer that points to Main method, some information about Main method signature (there are a few possible shapes - returning int vs. returning void, etc.), and the arguments to pass to main method.
The extra reverse PInvoke may impact diagnostic as discussed above. It would be show up in stacktraces, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related #117844 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your example works because we report the exception as unhandled in the EH and then re-raise it to flow into the native code to give the external native code a chance to catch it. If we didn't treat it as unhandled, the EH would currently unwind stack upto the native code frame and raise the exception from there. That's the way we use for CallDescrWorker
. But that way looses the managed frames that triggered the exception.
If we used the same mechanism as we currently use for re-raising the exception when we consider it unhandled, but wanted to do it for any reverse pinvoke transition without checking if there are managed frames on the stack on top of that, then we would have a problem. This exception is re-raised when there are all the managed frames upto the one that has thrown the exception still on the stack. So re-raising from that point requires us to make sure that when the exception calls ProcessCLRException
for all managed frames below the native frame, we don't do anything. That's currently done by marking the thread by TSNC_ProcessedUnhandledException
. But if the exception would end up flowing through the native frames uncaught and then reach managed code again, we want the ProcessCLRException
to start doing its job again past that point.
I am not trying to say we cannot do that, but it complicates things and will require some extra state to store the address of the boundary frame that would decide whether the ProcessCLRException
should ignore the exception for a given frame or not. And in case the exception will be caught in the native code between those blocks of managed code, the state will become stale and we would need to clean it up at a right time later.
|
||
#ifdef LOG_EXECUTABLE_ALLOCATOR_STATISTICS | ||
ExecutableAllocator::DumpHolderUsage(); | ||
ExecutionManager::DumpExecutionManagerUsage(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does ExecuteInDefaultAppDomain below need the same change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've missed that one.
unhandledExceptionMarkerFrame.Push(pThread); | ||
} | ||
|
||
INSTALL_UNHANDLED_MANAGED_EXCEPTION_TRAP; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the unhandledExceptionMarkerFrame be coupled with INSTALL_UNHANDLED_MANAGED_EXCEPTION_TRAP
? They seem to be handling same scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion, it looks like it would make sense to put the frame creation / destruction into the INSTALL_UNHANDLED_MANAGED_EXCEPTION_TRAP
/ UNINSTALL_UNHANDLED_MANAGED_EXCEPTION_TRAP
Closing this since the known issues were handled by another PR. |
Currently, the code detecting whether an exception should be propagated into native frames and not reported as unhandled relies on presence of the DebuggerU2MFrame with "CatchAllExceptions" set. Recent fixes due to this marking missing causing problems revealed that it would be actually better to do the detection in an inverse manner. That means always propagate the exception to native frames unless there is an explicit marker indicating that the exception needs to be reported as unhandled. Such a marker should only mark the main entry point for an application and a managed thread entry. All other places should just let the exception flow into the native code.
This change fixes it by creating a new kind of explicit frame to mark the code paths that should report unhandled exception. It also removes recent fixes in this area that are no longer needed.