-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix new EH hang on DebugBreak #112640
Fix new EH hang on DebugBreak #112640
Conversation
The new exception handling doesn't work well with the DebugBreak in some cases. E.g. when it is invoked from FATAL_GC_ERROR. The new EH attempts to handle the STATUS_BREAKPOINT stemming from the DebugBreak, allocate a managed exception object and hangs since it cannot do that when the GC is running. The cause is a missing check for the breakpoint exception in the ProcessCLRExceptionNew that is present in the old ProcessCLRException. To fix it, I've copied that code to the ProcessCLRExceptionNew. Close dotnet#112599
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot wasn't able to review any files in this pull request.
Files not reviewed (1)
- src/coreclr/vm/exceptionhandling.cpp: Language not supported
@jkotas I've also found that the presence of the DebugBreak in the FATAL_GC_ERROR prevents the following assert to occur, since the process exits on the DebugBreak. The same happens with the legacy EH. So it might make sense to remove this DebugBreak from the FATAL_GC_ERROR. |
Is this actually fixing the problem with debug breaks in the GC? If I am reading the code correctly, (I understand that this change makes the new EH same as the legacy EH.)
IIRC, I wanted to do that at one point, but @Maoni0 wanted to keep it since it makes GC crashes more diagnosable. Calling the fatal error handler overwrites values of volatile registers they may contain important clues for the diagnosing the crash. I think we should make sure that the behavior of breakpoint instructions that are statically compiled into coreclr.dll is reasonable without debugger attached. GC is not the only place where we have them. Would it make sense to intercept the breakpoint exceptions only when |
Yes, it is. The instruction that triggers the STATUS_BREAKPOINT is in the DebugBreak() that is in the kernel32.dll, so it is evaluated as external. Only if we had the breakpoint instruction directly in coreclr.dll, it would be evaluated as internal. |
Maybe, let me see how it would behave |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instruction that triggers the STATUS_BREAKPOINT is in the DebugBreak() that is in the kernel32.dll, so it is evaluated as external.
Ah ok, I have realized that.
CORDebuggerAttached
BTW: The managed debuggers should not need to intercept breakpoints in the native runtime once we complete the cleanup of place where we depend on it today (it breaks managed debugging on macOS Arm64 - it is not possible to set breakpoints in statically compiled code on macOS Arm64 unless you are a proper out-of-proc debugger that managed debuggers are not). #110677 may be the last remaining change to finish this cleanup.
Actually, thinking about it more, does it even make sense to handle the breakpoint using the managed exception handling code at all? It doesn't seem so. So I've tried to just return ExceptionContinueSearch when STATUS_BREAKPOINT or STATUS_SINGLE_STEP are encountered without checking for where it came from. The dotnet-diagnostictests EH tests are passing fine with it that way. |
That sounds even better. |
src/coreclr/vm/exceptionhandling.cpp
Outdated
// If the exception is a breakpoint, but outside of the runtime or managed code, | ||
// let it go. It is not ours, so someone else will handle it, or we'll see | ||
// it again as an unhandled exception. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment correct? I do not think we know that the breakpoint is outside of the runtime or managed code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, forgotten to remove this comment together with the other stuff.
11083aa
to
afeb7bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
* main: (27 commits) Fold null checks against known non-null values (dotnet#109164) JIT: Always track the context for late devirt (dotnet#112396) JIT: array allocation fixes (dotnet#112676) [H/3] Fix test closing connection too fast (dotnet#112691) Fix LINQ handling of iterator.Take(...).Last(...) (dotnet#112680) [browser][MT] move wasm MT CI legs to extra-platforms (dotnet#112690) JIT: Don't use `Compiler::compFloatingPointUsed` to check if FP kills are needed (dotnet#112668) [LoongArch64] Fix a typo within PR#112166. (dotnet#112672) Fix new EH hang on DebugBreak (dotnet#112640) Use encode callback instead of renting a buffer to write to in DSAKeyFormatHelper Move some links to other doc (dotnet#112574) Reflection-based XmlSerializer - Deserialize empty collections and allow for sub-types in collection items. (dotnet#111723) JIT: Use `fgCalledCount` for OSR method entry weight (dotnet#112662) Use Avx10.2 Instructions in Floating Point Conversions (dotnet#111775) Expose StressLog via CDAC and port StressLogAnalyzer to managed code (dotnet#104999) JIT: Use linear block order for MinOpts in LSRA (dotnet#108147) Update dependencies from https://github.com/dotnet/arcade build 20250213.2 (dotnet#112625) JIT: Clean up and optimize call arg lowering (dotnet#112639) Update dependencies from https://github.com/dotnet/emsdk build 20250217.1 (dotnet#112645) JIT: Support `FIELD_LIST` for returns (dotnet#112308) ...
The new exception handling doesn't work well with the DebugBreak in some cases. E.g. when it is invoked from FATAL_GC_ERROR. The new EH attempts to handle the STATUS_BREAKPOINT stemming from the DebugBreak, allocate a managed exception object and hangs since it cannot do that when the GC is running.
The cause is a missing check for the breakpoint exception in the ProcessCLRExceptionNew that is present in the old ProcessCLRException. To fix it, I've copied that code to the ProcessCLRExceptionNew.
Close #112599