Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix new EH hang on DebugBreak #112640

Merged
merged 2 commits into from
Feb 19, 2025
Merged

Fix new EH hang on DebugBreak #112640

merged 2 commits into from
Feb 19, 2025

Conversation

janvorli
Copy link
Member

The new exception handling doesn't work well with the DebugBreak in some cases. E.g. when it is invoked from FATAL_GC_ERROR. The new EH attempts to handle the STATUS_BREAKPOINT stemming from the DebugBreak, allocate a managed exception object and hangs since it cannot do that when the GC is running.

The cause is a missing check for the breakpoint exception in the ProcessCLRExceptionNew that is present in the old ProcessCLRException. To fix it, I've copied that code to the ProcessCLRExceptionNew.

Close #112599

The new exception handling doesn't work well with the DebugBreak in some
cases. E.g. when it is invoked from FATAL_GC_ERROR. The new EH attempts
to handle the STATUS_BREAKPOINT stemming from the DebugBreak, allocate a
managed exception object and hangs since it cannot do that when the GC
is running.

The cause is a missing check for the breakpoint exception in the
ProcessCLRExceptionNew that is present in the old ProcessCLRException.
To fix it, I've copied that code to the ProcessCLRExceptionNew.

Close dotnet#112599
@janvorli janvorli requested a review from jkotas February 17, 2025 23:05
@janvorli janvorli self-assigned this Feb 17, 2025
@Copilot Copilot bot review requested due to automatic review settings February 17, 2025 23:05

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.

Files not reviewed (1)
  • src/coreclr/vm/exceptionhandling.cpp: Language not supported
@janvorli
Copy link
Member Author

@jkotas I've also found that the presence of the DebugBreak in the FATAL_GC_ERROR prevents the following assert to occur, since the process exits on the DebugBreak. The same happens with the legacy EH. So it might make sense to remove this DebugBreak from the FATAL_GC_ERROR.

@jkotas
Copy link
Member

jkotas commented Feb 18, 2025

Close #112599

Is this actually fixing the problem with debug breaks in the GC? If I am reading the code correctly, fExternalException is going to be false for debug break in (non-standalone) GC and we take the same path as before.

(I understand that this change makes the new EH same as the legacy EH.)

it might make sense to remove this DebugBreak from the FATAL_GC_ERROR.

IIRC, I wanted to do that at one point, but @Maoni0 wanted to keep it since it makes GC crashes more diagnosable. Calling the fatal error handler overwrites values of volatile registers they may contain important clues for the diagnosing the crash.

I think we should make sure that the behavior of breakpoint instructions that are statically compiled into coreclr.dll is reasonable without debugger attached. GC is not the only place where we have them.

Would it make sense to intercept the breakpoint exceptions only when CORDebuggerAttached is true?

@janvorli
Copy link
Member Author

Is this actually fixing the problem with debug breaks in the GC?

Yes, it is. The instruction that triggers the STATUS_BREAKPOINT is in the DebugBreak() that is in the kernel32.dll, so it is evaluated as external. Only if we had the breakpoint instruction directly in coreclr.dll, it would be evaluated as internal.

@janvorli
Copy link
Member Author

Would it make sense to intercept the breakpoint exceptions only when CORDebuggerAttached is true?

Maybe, let me see how it would behave

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instruction that triggers the STATUS_BREAKPOINT is in the DebugBreak() that is in the kernel32.dll, so it is evaluated as external.

Ah ok, I have realized that.

CORDebuggerAttached

BTW: The managed debuggers should not need to intercept breakpoints in the native runtime once we complete the cleanup of place where we depend on it today (it breaks managed debugging on macOS Arm64 - it is not possible to set breakpoints in statically compiled code on macOS Arm64 unless you are a proper out-of-proc debugger that managed debuggers are not). #110677 may be the last remaining change to finish this cleanup.

@janvorli
Copy link
Member Author

Actually, thinking about it more, does it even make sense to handle the breakpoint using the managed exception handling code at all? It doesn't seem so. So I've tried to just return ExceptionContinueSearch when STATUS_BREAKPOINT or STATUS_SINGLE_STEP are encountered without checking for where it came from. The dotnet-diagnostictests EH tests are passing fine with it that way.

@jkotas
Copy link
Member

jkotas commented Feb 18, 2025

That sounds even better.

Comment on lines 949 to 951
// If the exception is a breakpoint, but outside of the runtime or managed code,
// let it go. It is not ours, so someone else will handle it, or we'll see
// it again as an unhandled exception.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment correct? I do not think we know that the breakpoint is outside of the runtime or managed code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, forgotten to remove this comment together with the other stuff.

@janvorli janvorli force-pushed the fix-debugbreak-hang branch from 11083aa to afeb7bc Compare February 18, 2025 23:56
Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@jkotas jkotas merged commit 8192afa into dotnet:main Feb 19, 2025
94 checks passed
grendello added a commit to grendello/runtime that referenced this pull request Feb 19, 2025
* main: (27 commits)
  Fold null checks against known non-null values (dotnet#109164)
  JIT: Always track the context for late devirt (dotnet#112396)
  JIT: array allocation fixes (dotnet#112676)
  [H/3] Fix test closing connection too fast (dotnet#112691)
  Fix LINQ handling of iterator.Take(...).Last(...) (dotnet#112680)
  [browser][MT] move wasm MT CI legs to extra-platforms (dotnet#112690)
  JIT: Don't use `Compiler::compFloatingPointUsed` to check if FP kills are needed (dotnet#112668)
  [LoongArch64] Fix a typo within PR#112166. (dotnet#112672)
  Fix new EH hang on DebugBreak (dotnet#112640)
  Use encode callback instead of renting a buffer to write to in DSAKeyFormatHelper
  Move some links to other doc (dotnet#112574)
  Reflection-based XmlSerializer - Deserialize empty collections and allow for sub-types in collection items. (dotnet#111723)
  JIT: Use `fgCalledCount` for OSR method entry weight (dotnet#112662)
  Use Avx10.2 Instructions in Floating Point Conversions (dotnet#111775)
  Expose StressLog via CDAC and port StressLogAnalyzer to managed code (dotnet#104999)
  JIT: Use linear block order for MinOpts in LSRA (dotnet#108147)
  Update dependencies from https://github.com/dotnet/arcade build 20250213.2 (dotnet#112625)
  JIT: Clean up and optimize call arg lowering (dotnet#112639)
  Update dependencies from https://github.com/dotnet/emsdk build 20250217.1 (dotnet#112645)
  JIT: Support `FIELD_LIST` for returns (dotnet#112308)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FATAL_GC_ERROR produces hard to diagnose hangs or crashes
2 participants