Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port NativeAOT exception handling to CoreCLR #88034

Merged
merged 5 commits into from
Aug 24, 2023

Conversation

janvorli
Copy link
Member

This change ports NativeAOT exception handling to CoreCLR, resulting in 3.5..4 times speedup in exception handling. Due to time constraints and various complexities, thread abort and debugger support is not completed yet, so this change is enabled only when DOTNET_EnableNewExceptionHandling env variable is set. By default, the old way is used.
This change supports all OSes and targets we support except of x86 Windows. That may be doable too in the future, but the difference in exception handling on x86 Windows adds quite a lot of complexity into the picture.

Notes for the PR:

  • I have left the ExceptionHandling.cs and StackFrameIterator.cs in the nativeaot folder to simplify the review. I can move it to some common location after the change is reviewed. Also it was not clear to me where that should be, so advise would be welcome here.
  • Naming of the native helpers like RhpCallCatchFunclet was left the same as in the NativeAOT for now.
  • There are still some little things I'd like to eventually clean up, like ExInfo encapsulation and possibly moving REGDISPLAY and CONTEXT it uses into the ExInfo itself or moving debug members of StackFrameIterator and REGDISPLAY to the end of those structures so that the AsmOffsets.cs can be simplified. It also may be possible to unify the exception handling callback that's used for ObjectiveC to use the managed version. I've tried and there were some ugly complications, so I've left it separated.
  • There are two bug fixes for bugs unrelated to this PR and a removal of unused parameter in existing code that could be made as separate PRs before this PR.
    • ProfilerEnter and ProfilerLeave for the case of UnmanagedCallersOnly method were being called in preemptive mode.
    • NativeAOT code for rethrowing exception was incorrectly calling DispatchEx with last argument set to activeExInfo._idxCurClause to start at the last clause processed when the rethrown exception was originally thrown instead of starting from the first one again. I have a accidentally came with a simple test that discovered this bug and causes failures in the original NativeAOT too.
  • Changes in the stackwalk.cpp add support for
    • Usage of ExInfo instead of ExceptionTracker
    • Handling of case when GC runs while finally funclet is on the stack and then again when the code is back in the new exception handling code in managed code before other finally or catch funclet is called. The NativeAOT solves that by disabling GC for the 2nd pass of EH, for this change it would not be reasonable.
    • Handling the GC reporting when funclet is found while walking the stack. It needs to scan frames of the managed code that handles the exception too, since it contains live references. The old EH way doesn't have this case.
  • I needed to add GCFrame::Remove method that can remove the GCFrame from any location in the chain. There is a managed runtime method that calls GCReporting::Unregister that was popping it with my changes out of order due to the exception handling code being managed.

@jkotas jkotas requested a review from davidwrighton June 26, 2023 17:26
@danmoseley
Copy link
Member

How do the perf improvements compare between OS?

In a typical workload, would we expect this to only be noticeable in "exception storms" (e.g. due loss of connectivity)?

@jkotas
Copy link
Member

jkotas commented Jun 28, 2023

How do the perf improvements compare between OS?

#77568 (comment)

@janvorli
Copy link
Member Author

janvorli commented Jun 28, 2023

I have found (by running a separate testing PR (#88113) just for the NativeAOT fix for rethrowing, that that fix is not correct in all cases. So clearly the last argument of the DispatchEx needs to be sometimes activeExInfo._idxCurClause and sometimes MaxTryRegionIdx. After debugging the issue that lead me to change this, I was convinced that it should always be MaxTryRegionIdx, which means to start at the first clause. So I'll need to debug the case failing in the PR to see why it is not the case here.
For the current PR, I am going to revert this change. The failure I have seen (exception going unhandled) was not happening in any of the tests we have, I was just lucky to create a test myself that was failing without the change for both NativeAOT and the new EH.

The test I have created is as simple as this. The rethrown exception is unhandled. When I comment out the throw; and uncomment the `throw new ArgumentException("aE");', it works.

class Program
{
    static void Main(string[] args)
    {
        try
        {
            throw new Exception("boo");
        }
        catch (Exception ex2)
        {
            Console.WriteLine($"1: {ex2}");
            try
            {
                throw;
                //throw new ArgumentException("aE");
            }
            catch (Exception ex3)
            {
                Console.WriteLine($"2: {ex3}");
            }
        }
    }
}
When running result of dotnet publish -c Release -p:PublishAot=true
1: System.Exception: boo
   at Program.Main(String[]) + 0x4c
Unhandled Exception: System.Exception: boo
   at Program.Main(String[]) + 0x151
   at failingnativeaot!<BaseAddress>+0x16bc2b

@janvorli
Copy link
Member Author

Fortunately, I have found what was causing the NativeAOT test suite to fail with my rethrow fix. It was just a problem of stack trace, the actual fix is correct.

@janvorli
Copy link
Member Author

I have created a couple of PRs to separate bug fixes and cleanups unrelated to this PR.

src/coreclr/vm/amd64/cgencpu.h Outdated Show resolved Hide resolved
src/coreclr/vm/eetwain.cpp Show resolved Hide resolved
src/coreclr/vm/frames.cpp Show resolved Hide resolved
src/coreclr/vm/exceptionhandling.h Show resolved Hide resolved
src/coreclr/vm/exceptmacros.h Outdated Show resolved Hide resolved
src/coreclr/vm/exinfo.cpp Show resolved Hide resolved
src/coreclr/vm/proftoeeinterfaceimpl.cpp Outdated Show resolved Hide resolved
@janvorli janvorli force-pushed the port-nativeaot-eh-to-coreclr-final-2 branch from a96c6a9 to 57a7e78 Compare June 28, 2023 21:36
@janvorli janvorli force-pushed the port-nativeaot-eh-to-coreclr-final-2 branch from 57a7e78 to 6db013d Compare June 29, 2023 07:30
@janvorli janvorli force-pushed the port-nativeaot-eh-to-coreclr-final-2 branch from c77f189 to ee3f998 Compare July 26, 2023 21:05
EEToProfilerExceptionInterfaceWrapper::ExceptionCatcherLeave();
}
}
// TODO: add the m_EHClauseInfo for profiler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this TODO about?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be a TODO-NewEH

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I still plan to add a number of TODO-NewEHs to this PR. As for this TODO, the comment was related to the fact that the ExceptionTracker::MakeCallbacksRelatedToHandler, which is the old EH variant of this function, calls m_EHClauseInfo.ResetInfo() and IIRC, profiler stuff somehow relied on more stuff from the m_EHClauseInfo info than the stuff I've extracted from it and use in the new EH.

This change ports NativeAOT exception handling to CoreCLR, resulting in
3.5..4 times speedup in exception handling. Due to time constraints and
various complexities, thread abort and debugger support is not
completed yet, so this change is enabled only when
`DOTNET_EnableNewExceptionHandling` env variable is set. By default,
the old way is used.
This change supports all OSes and targets we support except of x86
Windows. That may be doable too in the future, but the difference in
exception handling on x86 Windows adds quite a lot of complexity into
the picture.

Notes for the PR:
* I have left the `ExceptionHandling.cs` and `StackFrameIterator.cs` in
  the nativeaot folder to simplify the review. I can move it to some
  common location after the change is reviewed. Also it was not clear to
  me where that should be, so advise would be welcome here.
* Naming of the native helpers like `RhpCallCatchFunclet` was left the
  same as in the NativeAOT for now.
* There are still some little things I'd like to eventually clean up,
  like `ExInfo` encapsulation and possibly moving `REGDISPLAY` and
  `CONTEXT` it uses into the `ExInfo` itself or moving debug members of
  `StackFrameIterator` and `REGDISPLAY` to the end of those structures
  so that the `AsmOffsets.cs` can be simplified. It also may be possible
  to unify the exception handling callback that's used for ObjectiveC to
  use the managed version. I've tried and there were some ugly
  complications, so I've left it separated.
* There are two bug fixes for bugs unrelated to this PR and a removal of
  unused parameter in existing code that could be made as separate PRs
  before this PR.
  * `ProfilerEnter` and `ProfilerLeave` for the case of
    `UnmanagedCallersOnly` method were being called in preemptive mode.
  * NativeAOT code for rethrowing exception was incorrectly calling
    `DispatchEx` with last argument set to `activeExInfo._idxCurClause`
    to start at the last clause processed when the rethrown exception
    was originally thrown instead of starting from the first one again.
    I have a accidentally came with a simple test that discovered this
    bug and causes failures in  the original NativeAOT too.
* Changes in  the stackwalk.cpp add support for
  * Usage of `ExInfo` instead of `ExceptionTracker`
  * Handling of case when GC runs while finally funclet is on the stack
    and then again when the code is back in the new exception handling
    code in managed code before other finally or catch funclet is
    called. The NativeAOT solves that by disabling GC for the 2nd pass
    of EH, for this change it would not be reasonable.
  * Handling the GC reporting when funclet is found while walking the
    stack. It needs to scan frames of the managed code that handles the
    exception too,  since it contains live references. The old EH way
    doesn't have this case.
* I needed to add `GCFrame::Remove` method that can remove the `GCFrame`
  from any location in the chain. There is a managed runtime method that
  calls `GCReporting::Unregister` that was popping it with my changes
  out of order due to the exception handling code being managed.

Fix context initialization after rebase
The `UNINSTALL_UNWIND_AND_CONTINUE_HANDLER_NO_PROBE` in the
`EE_TO_JIT_TRANSITION` needs to rethrow an exception (if any) using native
exception handling mechanism instead of calling the new managed
exception handling, as the native exception needs to propagate through
some native code layers from there.
This change adds parameter to the
`UNINSTALL_UNWIND_AND_CONTINUE_HANDLER_NO_PROBE` macro to select whether
to rethrow the exception as native or to invoke the new managed
exception handling.

This problem didn't show up until I ran the coreclr tests with tiered
compilation disabled.
@janvorli janvorli force-pushed the port-nativeaot-eh-to-coreclr-final-2 branch from c3824e1 to 28f603b Compare August 22, 2023 13:06
@janvorli
Copy link
Member Author

@jkotas I believe I have addressed all of the feedback. Can you please take a look to see if you have any other comments or if it can be merged? The CI failures are unrelated.

@jkotas
Copy link
Member

jkotas commented Aug 23, 2023

The UNINSTALL_UNWIND_AND_CONTINUE_HANDLER_NO_PROBE in the
EE_TO_JIT_TRANSITION needs to rethrow an exception (if any) using native
exception handling mechanism instead of calling the new managed
exception handling, as the native exception needs to propagate through
some native code layers from there.

Is JIT/EE interface really the only place that needs this change? I went over all UNINSTALL_UNWIND_AND_CONTINUE_HANDLER occurences and it seems that some of them need this fix too. For example, this one: https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/excep.cpp#L6055-L6057

@janvorli
Copy link
Member Author

Is JIT/EE interface really the only place that needs this change? I went over all UNINSTALL_UNWIND_AND_CONTINUE_HANDLER occurences and it seems that some of them need this fix too

That's a good point, I'll review the usage. The place that you've mentioned definitely needs it.

There were three places where the UNINSTALL_UNWIND_AND_CONTINUE_HANDLER
needed to be replaced by
UNINSTALL_UNWIND_AND_CONTINUE_HANDLER_NO_PROBE(true).
src/coreclr/vm/excep.cpp Outdated Show resolved Hide resolved
To INSTALL_UNWIND_AND_CONTINUE_HANDLER_EX, as the old name is obsolete
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants