Remove extra UnmanagedCallersOnly overhead on x86 #46238
Conversation
Implement emitting an unmanaged calling convention entry point with the correct argument order and register usage on x86.
Move Unix x86 to the UnmanagedCallersOnly plan now that we don't need to do argument shuffling.
Add SEH hookup and profiler/debugger hooks to Reverse P/Invoke entry helper to match custom x86 thunk. Fixes dotnet#46177
Remove Windows x86 assembly stub for individual reverse P/Invokes. Move Windows x86 unmanaged callers only to not have extra overhead and put reverse P/Invoke stubs for Windows x86 on the UnmanagedCallersOnly plan.
Merge branch 'master' of github.com:dotnet/runtime into x86-reverse-stub-cleanup
Remove extraneous UnmanagedCallersOnly block now that x86 UnmanagedCallersOnly has been simplified.
Fix copy constructor reverse marshalling. Now that we don't have the emitted unmanaged thunk stub, we need to handle the x86 differences for copy-constructed parameters in the IL stub.
I've started taking a look at the test failures. The Vector2_3_4 test failure is due to
However, these instructions are incorrect. The instructions actually emitted are:
As a result, the return value isn't placed in the correct registers and the test fails. We've had a number of these kinds of bugs while doing this work for x86. @sandreenko, can you help out with fixing this one after you fix the bug I created the last time I tried to fix this type of bug?
This reverts commit 423f70e.
…ate a new stack frame.
I've validated as of the last commit (69e14c6) that there are still no JIT diffs on x86.
JIT changes LGTM, with one question.
LGTM.
The one final thing I would like to confirm validation for is the IJW scenario. I would expect that we have existing tests for this, but I want to confirm so that we don't hand any surprises to our WPF friends.
I've validated that this PR passes all of the C++/CLI tests that the C++ team runs.
Pass the return address of the ReversePInvokeEnter helper to TraceCall instead of the entry point, and call TraceCall from all JIT_ReversePInvokeEnter* helpers.
Merged in 5ec0c7a. GitHub messed up and didn't mark the PR as merged, so I'll close the PR.
* Implement emitting an unmanaged calling convention entry point with the correct argument order and register usage on x86.
* Move Unix x86 to the UnmanagedCallersOnly plan now that we don't need to do argument shuffling.
* Add SEH hookup and profiler/debugger hooks to Reverse P/Invoke entry helper to match custom x86 thunk. Fixes dotnet#46177
* Remove Windows x86 assembly stub for individual reverse p/invokes. Move Windows x86 unmanaged callers only to not have extra overhead and put reverse P/Invoke stubs for Windows x86 on the UnmanagedCallersOnly plan.
* Further cleanup
* Remove extraneous UnmanagedCallersOnly block now that x86 UnmanagedCallersOnly has been simplified.
* Undo ArgOrder size specifier since it isn't needed and it doesn't work.
* Fix copy constructor reverse marshalling. Now that we don't have the emitted unmanaged thunk stub, we need to handle the x86 differences for copy-constructed parameters in the IL stub.
* Fix version guid syntax.
* Remove FastNExportHandler.
* Revert "Remove FastNExportHandler." This reverts commit 423f70e.
* Fix setting up entry frame for new thread.
* Allow the NExportSEH record to live below ESP so we don't need to create a new stack frame.
* Fix formatting.
* Assign an offset for the return buffer on x86 since it might come in on the stack.
* Make sure we use the TC block we just put in on x86 as well.
* Shrink the ReversePInvokeFrame on non-x86 back to master's size.
* Fix arch-specific R2R constant.
* Pass the return address of the ReversePInvokeEnter helper to TraceCall instead of the entry point and call TraceCall from all JIT_ReversePInvokeEnter* helpers.
* Fix ILVerification and ILVerify
* fix R2R constants for crossgen1
* Don't assert ReversePInvokeFrame size for cross-bitness scenarios.
// As a result, we specially decorate this method to have the correct calling convention
// and argument ordering for an HCALL, but we don't use the HCALL macros and contracts
// since this method doesn't follow the contracts.
void F_CALL_CONV HCCALL3(JIT_ReversePInvokeEnter, ReversePInvokeFrame* frame, CORINFO_METHOD_HANDLE handle, void* secretArg)
You have not addressed my comment about the additional performance overhead of passing the extra method handle and secret arg arguments. It introduces a performance regression, it is a breaking change for R2R (the helper has a different signature now), and this design does not work for NativeAOT.
We should be paying this overhead only when somebody turns on the specific profiling notifications (they are rare and slow by design):
- Add a flag to the JIT/EE interface that controls whether these notifications are enabled
- Keep the existing helper signature as is; introduce a new helper that takes these arguments
- Have the JIT call the new helper only when the new flag is set
Could you please fix that? (A rough sketch of the proposed shape follows.)
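A minimal sketch of that split, for illustration only; every name below is a placeholder standing in for the real dotnet/runtime declarations, not the actual code:

```cpp
// Sketch of the proposed helper split (illustrative names, not the repo's).
struct ReversePInvokeFrame;          // opaque per-transition frame
typedef void* CORINFO_METHOD_HANDLE; // stand-in for the real handle type

// Hot path: signature identical to the existing helper, so R2R images keep
// working and no extra arguments are materialized on every transition.
void JIT_ReversePInvokeEnter(ReversePInvokeFrame* frame);

// Rare path: carries the extra arguments needed for profiler/debugger
// notifications. The JIT would emit a call to this helper only when a new
// JIT/EE interface flag reports that transition tracking is enabled.
void JIT_ReversePInvokeEnterTrackTransitions(ReversePInvokeFrame* frame,
                                             CORINFO_METHOD_HANDLE handle,
                                             void* secretArg);
```

With this shape, the common case pays nothing for the handle/secret-arg plumbing, and only processes with the relevant notifications enabled take the wider helper.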
Opened #47223
I will take a look at this ASAP.
{ | ||
_ASSERTE(frame != NULL && handle != NULL); | ||
|
||
void* traceAddr = _ReturnAddress(); |
We should not need to fetch the return address upfront. It is only needed on the slow path when we are calling the rare helper.
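For illustration, a sketch of what that restructuring could look like; `TracingEnabled()` and this `TraceCall()` shape are hypothetical stand-ins for this example, not the repo's actual declarations:

```cpp
// Sketch only: defer the _ReturnAddress() fetch to the rare tracing path.
// TracingEnabled() and this TraceCall() signature are hypothetical.
#include <intrin.h> // _ReturnAddress (MSVC intrinsic used by the snippet above)

void JIT_ReversePInvokeEnter(ReversePInvokeFrame* frame,
                             CORINFO_METHOD_HANDLE handle, void* secretArg)
{
    _ASSERTE(frame != NULL && handle != NULL);

    // ... common fast-path transition work; no return address needed here ...

    if (TracingEnabled()) // rare: profiler/debugger notifications are on
    {
        // The intrinsic is still evaluated inside this function, so it
        // observes the same caller as the original upfront fetch did.
        void* traceAddr = _ReturnAddress();
        TraceCall(traceAddr, handle, secretArg);
    }
}
```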
This PR teaches the JIT to emit a method with a native x86 calling convention and then uses that capability to remove the argument shuffling stubs for both Windows and Unix x86 Reverse P/Invoke thunks.
As part of this work, the JIT helpers for Reverse P/Invokes now report transitions to the profiler and the debugger as the x86 Reverse P/Invoke thunk did.
This allows us to remove the rest of the overhead that x86 has for Reverse P/Invokes and UnmanagedCallersOnly methods in comparison to other platforms.
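For context, a rough, simplified picture (comments only, not actual emitted code, and the offsets gloss over return-address handling) of why a shuffle stub existed: the managed x86 convention differs from the native ones in both register usage and stack argument order, which is what "correct argument order and register usage" in the commit list refers to.

```cpp
// Rough illustration only. Managed x86 enregisters the first two eligible
// arguments in ECX/EDX and pushes remaining stack arguments in the opposite
// order from native __cdecl, so a native caller's frame had to be rewritten
// before entering managed code:
//
//   native caller (__cdecl):     shuffle stub roughly had to produce:
//     push c                       mov  ecx, [esp+4]    ; a -> ECX
//     push b                       mov  edx, [esp+8]    ; b -> EDX
//     push a                       push dword [esp+12]  ; re-push c
//     call stub                    call managed_entry
//
// With this PR, the JIT emits the entry point itself using the native
// convention, so no stub and no re-shuffle is needed.
```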
Since I changed the signature of the JIT_ReversePInvokeEnter helper, I tentatively changed the JIT-EE interface ID. If changing it isn't actually required here, we can revert that part.
I've validated that there are no crossgen diffs. The results from a framework crossgen JIT diff on Windows x86 are included below.
Fixes #33582
Fixes #46177
Fixes #47037