-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support frozen struct returns for Swift calls #99704
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
cc @dotnet/jit-contrib PTAL @amanasifkhalid (JIT parts) @jkoritzinsky @jkotas (VM thunk changes) I tested with 5000 tests both locally and in CI and they passed on both osx-arm64 and osx-x64. I reduced the set to 100 tests to be checked in. |
.macro TAILJMP_R10 | ||
.byte 0x49 | ||
.byte 0xFF | ||
.byte 0xE2 | ||
.endm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sequence has a superfluous rex prefix bit set to make sure the unwinder picks it up as an external tailcall:
runtime/src/coreclr/vm/amd64/excepamd64.cpp
Lines 368 to 378 in 2193467
else if (((TempOpcode & 0xf8) == AMD64_SIZE64_PREFIX) | |
&& (NextByte[1] == AMD64_JMP_IND_OP) | |
&& (NextByte[2] & 0x38) == AMD64_JMP_IND_RAX) | |
{ | |
// | |
// This is an indirect jump opcode: 0x48 0xff /4. The 64-bit | |
// flag (REX.W) is always redundant here, so its presence is | |
// overloaded to indicate a branch out of the function - a tail | |
// call. | |
// | |
// Such an opcode is an unambiguous epilogue indication. |
(I wasn't totally sure whether it's necessary or not for this thunk, but the existing TAILJMP_RAX
also has a superfluous rex prefix)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Windows unwinder convention. I do not think it is applicable outside Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll replace it with just jmp r10
. I guess the TAILJMP_RAX
uses in other non-Windows thunks can be cleaned up then.
Is there a difference between managed and native code here? The JIT is also generating these tailcalls with the extraneous rex prefixes. Can they be avoided outside Windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regular CoreCLR uses Windows unwinder outside Windows, so the JIT has to follow the Windows unwinder conventions there as well.
If we cared enough, we can skip the prefix for native AOT. We use native unwinder for native AOT so Windows unwinder conventions are not applicable outside Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JIT changes LGTM. Thanks!
assert((idx < ArrLen(swiftIntReturnRegs)) && (idx < ArrLen(swiftFloatReturnRegs))); | ||
unsigned intRegIdx = 0; | ||
unsigned floatRegIdx = 0; | ||
for (unsigned i = 0; i < idx; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this isn't on a hot path, and we have at most 4 return registers for a Swift call, I'm guessing it's not worth caching the return registers used so we can skip this loop? Though we do call GetABIReturnReg
while looping over all return register indices in a few places...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would keep all the registers and also the offsets as part of this type, from a simple code hygiene standpoint, but increasing its size increases the size of GenTreeCall
which increases the size of all large nodes. I think it would be good to clean this up to make the additional data for ReturnTypeDesc
allocated only when necessary, but that's a separate change.
I don't think there's anything to be concerned about wrt. throughput here.
{ | ||
const CORINFO_SWIFT_LOWERING* lowering = comp->GetSwiftLowering(clsHnd); | ||
assert(!lowering->byReference); | ||
assert(lowering->numLoweredElements <= MAX_RET_REG_COUNT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MAX_RET_REG_COUNT
is already 4 on ARM64, but it might be helpful to the reader to static_assert
that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a static assert.
I wonder if we have a problem around Edit: I guess not since that helper call is just going to use the regular managed calling convention. |
/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs, runtime-coreclr jitstress2-jitstressregs |
Azure Pipelines successfully started running 3 pipeline(s). |
The current design of CORINFO_HELP_PINVOKE_CALLI and vasig cookies is suboptimal. It leaves perf on the table and it is unfriendly to native AOT. We use a different approach in native AOT (look for If you are running into troubles with |
I'll keep that in mind. I don't think we're going to see issues around it, however. |
Adds support for pinvokes to Swift functions that return frozen structs in multiple registers. This turned out to be simpler than I expected on the JIT side; there is a small change necessary in
genMultiRegStoreToLocal
to take into account that the Swift fields are going into offsets that don't necessarily correspond to the register sizes (we already DNER the cases where things don't work out, it seems).Also adds 100 tests.
The support is complicated by the fact that Swift calls take the ret buffer in
rax
on x64. This requires some VM side changes to avoid usingrax
in the interop thunks.