
Support frozen struct returns for Swift calls #99704

Merged

jakobbotsch merged 20 commits into dotnet:main from the swift-multireg-returns branch on Mar 14, 2024

Conversation

jakobbotsch
Member

@jakobbotsch jakobbotsch commented Mar 13, 2024

Adds support for P/Invokes to Swift functions that return frozen structs in multiple registers. This turned out to be simpler than I expected on the JIT side; the only change needed is a small one in genMultiRegStoreToLocal to account for the Swift fields going into offsets that don't necessarily correspond to the register sizes (we already DNER the cases where things don't work out, it seems).

Also adds 100 tests.

The support is complicated by the fact that Swift calls take the return buffer in rax on x64. This requires some VM-side changes to avoid using rax in the interop thunks.
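
For illustration, the shape of call this enables looks roughly like the following on the managed side. The library name, entry point, and struct are hypothetical; the declaration just uses DllImport with the CallConvSwift unmanaged calling convention.

// Hypothetical example: a P/Invoke to a Swift function that returns a @frozen
// struct small enough to come back in registers rather than via a return buffer.
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct Point3 // mirrors a hypothetical @frozen Swift struct with three Doubles
{
    public double X, Y, Z;
}

static class SwiftInterop
{
    // "libHypothetical.dylib" and "makePoint" are placeholders; a real
    // declaration would use the mangled name of the exported Swift function.
    [DllImport("libHypothetical.dylib", EntryPoint = "makePoint")]
    [UnmanagedCallConv(CallConvs = new[] { typeof(CallConvSwift) })]
    public static extern Point3 MakePoint(double x, double y, double z);
}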

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 13, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch jakobbotsch reopened this Mar 13, 2024
@jakobbotsch jakobbotsch marked this pull request as ready for review March 13, 2024 22:34
@jakobbotsch
Member Author

cc @dotnet/jit-contrib PTAL @amanasifkhalid (JIT parts) @jkoritzinsky @jkotas (VM thunk changes)

FYI @kotlarmilos @matouskozak

I tested with 5000 tests both locally and in CI and they passed on both osx-arm64 and osx-x64. I reduced the set to 100 tests to be checked in.
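
A minimal sketch of what verifying such a return looks like, reusing the hypothetical declaration above (a sketch only, not one of the actual checked-in tests):

// Call the Swift function and check each field of the returned frozen struct.
static class FrozenStructReturnTest
{
    public static int Main()
    {
        Point3 p = SwiftInterop.MakePoint(1.0, 2.0, 3.0);
        bool ok = (p.X == 1.0) && (p.Y == 2.0) && (p.Z == 3.0);
        return ok ? 100 : 101; // 100 is the usual success exit code in runtime tests
    }
}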

Comment on lines 89 to 93
.macro TAILJMP_R10
.byte 0x49
.byte 0xFF
.byte 0xE2
.endm
Member Author

@jakobbotsch jakobbotsch Mar 13, 2024

This sequence has a superfluous REX prefix bit set to make sure the unwinder picks it up as an external tail call:

else if (((TempOpcode & 0xf8) == AMD64_SIZE64_PREFIX)
         && (NextByte[1] == AMD64_JMP_IND_OP)
         && (NextByte[2] & 0x38) == AMD64_JMP_IND_RAX)
{
    //
    // This is an indirect jump opcode: 0x48 0xff /4. The 64-bit
    // flag (REX.W) is always redundant here, so its presence is
    // overloaded to indicate a branch out of the function - a tail
    // call.
    //
    // Such an opcode is an unambiguous epilogue indication.
(I wasn't totally sure whether it's necessary or not for this thunk, but the existing TAILJMP_RAX also has a superfluous REX prefix.)

Member

This is Windows unwinder convention. I do not think it is applicable outside Windows.

Member Author

I'll replace it with just jmp r10. I guess the TAILJMP_RAX uses in other non-Windows thunks can be cleaned up then.

Is there a difference between managed and native code here? The JIT also generates these tail calls with the extraneous REX prefixes. Can they be avoided outside Windows?

Member

Regular CoreCLR uses Windows unwinder outside Windows, so the JIT has to follow the Windows unwinder conventions there as well.

If we cared enough, we could skip the prefix for native AOT. We use the native unwinder for native AOT, so the Windows unwinder conventions are not applicable outside Windows.

Member

@amanasifkhalid amanasifkhalid left a comment

JIT changes LGTM. Thanks!

assert((idx < ArrLen(swiftIntReturnRegs)) && (idx < ArrLen(swiftFloatReturnRegs)));
unsigned intRegIdx = 0;
unsigned floatRegIdx = 0;
for (unsigned i = 0; i < idx; i++)
Member

Since this isn't on a hot path, and we have at most 4 return registers for a Swift call, I'm guessing it's not worth caching the return registers used so we can skip this loop? Though we do call GetABIReturnReg while looping over all return register indices in a few places...

Member Author

Ideally we would keep all the registers and also the offsets as part of this type, from a simple code hygiene standpoint, but increasing its size increases the size of GenTreeCall, which increases the size of all large nodes. I think it would be good to clean this up so that the additional data for ReturnTypeDesc is allocated only when necessary, but that's a separate change.

I don't think there's anything to be concerned about wrt. throughput here.

{
    const CORINFO_SWIFT_LOWERING* lowering = comp->GetSwiftLowering(clsHnd);
    assert(!lowering->byReference);
    assert(lowering->numLoweredElements <= MAX_RET_REG_COUNT);
Member

MAX_RET_REG_COUNT is already 4 on ARM64, but it might be helpful to the reader to static_assert that here.

Member Author

I added a static assert.

@jakobbotsch
Member Author

jakobbotsch commented Mar 14, 2024

I wonder if we have a problem around CORINFO_HELP_PINVOKE_CALLI, in particular if we end up using a jump stub to call it that clobbers rax. We may need to ensure the JIT materializes its full address in the codegen for Swift calls with return buffers.

Edit: I guess not since that helper call is just going to use the regular managed calling convention.
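
For context, a hedged sketch of the kind of call that involves this helper: an indirect unmanaged call (IL calli), e.g. through a Swift unmanaged function pointer. Point3 is the hypothetical frozen struct from the example above.

// An unmanaged calli like this can be routed through CORINFO_HELP_PINVOKE_CALLI
// on CoreCLR; the frozen struct return comes back in multiple registers here too.
static unsafe class CalliExample
{
    public static Point3 CallIndirect(
        delegate* unmanaged[Swift]<double, double, double, Point3> fn)
    {
        return fn(1.0, 2.0, 3.0);
    }
}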

@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs, runtime-coreclr jitstress2-jitstressregs


Azure Pipelines successfully started running 3 pipeline(s).

@jakobbotsch jakobbotsch merged commit 0ea9b8f into dotnet:main Mar 14, 2024
187 of 217 checks passed
@jakobbotsch jakobbotsch deleted the swift-multireg-returns branch March 14, 2024 19:23
@jkotas
Member

jkotas commented Mar 14, 2024

I wonder if we have a problem around CORINFO_HELP_PINVOKE_CALLI

The current design of CORINFO_HELP_PINVOKE_CALLI and vasig cookies is suboptimal. It leaves perf on the table and it is unfriendly to native AOT. We use a different approach in native AOT (look for convertPInvokeCalliToCall).

If you are running into trouble with CORINFO_HELP_PINVOKE_CALLI, it would be best to get rid of it and change the EE to generate a stub with a slightly different shape, same as what we do in native AOT.

@jakobbotsch
Member Author

The current design of CORINFO_HELP_PINVOKE_CALLI and vasig cookies is suboptimal. It leaves perf on the table and it is unfriendly to native AOT. We use a different approach in native AOT (look for convertPInvokeCalliToCall).

If you are running into trouble with CORINFO_HELP_PINVOKE_CALLI, it would be best to get rid of it and change the EE to generate a stub with a slightly different shape, same as what we do in native AOT.

I'll keep that in mind. I don't think we're going to see issues around it, however.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 14, 2024
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

3 participants