
Support frozen struct returns for Swift calls #99704

Merged

jakobbotsch merged 20 commits into dotnet:main from the swift-multireg-returns branch on Mar 14, 2024

Conversation

jakobbotsch
Member

@jakobbotsch jakobbotsch commented Mar 13, 2024

Adds support for P/Invokes to Swift functions that return frozen structs in multiple registers. This turned out to be simpler than I expected on the JIT side; the only change needed is a small one in genMultiRegStoreToLocal to account for the Swift fields going into offsets that don't necessarily correspond to the register sizes (we already DNER the cases where things don't work out, it seems).

Also adds 100 tests.

The support is complicated by the fact that Swift calls take the return buffer in rax on x64. This requires some VM-side changes to avoid using rax in the interop thunks.
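
For illustration, the shape of call this enables looks roughly like the following on the managed side. The library name, entry point, and struct are hypothetical; the declaration just uses DllImport with the CallConvSwift unmanaged calling convention.

// Hypothetical example: a P/Invoke to a Swift function that returns a @frozen
// struct small enough to come back in registers rather than via a return buffer.
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct Point3 // mirrors a hypothetical @frozen Swift struct with three Doubles
{
    public double X, Y, Z;
}

static class SwiftInterop
{
    // "libHypothetical.dylib" and "makePoint" are placeholders; a real
    // declaration would use the mangled name of the exported Swift function.
    [DllImport("libHypothetical.dylib", EntryPoint = "makePoint")]
    [UnmanagedCallConv(CallConvs = new[] { typeof(CallConvSwift) })]
    public static extern Point3 MakePoint(double x, double y, double z);
}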

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 13, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch jakobbotsch reopened this Mar 13, 2024
@jakobbotsch jakobbotsch marked this pull request as ready for review March 13, 2024 22:34
@jakobbotsch
Member Author

cc @dotnet/jit-contrib PTAL @amanasifkhalid (JIT parts) @jkoritzinsky @jkotas (VM thunk changes)

FYI @kotlarmilos @matouskozak

I tested with 5000 tests both locally and in CI and they passed on both osx-arm64 and osx-x64. I reduced the set to 100 tests to be checked in.
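
A minimal sketch of what verifying such a return looks like, reusing the hypothetical declaration above (a sketch only, not one of the actual checked-in tests):

// Call the Swift function and check each field of the returned frozen struct.
static class FrozenStructReturnTest
{
    public static int Main()
    {
        Point3 p = SwiftInterop.MakePoint(1.0, 2.0, 3.0);
        bool ok = (p.X == 1.0) && (p.Y == 2.0) && (p.Z == 3.0);
        return ok ? 100 : 101; // 100 is the usual success exit code in runtime tests
    }
}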

Comment on lines 89 to 93
.macro TAILJMP_R10
.byte 0x49
.byte 0xFF
.byte 0xE2
.endm
Member Author

@jakobbotsch jakobbotsch Mar 13, 2024

This sequence has a superfluous REX prefix bit set to make sure the unwinder picks it up as an external tail call:

else if (((TempOpcode & 0xf8) == AMD64_SIZE64_PREFIX)
         && (NextByte[1] == AMD64_JMP_IND_OP)
         && (NextByte[2] & 0x38) == AMD64_JMP_IND_RAX)
{
    //
    // This is an indirect jump opcode: 0x48 0xff /4. The 64-bit
    // flag (REX.W) is always redundant here, so its presence is
    // overloaded to indicate a branch out of the function - a tail
    // call.
    //
    // Such an opcode is an unambiguous epilogue indication.
(I wasn't totally sure whether it's necessary or not for this thunk, but the existing TAILJMP_RAX also has a superfluous REX prefix.)

Member

This is Windows unwinder convention. I do not think it is applicable outside Windows.

Member Author

I'll replace it with just jmp r10. I guess the TAILJMP_RAX uses in other non-Windows thunks can be cleaned up then.

Is there a difference between managed and native code here? The JIT also generates these tail calls with the extraneous REX prefixes. Can they be avoided outside Windows?

Member

Regular CoreCLR uses Windows unwinder outside Windows, so the JIT has to follow the Windows unwinder conventions there as well.

If we cared enough, we could skip the prefix for native AOT. We use the native unwinder for native AOT, so the Windows unwinder conventions are not applicable outside Windows.

Member

@amanasifkhalid amanasifkhalid left a comment

JIT changes LGTM. Thanks!

assert((idx < ArrLen(swiftIntReturnRegs)) && (idx < ArrLen(swiftFloatReturnRegs)));
unsigned intRegIdx = 0;
unsigned floatRegIdx = 0;
for (unsigned i = 0; i < idx; i++)
Member

Since this isn't on a hot path, and we have at most 4 return registers for a Swift call, I'm guessing it's not worth caching the return registers used so we can skip this loop? Though we do call GetABIReturnReg while looping over all return register indices in a few places...

Member Author

Ideally we would keep all the registers and also the offsets as part of this type, from a simple code hygiene standpoint, but increasing its size increases the size of GenTreeCall, which increases the size of all large nodes. I think it would be good to clean this up so that the additional data for ReturnTypeDesc is allocated only when necessary, but that's a separate change.

I don't think there's anything to be concerned about wrt. throughput here.

{
    const CORINFO_SWIFT_LOWERING* lowering = comp->GetSwiftLowering(clsHnd);
    assert(!lowering->byReference);
    assert(lowering->numLoweredElements <= MAX_RET_REG_COUNT);
Member

MAX_RET_REG_COUNT is already 4 on ARM64, but it might be helpful to the reader to static_assert that here.

Member Author

I added a static assert.

@jakobbotsch
Member Author

jakobbotsch commented Mar 14, 2024

I wonder if we have a problem around CORINFO_HELP_PINVOKE_CALLI, in particular if we end up using a jump stub to call it that clobbers rax. We may need to ensure the JIT materializes its full address in the codegen for Swift calls with return buffers.

Edit: I guess not since that helper call is just going to use the regular managed calling convention.
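
For context, a hedged sketch of the kind of call that involves this helper: an indirect unmanaged call (IL calli), e.g. through a Swift unmanaged function pointer. Point3 is the hypothetical frozen struct from the example above.

// An unmanaged calli like this can be routed through CORINFO_HELP_PINVOKE_CALLI
// on CoreCLR; the frozen struct return comes back in multiple registers here too.
static unsafe class CalliExample
{
    public static Point3 CallIndirect(
        delegate* unmanaged[Swift]<double, double, double, Point3> fn)
    {
        return fn(1.0, 2.0, 3.0);
    }
}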

@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs, runtime-coreclr jitstress2-jitstressregs


Azure Pipelines successfully started running 3 pipeline(s).

@jakobbotsch jakobbotsch merged commit 0ea9b8f into dotnet:main Mar 14, 2024
187 of 217 checks passed
@jakobbotsch jakobbotsch deleted the swift-multireg-returns branch March 14, 2024 19:23
@jkotas
Member

jkotas commented Mar 14, 2024

I wonder if we have a problem around CORINFO_HELP_PINVOKE_CALLI

The current design of CORINFO_HELP_PINVOKE_CALLI and vasig cookies is suboptimal. It leaves perf on the table and it is unfriendly to native AOT. We use a different approach in native AOT (look for convertPInvokeCalliToCall).

If you are running into trouble with CORINFO_HELP_PINVOKE_CALLI, it would be best to get rid of it and change the EE to generate a stub with a slightly different shape, same as what we do in native AOT.

@jakobbotsch
Member Author

The current design of CORINFO_HELP_PINVOKE_CALLI and vasig cookies is suboptimal. It leaves perf on the table and it is unfriendly to native AOT. We use a different approach in native AOT (look for convertPInvokeCalliToCall).

If you are running into trouble with CORINFO_HELP_PINVOKE_CALLI, it would be best to get rid of it and change the EE to generate a stub with a slightly different shape, same as what we do in native AOT.

I'll keep that in mind. I don't think we're going to see issues around it, however.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 14, 2024
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

3 participants