Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signed byte is incorrectly transferred over interop on iOS/Catalyst ARM64 #100891

Open
dotMorten opened this issue Apr 10, 2024 · 10 comments
Open
Assignees
Milestone

Comments

@dotMorten
Copy link

Description

When sending an sbyte to a C++ library (int8_t), on iOS and MacCatalyst ARM64, the sbyte is not sent across correctly.

Reproduction Steps

  1. Unzip the attached project: SbyteRepro.zip
  2. Open the project MauiTestApp\MauiTestApp.csproj and run it on an iOS Device or iOS ARM64 simulator
  3. Click the button
  4. Notice that the button show MinusFive=251. This was expected to show -5.

That value comes from sending an sbyte=-5 to native code, let the native code convert it to an int32 and return it again. But because something goes wrong at the interop level, the value gets sent across as 251. This seems to only affect sbyte going into native code - the second part of the code tests signed bytes coming out of native code and handles negative values just fine.

A Windows version of the repro is also provided (I tested Windows ARM64 and Android ARM64 and these don't exhibit this behavior).

C++ code:

DLLEXPORT int32_t WINAPI SByteToInt(int8_t value)
{
  return static_cast<int32_t>(value);
}

C# code:

[LibraryImport("__Internal")]
private static partial Int32 SByteToInt(sbyte value);

Expected behavior

signed bytes gets sent across the interop correctly on iOS ARM64 like other platforms.

Actual behavior

-5 gets incorrectly transferred.

Regression?

No response

Known Workarounds

don't use sbyte, but transfer as int32_t and cast once in native code.

Configuration

.NET 8 ARM64 iOS and Catalyst

Other information

Interesting details for Apple ARM64 ABI: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Pass-arguments-to-functions-correctly

The caller of a function is responsible for signing or zero-extending any argument with fewer than 32 bits. The standard ABI expects the callee to sign or zero-extend those arguments.
The C language requires the promotion of arguments smaller than int before a call.

I wonder if .NET zero-extends our sbyte instead of sign-extending it.
-5 in two's compliment int8_t would be 1111 1011
Zero-extended to int32_t would be 0000 0000 0000 0000 0000 0000 1111 1011 which is 251

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 10, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 10, 2024
@ivanpovazan ivanpovazan added os-ios Apple iOS area-Interop-mono and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Apr 10, 2024
@ivanpovazan ivanpovazan added this to the 9.0.0 milestone Apr 10, 2024
@huoyaoyuan
Copy link
Member

int8_t and int32_t are both signed. I'd expect static_cast<int32_t> to be sign extend.

@dotMorten
Copy link
Author

@huoyaoyuan the static-cast code was just to show what happens to a roundtripped value, but we have code deeper down that checks ranges of the sbyte (in the specific case -12..12) that fails for all negative numbers. The issue only occured on ios and catalyst when running on ARM64. Windows ARM64 and Android ARM64 are just fine

@rolfbjarne
Copy link
Member

I wonder if the same thing happens with System.Int16 (short) <-> int16_t?

@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Apr 11, 2024
@huoyaoyuan
Copy link
Member

static_cast<int32_t>(value) isn't a NOP on other platforms:

image

In other words, these platforms are treating upper bits of int8_t as undefined.

Compiler explorer doesn't provide apple compilers, but I can imagine that they are defining the upper bits to be already sign extended.

@dotMorten
Copy link
Author

@rolfbjarne we have not seen this happen with shorts.

@ivanpovazan ivanpovazan self-assigned this Apr 11, 2024
@jkotas
Copy link
Member

jkotas commented Apr 14, 2024

@jakobbotsch Do we have the sign and zero extensions implemented correctly in RyuJIT for Apple Arm64 ABI?

@ivanpovazan
Copy link
Member

ivanpovazan commented Apr 15, 2024

@jakobbotsch Do we have the sign and zero extensions implemented correctly in RyuJIT for Apple Arm64 ABI?

Seems like we do. I tested SByteToInt conversion with NativeAOT on iOS and it returns the expected result.
@dotMorten did you have a chance of trying out NativeAOT on iOS with your projects?


Regarding Mono, the problem seems reproducible only if the native library is built with -Os:

clang++ -c -o lib.o lib.cpp -isysroot /Applications/Xcode_15.2.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.2.sdk -Os

Removing the size optimization flag when building the native library makes Mono produce the expected result as well.


We should look into why Mono fails when the library is built with -Os

@jakobbotsch
Copy link
Member

jakobbotsch commented Apr 15, 2024

I don't see where RyuJIT takes explicit care of this normalization. RyuJIT in general keeps all values in registers normalized up to 32 bits, which makes it very hard/impossible to hit issues around this with IL created from C#. But you can hit this with RyuJIT as well for IL based programs on both arm32 (which has the same ABI requirement) and apple arm64. I opened #101046 about it.

@lambdageek
Copy link
Member

Looking at what apple clang produces with -S -emit-llvm it looks like the initial LLVM IR doesn't need to include the sign extension explicitly.

This:

#include <stdint.h>

int32_t foo (int8_t i)
{
	return i;
}

int32_t bar (int8_t i);

int32_t call_bar (int8_t i, int8_t j)
{
	return bar(i + j);
}

Produces the following LLVM IR with lang -Os -S -emit-llvm /tmp/foo.c -o foo.S -isysroot /Applications/Xcode-15.4.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk:

 ModuleID = '/tmp/foo.c'
source_filename = "/tmp/foo.c"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-ios17.5.0"

; Function Attrs: mustprogress nofree norecurse nosync nounwind optsize readnone ssp willreturn uwtable(sync)
define i32 @foo(i8 noundef signext %0) local_unnamed_addr #0 {
  %2 = sext i8 %0 to i32
  ret i32 %2
}

; Function Attrs: nounwind optsize ssp uwtable(sync)
define i32 @call_bar(i8 noundef signext %0, i8 noundef signext %1) local_unnamed_addr #1 {
  %3 = add i8 %1, %0
  %4 = tail call i32 @bar(i8 noundef signext %3) #3
  ret i32 %4
}

; Function Attrs: optsize
declare i32 @bar(i8 noundef signext) local_unnamed_addr #2

attributes #0 = { mustprogress nofree norecurse nosync nounwind optsize readnone ssp willreturn uwtable(sync) "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "probe-stack"="__chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="apple-a7" "target-features"="+aes,+crypto,+fp-armv8,+neon,+sha2,+v8a,+zcm,+zcz" }
attributes #1 = { nounwind optsize ssp uwtable(sync) "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "probe-stack"="__chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="apple-a7" "target-features"="+aes,+crypto,+fp-armv8,+neon,+sha2,+v8a,+zcm,+zcz" }
attributes #2 = { optsize "frame-pointer"="non-leaf" "no-trapping-math"="true" "probe-stack"="__chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="apple-a7" "target-features"="+aes,+crypto,+fp-armv8,+neon,+sha2,+v8a,+zcm,+zcz" }
attributes #3 = { nounwind optsize }

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 17, i32 5]}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 8, !"PIC Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 1}
!4 = !{i32 7, !"frame-pointer", i32 1}
!5 = !{!"Apple clang version 15.0.0 (clang-1500.3.9.4)"}

You get the same IR with apple clang targeting the standard ARM64 ABI: clang -Os -S /tmp/foo.c -o foo.S --target=arm64-linux-gnu

; ModuleID = '/tmp/foo.c'
source_filename = "/tmp/foo.c"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "arm64-unknown-linux-gnu"

; Function Attrs: mustprogress nofree norecurse nosync nounwind optsize readnone willreturn uwtable
define dso_local i32 @foo(i8 noundef %0) local_unnamed_addr #0 {
  %2 = sext i8 %0 to i32
  ret i32 %2
}

; Function Attrs: nounwind optsize uwtable
define dso_local i32 @call_bar(i8 noundef %0, i8 noundef %1) local_unnamed_addr #1 {
  %3 = add i8 %1, %0
  %4 = tail call i32 @bar(i8 noundef %3) #3
  ret i32 %4
}

; Function Attrs: optsize
declare i32 @bar(i8 noundef) local_unnamed_addr #2

attributes #0 = { mustprogress nofree norecurse nosync nounwind optsize readnone willreturn uwtable "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
attributes #1 = { nounwind optsize uwtable "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
attributes #2 = { optsize "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
attributes #3 = { nounwind optsize }

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{i32 7, !"frame-pointer", i32 1}
!5 = !{!"Apple clang version 15.0.0 (clang-1500.3.9.4)"}

However the final assembly differs.

Apple API: clang -Os -S /tmp/foo.c -o foo.S -isysroot /Applications/Xcode-15.4.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk --target=arm64-apple-ios17.5.0

	.section	__TEXT,__text,regular,pure_instructions
	.build_version ios, 17, 5	sdk_version 17, 5
	.globl	_foo                            ; -- Begin function foo
	.p2align	2
_foo:                                   ; @foo
	.cfi_startproc
; %bb.0:
	ret
	.cfi_endproc
                                        ; -- End function
	.globl	_call_bar                       ; -- Begin function call_bar
	.p2align	2
_call_bar:                              ; @call_bar
	.cfi_startproc
; %bb.0:
	add	w8, w1, w0
	sxtb	w0, w8
	b	_bar
	.cfi_endproc
                                        ; -- End function
.subsections_via_symbols

Linux ABI: clang -Os -S /tmp/foo.c -o foo.S -isysroot /Applications/Xcode-15.4.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk --target=arm64-linux-gnu

	.text
	.file	"foo.c"
	.globl	foo                             // -- Begin function foo
	.p2align	2
	.type	foo,@function
foo:                                    // @foo
	.cfi_startproc
// %bb.0:
	sxtb	w0, w0
	ret
.Lfunc_end0:
	.size	foo, .Lfunc_end0-foo
	.cfi_endproc
                                        // -- End function
	.globl	call_bar                        // -- Begin function call_bar
	.p2align	2
	.type	call_bar,@function
call_bar:                               // @call_bar
	.cfi_startproc
// %bb.0:
	add	w0, w1, w0
	b	bar
.Lfunc_end1:
	.size	call_bar, .Lfunc_end1-call_bar
	.cfi_endproc
                                        // -- End function
	.ident	"Apple clang version 15.0.0 (clang-1500.3.9.4)"
	.section	".note.GNU-stack","",@progbits
	.addrsig

Conclusion: we're probably not passing the correct target triple to something in our backend. either opt or llc or clang.

@ivanpovazan
Copy link
Member

ivanpovazan commented Aug 28, 2024

Investigation

I think the problem is that we are not decorating params/arguments fewer than 32bits with the correct LLVM IR attributes.

If we considering a simple caller.cpp

#include <cstdint>
extern "C" {int32_t SByteToInt(int8_t value);}
int main() {
    int result = SByteToInt(-5);
    return result;
}

Compiled with: clang++ -S -emit-llvm -o caller.S caller.cpp -isysroot /Applications/Xcode_15.2.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.2.sdk
We get the following IR, where the param/arg have signext attribute associated with them:

; Function Attrs: noinline norecurse optnone ssp uwtable(sync)
define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, ptr %1, align 4
  %3 = call i32 @SByteToInt(i8 noundef signext -5)
  store i32 %3, ptr %2, align 4
  %4 = load i32, ptr %2, align 4
  ret i32 %4
}

declare i32 @SByteToInt(i8 noundef signext) #1

This gives the following assembly:

_main:
0000000000000000	sub	sp, sp, #0x20
0000000000000004	stp	x29, x30, [sp, #0x10]
0000000000000008	add	x29, sp, #0x10
000000000000000c	stur	wzr, [x29, #-0x4]
0000000000000010	mov	w0, #-0x5 // <-- correct
0000000000000014	bl	0x14

However, when we use Mono with LLVM that PInvokes the same function we generate:

; Function Attrs: noinline
define hidden monocc i32 @Program_Program_EONEConvert() #2 gc "coreclr" !dbg !89 {
...
gc.safepoint_poll.exit:                           ; preds = %BB0, %gc.safepoint_poll.poll.i
  %2 = notail call monocc i32 @p_8_plt_Program_Program_SByteToInt_sbyte_llvm(i8 -5), !dbg !90, !managed_name !91
  ret i32 %2, !dbg !92
}

declare hidden i32 @p_8_plt_Program_Program_SByteToInt_sbyte_llvm(i8) local_unnamed_addr #0

which gives following assembly:

_Program_Program_MyConvert:
0000000000000290    stp    x29, x30, [sp, #-0x10]!
0000000000000294    adrp    x8, 0 ; 0x0
0000000000000298    ldr    x8, [x8]
000000000000029c    ldr    x8, [x8]
00000000000002a0    cbnz    x8, 0x2b4
00000000000002a4    mov    w0, #0xfb // <-- wrong
00000000000002a8    bl    0x2a8

Solution

Adapt https://github.com/dotnet/runtime/blob/main/src/mono/mono/mini/mini-llvm.c#L2333 (and other places) to properly decorate params/args with the correct set of https://llvm.org/docs/LangRef.html#parameter-attributes


FWIW Using mini-jit backend alone, produces the correct result:

Program_EONEConvert:
00000001003e8280	stp	x29, x30, [sp, #-0x10]!
00000001003e8284	mov	x29, sp
00000001003e8288	adrp	x16, 2243 ; 0x100cab000
00000001003e828c	add	x16, x16, #0xa50
00000001003e8290	ldr	x0, [x16, #0x38]
00000001003e8294	ldr	x17, [x0]
00000001003e8298	cbz	x17, 0x1003e82a0
00000001003e829c	bl	plt
00000001003e82a0	mov	x0, #-0x5
00000001003e82a4	bl	plt_Program_SByteToInt_sbyte

@vitek-karas vitek-karas modified the milestones: 9.0.0, 10.0.0 Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

8 participants