Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't get the correct return type of llvm link #1143

Closed
SparrowLii opened this issue Apr 27, 2021 · 10 comments · Fixed by #1200
Closed

Can't get the correct return type of llvm link #1143

SparrowLii opened this issue Apr 27, 2021 · 10 comments · Fixed by #1200

Comments

@SparrowLii
Copy link
Member

Look the following implementation of vld1_s16_x2 neon instruction:
godbolt
the return type of vld1x2.v4i16.p0i16 should be {<4 x i16>, <4 x i16>}
But if we do a similar implementation in rust:
gotbolt
llvm will change the return type to the wrong [ 2 x <i8 x 8> ]. As a result, we cannot implement the vld1_s16_x2 instruction.

@bjorn3
Copy link
Member

bjorn3 commented Apr 27, 2021

Except for the padding using the "unadjusted" abi and a tuple works:

#![feature(link_llvm_intrinsics, aarch64_target_feature, stdsimd, simd_ffi, repr_simd, arm_target_feature, platform_intrinsics, abi_unadjusted)]

#[cfg(target_arch = "aarch64")]
use core::arch::aarch64::*;

#[cfg(target_arch = "arm")]
use core::arch::arm::*;

pub struct int16x4x2_t(pub int16x4_t, pub int16x4_t);
pub unsafe fn vld1_s16_x2(ptr: *const i16) -> int16x4x2_t {
    #[allow(improper_ctypes)]
    extern "unadjusted" {
        #[cfg_attr(target_arch = "arm", link_name = "llvm.arm.neon.vld1x2.v4i16.p0i16")]
        #[cfg_attr(target_arch = "aarch64", link_name = "llvm.aarch64.neon.ld1x2.v4i16.p0i16")]
        fn vld1_s16_x2_(ptr: *const i16) -> (int16x4_t, int16x4_t);
    }
    let (a, b) = vld1_s16_x2_(ptr);
    int16x4x2_t(a, b)
}
Intrinsic has incorrect return type!
{ [0 x i64], <4 x i16>, [0 x i64], <4 x i16>, [0 x i64] } (i16*)* @llvm.aarch64.neon.ld1x2.v4i16.p0i16
in function _ZN7example11vld1_s16_x217h2e85ec3f6fc419e2E
LLVM ERROR: Broken function found, compilation aborted!

@Lokathor
Copy link
Contributor

having a repr(rust) type as the output type of the inner function call seems pretty dodgy

@SparrowLii
Copy link
Member Author

SparrowLii commented Apr 28, 2021

@Amanieu I think this is a llvm bug? Considering that it blocked the implementation of about 500 neon instructions, can we have a solution to it?

@Amanieu
Copy link
Member

Amanieu commented Apr 28, 2021

This is a bug in rustc: with the "unadjusted" ABI it should emit the type in LLVM IR without padding. However I am not very familiar with this part of rustc so I'm not exactly sure what the best way to go about fixing it is.

@bjorn3
Copy link
Member

bjorn3 commented Apr 28, 2021

The "unadjusted" ABI is just that. It directly uses the LLVM type rustc chose for the argument/return type for normal use inside the function without adjusting it for any calling convention as the argument/return type at LLVM level. This means that if rustc changes the representation of types at LLVM level, the "unadjusted" ABI changes too. The padding is not necessary in this case I think and as such could be removed, but there is no guarantee that there will never be padding. Luckily it is fine for stdarch to depend on unstable implementation details as it is shipped with a specific rustc version.

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 11, 2021
…mit_zero_sized_padding, r=eddyb

LLVM codegen: Don't emit zero-sized padding for fields

Currently padding is emitted before fields of a struct and at the end of the struct regardless of the ABI. Even if no padding is required zero-sized padding fields are emitted. This is not useful and - more importantly - it make it impossible to generate the exact vector types that LLVM expects for certain ARM SIMD intrinsics. This change should unblock the implementation of many ARM intrinsics using the `unadjusted` ABI, see rust-lang/stdarch#1143 (comment).

This is a proof of concept only because the field lookup now takes O(number of fields) time compared to O(1) before since it recalculates the mapping at every lookup. I would like to find out how big the performance impact actually is before implementing caching or restricting this behavior to the `unadjusted` ABI.

cc `@SparrowLii` `@bjorn3`

([Discussion on internals](https://internals.rust-lang.org/t/feature-request-add-a-way-in-rustc-for-generating-struct-type-llvm-ir-without-paddings/15007))
@hkratz
Copy link
Contributor

hkratz commented Aug 16, 2021

That should work with the "unadjusted" ABI, now that rust-lang/rust#87254 has landed.

@SparrowLii
Copy link
Member Author

That should work with the "unadjusted" ABI, now that rust-lang/rust#87254 has landed.

Yes. Thanks again for solving this problem, I plan to add related instructions this week( If no one else does it)

@SparrowLii
Copy link
Member Author

SparrowLii commented Aug 18, 2021

It seems that here is a new problem in armv7: https://rust.godbolt.org/z/E3G9rhrda
LLVM doesn't know how to split the result of this operator.

@SparrowLii
Copy link
Member Author

I will implement the aarch64 part first.
Do you know how to solve this problem? @bjorn3 @Amanieu

@Amanieu
Copy link
Member

Amanieu commented Aug 18, 2021

You need to enable the neon target feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants