
get_unchecked() is never inlined on armv7-unknown-linux-gnueabihf in functions with #[target_feature(enable = "neon")] #131745

Open
hkratz opened this issue Oct 15, 2024 · 1 comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-SIMD Area: SIMD (Single Instruction Multiple Data) A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@hkratz
Contributor

hkratz commented Oct 15, 2024

While working on armv7 NEON support for simdutf8, I ran into inlining problems in functions annotated with #[target_feature(enable = "neon")]. One of them is that get_unchecked() is never inlined in such functions.

More info:

  • It is also not inlined with the armv7-linux-androideabi target
  • It is inlined with the thumbv7neon-unknown-linux-gnueabihf target
  • It is inlined if compiled with RUSTFLAGS="-Ctarget_feature=+neon"

#![feature(arm_target_feature)]

#[target_feature(enable = "neon")]
pub unsafe fn get_unchecked_range(x: &[u8]) -> &[u8] {
    // do more neon stuff
    unsafe { x.get_unchecked(3..) }
    // do more neon stuff
}

#[target_feature(enable = "neon")]
pub unsafe fn get_unchecked_one(x: &[u8]) -> &u8 {
    // do more neon stuff
    unsafe { x.get_unchecked(3) }
    // do more neon stuff
}

Godbolt

The root cause might be the same as for #102220.
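
For reference, here is a rough sketch of what the two calls above should reduce to once inlined: plain pointer arithmetic with no bounds check. The helper names `tail_unchecked` and `byte_unchecked` are hypothetical and not part of the reproduction. The `#[target_feature]` attribute is left out so the snippet builds on any host. Whether writing it this way actually avoids the out-of-line call on armv7 is an untested assumption and would need to be checked in the generated assembly.

```rust
use std::slice;

/// Hypothetical helper: same result as `x.get_unchecked(start..)`.
///
/// # Safety
/// The caller must guarantee `start <= x.len()`.
pub unsafe fn tail_unchecked(x: &[u8], start: usize) -> &[u8] {
    // SAFETY: the caller guarantees `start <= x.len()`, so the new pointer
    // and length stay inside the original slice.
    unsafe { slice::from_raw_parts(x.as_ptr().add(start), x.len() - start) }
}

/// Hypothetical helper: same result as `x.get_unchecked(idx)`.
///
/// # Safety
/// The caller must guarantee `idx < x.len()`.
pub unsafe fn byte_unchecked(x: &[u8], idx: usize) -> &u8 {
    // SAFETY: the caller guarantees `idx < x.len()`, so the element exists.
    unsafe { &*x.as_ptr().add(idx) }
}
```

With `let data = [1u8, 2, 3, 4, 5];`, `tail_unchecked(&data, 3)` yields the last two bytes and `byte_unchecked(&data, 3)` refers to the fourth byte, matching what `get_unchecked(3..)` and `get_unchecked(3)` return.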

@hkratz hkratz added the C-bug Category: This is a bug. label Oct 15, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 15, 2024
@hkratz
Contributor Author

hkratz commented Oct 15, 2024

@rustbot label +A-target-feature +T-compiler +A-SIMD +A-codegen +O-arm

@rustbot rustbot added A-codegen Area: Code generation A-SIMD Area: SIMD (Single Instruction Multiple Data) A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 15, 2024
@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Oct 15, 2024
@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 16, 2024