Closed
Labels
A-SIMD: Area: SIMD (Single Instruction Multiple Data)
A-codegen: Area: Code generation
C-bug: Category: This is a bug.
O-AArch64: Armv8-A or later processors in AArch64 mode
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
Description
Code generation for std::intrinsics::simd::simd_reduce_add_unordered
emits a redundant floating-point add of +0.0 to the result: https://godbolt.org/z/Y496nxv3E
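#![feature(core_intrinsics, portable_simd)] // Nightly only: both APIs are unstable.
use std::simd::*;

// Sums the four lanes; the intrinsic leaves the order of the adds unspecified.
unsafe fn reduce_add_unordered(v: f32x4) -> f32 {
    std::intrinsics::simd::simd_reduce_add_unordered(v)
}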
The cause appears to be that the compiler passes +0.0 rather than -0.0 as the start value of @llvm.vector.reduce.fadd.*.
Comparing LLVM's code generation for the two start values shows that -0.0 yields the more efficient version: https://godbolt.org/z/fhaz7ced6
define float @reduce_fadd_positive_zero(ptr %p) {
  %v = load <4 x float>, ptr %p, align 16
  %result = tail call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> %v)
  ret float %result
}

define float @reduce_fadd_negative_zero(ptr %p) {
  %v = load <4 x float>, ptr %p, align 16
  %result = tail call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> %v)
  ret float %result
}

declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)
This generates the following assembly for AArch64:
reduce_fadd_positive_zero:              // @reduce_fadd_positive_zero
        ldr     q1, [x0]
        movi    d0, #0000000000000000
        faddp   v1.4s, v1.4s, v1.4s
        faddp   s1, v1.2s
        fadd    s0, s1, s0
        ret
reduce_fadd_negative_zero:              // @reduce_fadd_negative_zero
        ldr     q0, [x0]
        faddp   v0.4s, v0.4s, v0.4s
        faddp   s0, v0.2s
        ret
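The extra fadd survives in the first function because +0.0 is not an additive identity under IEEE 754: in the default rounding mode, (-0.0) + (+0.0) evaluates to +0.0, so the add can flip the sign of a zero result and cannot be folded away. Adding -0.0 leaves every non-NaN value unchanged. The following is a minimal sketch in stable Rust that checks this bit-for-bit; the helper name `is_identity_for` is hypothetical and only serves this illustration.

```rust
/// Returns true when `x + start` reproduces `x` bit-for-bit.
/// (Hypothetical helper for illustration; not part of any library.)
fn is_identity_for(start: f32, x: f32) -> bool {
    (x + start).to_bits() == x.to_bits()
}

fn main() {
    // NaN is left out on purpose: its payload bits are not the point here.
    let samples = [0.0_f32, -0.0, 1.5, -2.25, f32::INFINITY, f32::NEG_INFINITY];
    for x in samples {
        println!(
            "x = {:?}: +0.0 is identity: {}, -0.0 is identity: {}",
            x,
            is_identity_for(0.0, x),
            is_identity_for(-0.0, x)
        );
    }
}
```

Only the sample `-0.0` distinguishes the two start values, which is enough to prevent the compiler from dropping the add of +0.0.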
This behaviour appears to be caused by the compiler using +0.0 instead of -0.0 here:
rust/compiler/rustc_codegen_llvm/src/intrinsic.rs
Lines 2095 to 2101 in a3af208
arith_red!(
    simd_reduce_add_unordered: vector_reduce_add,
    vector_reduce_fadd_reassoc,
    false,
    add,
    0.0
);
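To make the effect of that start value concrete, here is a hedged scalar model of an fadd reduction with an explicit start value. `reduce_fadd_model` is a hypothetical name, and the left-to-right fold is an assumption made for simplicity; the real `reassoc` reduction is free to pick a different order.

```rust
/// Hypothetical scalar model of a 4-lane fadd reduction that starts from
/// `start` and adds the lanes left to right (the order is an assumption).
fn reduce_fadd_model(start: f32, lanes: [f32; 4]) -> f32 {
    lanes.iter().fold(start, |acc, &x| acc + x)
}

fn main() {
    let all_negative_zero = [-0.0_f32; 4];
    let ordinary = [1.0_f32, 2.0, 3.0, 4.0];

    // With a +0.0 start, the sign of an all-(-0.0) input is lost.
    println!("{:?}", reduce_fadd_model(0.0, all_negative_zero));
    // With a -0.0 start, the result equals the plain sum of the lanes.
    println!("{:?}", reduce_fadd_model(-0.0, all_negative_zero));
    // For ordinary inputs the start value makes no difference.
    println!("{:?}", reduce_fadd_model(0.0, ordinary) == reduce_fadd_model(-0.0, ordinary));
}
```

In this model the two start values disagree only when every lane is -0.0, so a -0.0 start behaves like having no start value at all.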