Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] LLVM >= 18 folding bitcast -> and -> bitcast to @llvm.fabs.f16 generates call to __extendhfsf2 #104915

Open
overmighty opened this issue Aug 20, 2024 · 5 comments · May be fixed by #106153
Labels

Comments

@overmighty
Copy link
Member

https://godbolt.org/z/on6Pj6Tsc

Starting from 5c0da58, InstCombine transforms this IR:

define linkonce_odr hidden noundef half @_Float16 __llvm_libc_20_0_0_git::fputil::abs<_Float16, 0>(_Float16)(half noundef %x) local_unnamed_addr #1 comdat {
entry:
  %0 = bitcast half %x to i16
  %1 = and i16 %0, 32767
  %2 = bitcast i16 %1 to half
  ret half %2
}

into:

define linkonce_odr hidden noundef half @_Float16 __llvm_libc_20_0_0_git::fputil::abs<_Float16, 0>(_Float16)(half noundef %x) local_unnamed_addr #1 comdat {
entry:
  %0 = call half @llvm.fabs.f16(half %x)
  ret half %0
}

On x86, when AVX-512 FP16 is not available, this generates something like:

.LCPI0_0:
        .long   0x7fffffff
__llvm_libc_20_0_0_git::fabsf16(_Float16):
        push    rbp
        mov     rbp, rsp
        call    __extendhfsf2@PLT
        vbroadcastss    xmm1, dword ptr [rip + .LCPI0_0]
        vandps  xmm0, xmm0, xmm1
        call    __truncsfhf2@PLT
        pop     rbp
        ret

whereas with LLVM 16, bitcast -> and -> bitcast generates something like:

__llvm_libc_20_0_0_git::fabsf16(_Float16):
        push    rbp
        mov     rbp, rsp
        vpextrw eax, xmm0, 0
        and     eax, 32767
        vpinsrw xmm0, xmm0, eax, 0
        pop     rbp
        ret

Related:

cc @arsenm @lntue

@arsenm
Copy link
Contributor

arsenm commented Aug 20, 2024

Forming fabs is correct, the codegen is not. Promotion should bitcast to integer

@arsenm arsenm added llvm:SelectionDAG SelectionDAGISel as well and removed llvm:instcombine labels Aug 20, 2024
@overmighty overmighty changed the title [InstCombine][X86] LLVM >= 18 folding bitcast -> and -> bitcast to @llvm.fabs.f16 generates call to __extendhfsf2 [X86] LLVM >= 18 folding bitcast -> and -> bitcast to @llvm.fabs.f16 generates call to __extendhfsf2 Aug 20, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 20, 2024

@llvm/issue-subscribers-backend-x86

Author: OverMighty (overmighty)

https://godbolt.org/z/on6Pj6Tsc

Starting from 5c0da58, InstCombine transforms this IR:

define linkonce_odr hidden noundef half @<!-- -->_Float16 __llvm_libc_20_0_0_git::fputil::abs&lt;_Float16, 0&gt;(_Float16)(half noundef %x) local_unnamed_addr #<!-- -->1 comdat {
entry:
  %0 = bitcast half %x to i16
  %1 = and i16 %0, 32767
  %2 = bitcast i16 %1 to half
  ret half %2
}

into:

define linkonce_odr hidden noundef half @<!-- -->_Float16 __llvm_libc_20_0_0_git::fputil::abs&lt;_Float16, 0&gt;(_Float16)(half noundef %x) local_unnamed_addr #<!-- -->1 comdat {
entry:
  %0 = call half @<!-- -->llvm.fabs.f16(half %x)
  ret half %0
}

On x86, when AVX-512 FP16 is not available, this generates something like:

.LCPI0_0:
        .long   0x7fffffff
__llvm_libc_20_0_0_git::fabsf16(_Float16):
        push    rbp
        mov     rbp, rsp
        call    __extendhfsf2@<!-- -->PLT
        vbroadcastss    xmm1, dword ptr [rip + .LCPI0_0]
        vandps  xmm0, xmm0, xmm1
        call    __truncsfhf2@<!-- -->PLT
        pop     rbp
        ret

whereas with LLVM 16, bitcast -> and -> bitcast generates something like:

__llvm_libc_20_0_0_git::fabsf16(_Float16):
        push    rbp
        mov     rbp, rsp
        vpextrw eax, xmm0, 0
        and     eax, 32767
        vpinsrw xmm0, xmm0, eax, 0
        pop     rbp
        ret

Related:

cc @arsenm @lntue

@arsenm
Copy link
Contributor

arsenm commented Aug 21, 2024

The same bug exists for fneg and copysign, and this is repeated in the type legalizer and promotion action

@arsenm
Copy link
Contributor

arsenm commented Aug 21, 2024

Attaching partial patch. It's incomplete because it asserts on targets with legal f16, but no legal i16. I don't have the patience to fight with SelectionDAG's insistence that there can be no illegal types after type legalization

0001-DAG-Do-not-use-FP-conversions-to-promote-fabs.patch

v01dXYZ pushed a commit to v01dXYZ/llvm-project that referenced this issue Aug 26, 2024
Conditionally to the same sized integer type being legal.

For example, for f16, if i16 is legal, bitcast to i16, clear the sign
bit and bitcast back to f16.

Adapted from a suggested patch from @arsenm (Matthew.Arsenault@amd.com):

llvm#104915 (comment)
@beetrees
Copy link
Contributor

This isn't x86 specific; it affects FABS and FNEG for any target that uses TypeSoftPromoteHalf. The issue is that the lowering uses the generic SoftPromoteHalfRes_UnaryOp method. The resulting lowering is actually a miscompilation as __extendhfsf2 and __truncsfhf2 quieten signalling NaNs, whereas FABS and FNEG are defined as only affecting the sign bit (FCOPYSIGN doesn't have this issue as its lowering in SoftPromoteHalfRes_FCOPYSIGN doesn't convert to and from a f32).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants