
[X86][NVPTX][LegalizeDAG] If i16 legal, legalize FABS/FNEG/FCOPYSIGN (f16) with Expand #106153

Open · wants to merge 1 commit into main from 104915-legalize-dag-if-i16-legal-then-legalize-fabs-f16-with-expand
Conversation

@v01dXYZ (Contributor) commented Aug 26, 2024

For the concerned targets (X86, NVPTX), Expand clears the sign bit after bitcasting to i16. This is cheaper than Promote, which requires converting back and forth between f16 and f32 (with possibly expensive libcalls) in order to call FABS.f32.

Fixes #104915.
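
For illustration, the two strategies roughly amount to the following. This is a minimal C++ sketch of the bit manipulation for the sign-bit operations this PR ends up covering, not the legalizer code itself; the function names and the use of uint16_t for the f16 bit pattern are made up for this example.

#include <cstdint>

// Expand lowering: reinterpret the half as i16 and operate on the sign bit
// directly. No f16 <-> f32 conversion is needed.
uint16_t fabs_f16_expand(uint16_t Bits) { return Bits & 0x7FFF; }
uint16_t fneg_f16_expand(uint16_t Bits) { return Bits ^ 0x8000; }
uint16_t fcopysign_f16_expand(uint16_t Mag, uint16_t Sgn) {
  return (Mag & 0x7FFF) | (Sgn & 0x8000);
}

// Promote lowering (the previous behaviour) conceptually does:
//   float Wide = fpext(Half);   // __extendhfsf2 libcall without F16C
//   Wide = fabsf(Wide);         // FABS.f32
//   Half = fptrunc(Wide);       // __truncsfhf2 libcall without F16C
// which is strictly more work for operations that only touch the sign bit.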

@llvmbot (Member) commented Aug 26, 2024

@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-backend-x86

Author: None (v01dXYZ)

Changes

For the concerned targets (X86, NVPTX), Expand clears the sign bit after bitcasting to i16. This is cheaper than Promote, which requires converting back and forth between f16 and f32 (with possibly expensive libcalls) in order to call FABS.f32.


Full diff: https://github.com/llvm/llvm-project/pull/106153.diff

6 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+2-1)
  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+2-1)
  • (modified) llvm/test/CodeGen/NVPTX/f16-instructions.ll (+4-8)
  • (modified) llvm/test/CodeGen/NVPTX/f16x2-instructions.ll (+7-11)
  • (modified) llvm/test/CodeGen/X86/fp16-libcalls.ll (+7-28)
  • (modified) llvm/test/CodeGen/X86/half.ll (+5-7)
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 73791102fc04de..5b0a87d5f150f8 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -849,7 +849,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
     AddPromotedToType(Op, MVT::bf16, MVT::f32);
   }
   for (const auto &Op : {ISD::FABS}) {
-    setOperationAction(Op, MVT::f16, Promote);
+    // Expand instead of Promote to clear sign bit by bitcasting to i16
+    setOperationAction(Op, MVT::f16, Expand);
     setOperationAction(Op, MVT::f32, Legal);
     setOperationAction(Op, MVT::f64, Legal);
     setOperationAction(Op, MVT::v2f16, Expand);
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 1a6be4eb5af1ef..274a03b6664736 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -599,7 +599,8 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
   setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f64, Legal);
 
   auto setF16Action = [&] (MVT VT, LegalizeAction Action) {
-    setOperationAction(ISD::FABS, VT, Action);
+    // Expand instead of Promote to clear sign bit by bitcasting to i16.
+    setOperationAction(ISD::FABS, VT, Expand);
     setOperationAction(ISD::FNEG, VT, Action);
     setOperationAction(ISD::FCOPYSIGN, VT, Expand);
     setOperationAction(ISD::FREM, VT, Action);
diff --git a/llvm/test/CodeGen/NVPTX/f16-instructions.ll b/llvm/test/CodeGen/NVPTX/f16-instructions.ll
index 14e02a49f6e5e4..cfabf1d6639c2a 100644
--- a/llvm/test/CodeGen/NVPTX/f16-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/f16-instructions.ll
@@ -981,14 +981,10 @@ define half @test_fma(half %a, half %b, half %c) #0 {
 }
 
 ; CHECK-LABEL: test_fabs(
-; CHECK:      ld.param.b16    [[A:%rs[0-9]+]], [test_fabs_param_0];
-; CHECK-NOFTZ:      cvt.f32.f16     [[AF:%f[0-9]+]], [[A]];
-; CHECK-NOFTZ:      abs.f32         [[RF:%f[0-9]+]], [[AF]];
-; CHECK-F16-FTZ:      cvt.ftz.f32.f16     [[AF:%f[0-9]+]], [[A]];
-; CHECK-F16-FTZ:      abs.ftz.f32         [[RF:%f[0-9]+]], [[AF]];
-; CHECK:      cvt.rn.f16.f32  [[R:%rs[0-9]+]], [[RF]];
-; CHECK:      st.param.b16    [func_retval0+0], [[R]];
-; CHECK:      ret;
+; CHECK:    ld.param.b16    [[A:%rs[0-9]+]], [test_fabs_param_0];
+; CHECK:    and.b16 [[RF:%rs[0-9]+]], [[A]], 32767;
+; CHECK:    st.param.b16 [func_retval0+0], [[RF]];
+; CHECK:    ret;
 define half @test_fabs(half %a) #0 {
   %r = call half @llvm.fabs.f16(half %a)
   ret half %r
diff --git a/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll b/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
index 464b3a754804fe..c2189a20c98c25 100644
--- a/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
@@ -1183,17 +1183,13 @@ define <2 x half> @test_fma(<2 x half> %a, <2 x half> %b, <2 x half> %c) #0 {
 }
 
 ; CHECK-LABEL: test_fabs(
-; CHECK:      ld.param.b32    [[A:%r[0-9]+]], [test_fabs_param_0];
-; CHECK:      mov.b32         {[[A0:%rs[0-9]+]], [[A1:%rs[0-9]+]]}, [[A]]
-; CHECK-DAG:  cvt.f32.f16     [[AF0:%f[0-9]+]], [[A0]];
-; CHECK-DAG:  cvt.f32.f16     [[AF1:%f[0-9]+]], [[A1]];
-; CHECK-DAG:  abs.f32         [[RF0:%f[0-9]+]], [[AF0]];
-; CHECK-DAG:  abs.f32         [[RF1:%f[0-9]+]], [[AF1]];
-; CHECK-DAG:  cvt.rn.f16.f32  [[R0:%rs[0-9]+]], [[RF0]];
-; CHECK-DAG:  cvt.rn.f16.f32  [[R1:%rs[0-9]+]], [[RF1]];
-; CHECK:      mov.b32         [[R:%r[0-9]+]], {[[R0]], [[R1]]}
-; CHECK:      st.param.b32    [func_retval0+0], [[R]];
-; CHECK:      ret;
+; CHECK:    ld.param.b32    [[A:%r[0-9]+]], [test_fabs_param_0];
+; CHECK:    mov.b32         {[[A0:%rs[0-9]+]], [[A1:%rs[0-9]+]]}, [[A]]
+; CHECK:    and.b16 [[A2:%rs[0-9]+]], [[A1]], 32767;
+; CHECK:    and.b16 [[A3:%rs[0-9]+]], [[A0]], 32767;
+; CHECK:    mov.b32 [[B:%r[0-9]+]], {[[A3]], [[A2]]};
+; CHECK:    st.param.b32 [func_retval0+0], [[B]];
+; CHECK:    ret;
 define <2 x half> @test_fabs(<2 x half> %a) #0 {
   %r = call <2 x half> @llvm.fabs.f16(<2 x half> %a)
   ret <2 x half> %r
diff --git a/llvm/test/CodeGen/X86/fp16-libcalls.ll b/llvm/test/CodeGen/X86/fp16-libcalls.ll
index db3d031a8fe3fb..ee342f22eeff66 100644
--- a/llvm/test/CodeGen/X86/fp16-libcalls.ll
+++ b/llvm/test/CodeGen/X86/fp16-libcalls.ll
@@ -124,11 +124,7 @@ define void @test_half_fabs(half %a0, ptr %p0) nounwind {
 ; F16C-LABEL: test_half_fabs:
 ; F16C:       # %bb.0:
 ; F16C-NEXT:    vpextrw $0, %xmm0, %eax
-; F16C-NEXT:    vmovd %eax, %xmm0
-; F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
-; F16C-NEXT:    vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
-; F16C-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
-; F16C-NEXT:    vmovd %xmm0, %eax
+; F16C-NEXT:    andl $32767, %eax # imm = 0x7FFF
 ; F16C-NEXT:    movw %ax, (%rdi)
 ; F16C-NEXT:    retq
 ;
@@ -141,34 +137,17 @@ define void @test_half_fabs(half %a0, ptr %p0) nounwind {
 ;
 ; X64-LABEL: test_half_fabs:
 ; X64:       # %bb.0:
-; X64-NEXT:    pushq %rbx
-; X64-NEXT:    movq %rdi, %rbx
-; X64-NEXT:    callq __extendhfsf2@PLT
-; X64-NEXT:    pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; X64-NEXT:    callq __truncsfhf2@PLT
 ; X64-NEXT:    pextrw $0, %xmm0, %eax
-; X64-NEXT:    movw %ax, (%rbx)
-; X64-NEXT:    popq %rbx
+; X64-NEXT:    andl $32767, %eax # imm = 0x7FFF
+; X64-NEXT:    movw %ax, (%rdi)
 ; X64-NEXT:    retq
 ;
 ; X86-LABEL: test_half_fabs:
 ; X86:       # %bb.0:
-; X86-NEXT:    pushl %esi
-; X86-NEXT:    subl $8, %esp
-; X86-NEXT:    pinsrw $0, {{[0-9]+}}(%esp), %xmm0
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    pextrw $0, %xmm0, %eax
-; X86-NEXT:    movw %ax, (%esp)
-; X86-NEXT:    calll __extendhfsf2
-; X86-NEXT:    fstps {{[0-9]+}}(%esp)
-; X86-NEXT:    movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-NEXT:    pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
-; X86-NEXT:    movd %xmm0, (%esp)
-; X86-NEXT:    calll __truncsfhf2
-; X86-NEXT:    pextrw $0, %xmm0, %eax
-; X86-NEXT:    movw %ax, (%esi)
-; X86-NEXT:    addl $8, %esp
-; X86-NEXT:    popl %esi
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    movzwl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    andl $32767, %ecx # imm = 0x7FFF
+; X86-NEXT:    movw %cx, (%eax)
 ; X86-NEXT:    retl
   %res = call half @llvm.fabs.half(half %a0)
   store half %res, ptr %p0, align 2
diff --git a/llvm/test/CodeGen/X86/half.ll b/llvm/test/CodeGen/X86/half.ll
index 9f01d07e6a6705..6838925240058e 100644
--- a/llvm/test/CodeGen/X86/half.ll
+++ b/llvm/test/CodeGen/X86/half.ll
@@ -1059,7 +1059,6 @@ define void @main.158() #0 {
 ; CHECK-LIBCALL:       # %bb.0: # %entry
 ; CHECK-LIBCALL-NEXT:    pushq %rax
 ; CHECK-LIBCALL-NEXT:    xorps %xmm0, %xmm0
-; CHECK-LIBCALL-NEXT:    callq __truncsfhf2@PLT
 ; CHECK-LIBCALL-NEXT:    callq __extendhfsf2@PLT
 ; CHECK-LIBCALL-NEXT:    movss {{.*#+}} xmm1 = [8.0E+0,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-LIBCALL-NEXT:    ucomiss %xmm0, %xmm1
@@ -1077,10 +1076,10 @@ define void @main.158() #0 {
 ; BWON-F16C-LABEL: main.158:
 ; BWON-F16C:       # %bb.0: # %entry
 ; BWON-F16C-NEXT:    vxorps %xmm0, %xmm0, %xmm0
-; BWON-F16C-NEXT:    vcvtps2ph $4, %xmm0, %xmm1
-; BWON-F16C-NEXT:    vcvtph2ps %xmm1, %xmm1
-; BWON-F16C-NEXT:    vmovss {{.*#+}} xmm2 = [8.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; BWON-F16C-NEXT:    vucomiss %xmm1, %xmm2
+; BWON-F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
+; BWON-F16C-NEXT:    vmovss {{.*#+}} xmm1 = [8.0E+0,0.0E+0,0.0E+0,0.0E+0]
+; BWON-F16C-NEXT:    vucomiss %xmm0, %xmm1
+; BWON-F16C-NEXT:    vxorps %xmm0, %xmm0, %xmm0
 ; BWON-F16C-NEXT:    jae .LBB20_2
 ; BWON-F16C-NEXT:  # %bb.1: # %entry
 ; BWON-F16C-NEXT:    vmovss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
@@ -1093,8 +1092,7 @@ define void @main.158() #0 {
 ; CHECK-I686-LABEL: main.158:
 ; CHECK-I686:       # %bb.0: # %entry
 ; CHECK-I686-NEXT:    subl $12, %esp
-; CHECK-I686-NEXT:    movl $0, (%esp)
-; CHECK-I686-NEXT:    calll __truncsfhf2
+; CHECK-I686-NEXT:    pxor %xmm0, %xmm0
 ; CHECK-I686-NEXT:    pextrw $0, %xmm0, %eax
 ; CHECK-I686-NEXT:    movw %ax, (%esp)
 ; CHECK-I686-NEXT:    calll __extendhfsf2


github-actions bot commented Aug 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@phoebewang (Contributor) left a comment

LGTM for X86 change, thanks!

@justinfargnoli requested a review from Artem-B on August 27, 2024 07:51
@Artem-B (Member) left a comment

LGTM.

Comment on lines +1187 to +1191
; CHECK: mov.b32 {[[A0:%rs[0-9]+]], [[A1:%rs[0-9]+]]}, [[A]]
; CHECK: and.b16 [[A2:%rs[0-9]+]], [[A1]], 32767;
; CHECK: and.b16 [[A3:%rs[0-9]+]], [[A0]], 32767;
; CHECK: mov.b32 [[B:%r[0-9]+]], {[[A3]], [[A2]]};
Member

Technically this could be improved further by performing the and on the i32 directly. I suspect ptxas may do that for us.

Maybe add a TODO note.
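
As a bit-level sketch of that suggestion (illustrative only; the function name is made up), both packed halves could have their sign bits cleared with a single 32-bit mask:

#include <cstdint>

// fabs on <2 x half> kept in its packed 32-bit form: one AND with
// 0x7FFF7FFF clears both sign bits at once, instead of unpacking into
// two separate 16-bit ANDs as in the CHECK lines above.
uint32_t fabs_v2f16_packed(uint32_t Packed) { return Packed & 0x7FFF7FFFu; }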

Contributor

I think a TODO is fine, but we should not rely on ptxas doing it for us.

Contributor Author

LegalizeTypes would type-expand FABS (<2 x f16>).

The DAG just before instruction selection:

Optimized legalized selection DAG: %bb.0 'test_fabs:'
SelectionDAG has 21 nodes:
  t0: ch,glue = EntryToken
  t14: v2f16,ch = load<(dereferenceable invariant load (s32) from `ptr addrspace(101) null`, addrspace 101)> t0, TargetExternalSymbol:i64'test_fabs_param_0', undef:i64
      t8: ch = CopyToReg t0, Register:v2f16 %0, t14
              t16: f16 = extract_vector_elt t14, Constant:i64<0>
            t22: i16 = bitcast t16
          t24: i16 = and t22, Constant:i16<32767>
        t25: f16 = bitcast t24
              t19: f16 = extract_vector_elt t14, Constant:i64<1>
            t26: i16 = bitcast t19
          t27: i16 = and t26, Constant:i16<32767>
        t28: f16 = bitcast t27
      t21: v2f16 = BUILD_VECTOR t25, t28
    t11: ch = NVPTXISD::StoreRetval<(store (s32), align 1)> t8, Constant:i32<0>, t21
  t12: ch = NVPTXISD::RET_GLUE t11

One option would be to detect a composition of bitwise operations applied to each element and merge it into a single composition applied to a bitcast of the whole vector (starting from the BUILD_VECTOR).

BTW that's also the case with x86:

define <2 x half> @test_fabs(<2 x half> %a) #0 {
  %r = call <2 x half> @llvm.fabs.f16(<2 x half> %a)
  ret <2 x half> %r
}
        .text
        .file   "fabs_nvptx.ll"
        .globl  test_fabs                       # -- Begin function test_fabs
        .p2align        4, 0x90
        .type   test_fabs,@function
test_fabs:                              # @test_fabs
        .cfi_startproc
# %bb.0:
        pextrw  $0, %xmm0, %eax
        psrld   $16, %xmm0
        pextrw  $0, %xmm0, %ecx
        andl    $32767, %ecx                    # imm = 0x7FFF
        pinsrw  $0, %ecx, %xmm1
        andl    $32767, %eax                    # imm = 0x7FFF
        pinsrw  $0, %eax, %xmm0
        punpcklwd       %xmm1, %xmm0            # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
        retq
.Lfunc_end0:
        .size   test_fabs, .Lfunc_end0-test_fabs
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

But AArch64 seems not to do so (there is a combiner that combines the build_vector into a concat_vector ..., undef).

Member

Let's not get sidetracked. This is something for a separate patch or github issue, and would be best discussed there.

A comment highlighting a future optimization opportunity is all that we need in this patch.

@@ -849,7 +849,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
AddPromotedToType(Op, MVT::bf16, MVT::f32);
}
for (const auto &Op : {ISD::FABS}) {
setOperationAction(Op, MVT::f16, Promote);
// Expand instead of Promote to clear sign bit by bitcasting to i16
setOperationAction(Op, MVT::f16, Expand);
Member

Would operations other than FABS benefit from that, too? E.g. fneg or fcopysign?

Contributor Author

That's a valid point.

For NVPTX, the current f16 actions are (so no change is required):

  • FNEG: Legal / Expand
  • FCOPYSIGN: Expand

For X86-64, the current f16 actions are:

  • FNEG: Promote
  • FCOPYSIGN: Expand

@v01dXYZ force-pushed the 104915-legalize-dag-if-i16-legal-then-legalize-fabs-f16-with-expand branch from 87097a5 to 00d591c on August 27, 2024 21:55
@v01dXYZ requested a review from phoebewang on August 27, 2024 21:55
@v01dXYZ (Contributor Author) commented Aug 27, 2024

@phoebewang I added FNEG/FCOPYSIGN. I also rewrote the code a little to avoid special cases within setF16Action (not more efficient, but simpler).

// Expand instead of Promote to clear/flip/copy sign bit by bitcasting to
// i16.
setOperationAction(ISD::FABS, MVT::f16, Expand);
setOperationAction(ISD::FNEG, MVT::f16, Expand);
Member

X86 could use test cases for fneg/fcopysign.

Contributor Author

Thanks!

@v01dXYZ force-pushed the 104915-legalize-dag-if-i16-legal-then-legalize-fabs-f16-with-expand branch from 00d591c to ff5805c on August 27, 2024 22:52
@v01dXYZ changed the title from "[X86][NVPTX][LegalizeDAG] If i16 legal, legalize FABS.f16 with Expand" to "[X86][NVPTX][LegalizeDAG] If i16 legal, legalize FABS/FNEG/FCOPYSIGN (f16) with Expand" on Aug 27, 2024
@@ -601,7 +601,7 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
auto setF16Action = [&] (MVT VT, LegalizeAction Action) {
setOperationAction(ISD::FABS, VT, Action);
setOperationAction(ISD::FNEG, VT, Action);
setOperationAction(ISD::FCOPYSIGN, VT, Expand);
setOperationAction(ISD::FCOPYSIGN, VT, Action);
Contributor

What's the reason to change this? And why not set Expand here, as above?

Contributor Author

Because it would create even more special cases that don't reflect the function name.

The function is called elsewhere with Expand unless it's f16.

That's just for consistency; I could do it here if that's better.

Contributor

This function is called not only for scalar f16 but also for vector FP16 and BF16, though we don't have tests to cover them.

@v01dXYZ (Contributor Author) Aug 28, 2024

When it is called for a vector FP/BF type, it is always called with Expand.

Contributor

Thanks, then it's fine for me :)

Contributor

Wait.. should we remove it from setF16Action instead?

Contributor

Wait.. should we remove it from setF16Action instead?

Sorry, I misunderstood. No problem then.

@v01dXYZ (Contributor Author) Aug 28, 2024

We can choose between:

  1. isolating those opcodes into another function, or
  2. keeping them and overwriting the actions with the right ones just after calling setF16Action (which is the path I've taken, and what the current code already does with MVT::v16f16, for example); see the sketch below.
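
A minimal sketch of option 2 as it would look in the X86 lowering constructor (simplified, assuming the existing setF16Action helper; not the exact call site):

// Option 2: let setF16Action set the f16 actions as before, then
// overwrite the ones where Expand is the better choice.
setF16Action(MVT::f16, Promote);
// Expand instead of Promote to clear/flip/copy the sign bit by bitcasting
// to i16, instead of round-tripping through f32.
setOperationAction(ISD::FABS, MVT::f16, Expand);
setOperationAction(ISD::FNEG, MVT::f16, Expand);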

Contributor

I was thinking you meant FCOPYSIGN is set to Expand by default; then I realized you meant it's always set to Expand in X86ISelLowering.cpp.
Just to double-check: it is set to Expand by default, see
https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L775
So we can simply remove it from setF16Action.

Contributor Author

That's right. The less code the better.

@v01dXYZ force-pushed the 104915-legalize-dag-if-i16-legal-then-legalize-fabs-f16-with-expand branch 4 times, most recently from f38c1c9 to 04e7cea on August 28, 2024 02:08
@RKSimon (Collaborator) left a comment

Please can you rebase? I've already added additional test coverage to fp16-libcalls.ll, including copysign/fabs/fneg etc.

…(f16) with Expand

For the concerned targets, `Expand` clears/flips/copies the sign bit
after bitcasting to i16. This is cheaper than `Promote`, which requires
converting back and forth between f16 and f32 (with possibly expensive
libcalls) in order to call FABS/FNEG/FCOPYSIGN (f32).

Reminder: By default, FCOPYSIGN action is Expand.
@v01dXYZ force-pushed the 104915-legalize-dag-if-i16-legal-then-legalize-fabs-f16-with-expand branch from 04e7cea to d8a6a62 on August 28, 2024 11:42
@RKSimon (Collaborator) left a comment

LGTM - cheers

@arsenm (Contributor) left a comment

This is still papering over a bug - the promote action is just busted and needs to be replaced or removed

@v01dXYZ (Contributor Author) commented Aug 28, 2024

@arsenm

Yes, promote seems to be overkill for such operations, which could be implemented as bitwise ops after moving back and forth to a GPR.

  • For targets with f16 and i16 registers, the natural way to lower those ops is with a bitcast (unless there are native instructions for that of course). In this case, removing Promote is sound.
  • Concerning the case of an arch with f16 and without i16, we should instead use the smallest fitting integer type rather than promotion.

The weird case would be an arch having f16 registers but only GPR registers that are strictly smaller than i16.

@arsenm (Contributor) commented Aug 28, 2024

  • For targets with f16 that have a valid i16, the natural way to lower those ops is with a bitcast (unless there are native instructions for that of course). In this case, removing Promote is sound.

The problem isn't that it's overkill, it's that it is wrong. You cannot have the canonicalization effects of the fpextend.

  • Concerning the case of an arch with f16 and without i16, we should instead use the smallest fitting integer type rather than promotion.

This is more of a ridiculous DAG design issue; you should be able to create the bitcast and have later legalization steps deal with it. Failing that, you could implement the bitcast as a store to the stack and a reload.
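
To make the canonicalization point concrete, here is a small hypothetical bit-level example (the exact NaN behaviour of the conversions is target- and runtime-dependent, so the "promote" value shown is only indicative):

#include <cstdint>
#include <cstdio>

int main() {
  // A signaling NaN in half precision: exponent all ones, quiet bit clear.
  const uint16_t SNaN = 0x7D00;

  // Expand lowering of fabs: plain bit-clear, the sNaN payload survives.
  uint16_t ExpandResult = SNaN & 0x7FFF; // 0x7D00

  // Promote lowering: fpext to f32, fabs.f32, fptrunc back. IEEE-754
  // conversions typically quiet the NaN (set the quiet bit), so the result
  // is no longer bit-identical to the input, e.g. 0x7F00 instead of 0x7D00.
  uint16_t PromoteResultIllustrative = (SNaN | 0x0200) & 0x7FFF;

  std::printf("expand:  0x%04X\npromote: 0x%04X (illustrative)\n",
              (unsigned)ExpandResult, (unsigned)PromoteResultIllustrative);
  return 0;
}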

@v01dXYZ (Contributor Author) commented Aug 28, 2024

Indeed, LangRef clearly states that only the sign bit is changed by fabs/fneg/fcopysign, but it says nothing like that about fpext/fptrunc.

Development

Successfully merging this pull request may close these issues.

[X86] LLVM >= 18 folding bitcast -> and -> bitcast to @llvm.fabs.f16 generates call to __extendhfsf2
7 participants