- 
                Notifications
    You must be signed in to change notification settings 
- Fork 14.9k
[ARM] Set isCheapToSpeculateCtlz as true for hasV5TOps and no Thumb 1 #154848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @llvm/pr-subscribers-backend-arm Author: AZero13 (AZero13) ChangesThis is so that we don't expand to include unneeded 0 checks. Also fix the logic error in LegalizerInfo so it is NOT legal on Thumb1 in Fast-ISEL. Finally, Remove the README entry regarding this issue. Full diff: https://github.com/llvm/llvm-project/pull/154848.diff 4 Files Affected: 
 diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 830156359e9e8..c6cf6f7d8978a 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -21398,7 +21398,7 @@ bool ARMTargetLowering::isCheapToSpeculateCttz(Type *Ty) const {
 }
 
 bool ARMTargetLowering::isCheapToSpeculateCtlz(Type *Ty) const {
-  return Subtarget->hasV6T2Ops();
+  return Subtarget->hasV5TOps() && !Subtarget->isThumb1Only();
 }
 
 bool ARMTargetLowering::isMaskAndCmp0FoldingBeneficial(
diff --git a/llvm/lib/Target/ARM/ARMLegalizerInfo.cpp b/llvm/lib/Target/ARM/ARMLegalizerInfo.cpp
index fc12f050fa5a5..cdff649ecfa57 100644
--- a/llvm/lib/Target/ARM/ARMLegalizerInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMLegalizerInfo.cpp
@@ -206,7 +206,7 @@ ARMLegalizerInfo::ARMLegalizerInfo(const ARMSubtarget &ST) : ST(ST) {
 
   getActionDefinitionsBuilder({G_FREM, G_FPOW}).libcallFor({s32, s64});
 
-  if (ST.hasV5TOps()) {
+  if (ST.hasV5TOps() && !ST.isThumb1Only()) {
     getActionDefinitionsBuilder(G_CTLZ)
         .legalFor({s32, s32})
         .clampScalar(1, s32, s32)
diff --git a/llvm/lib/Target/ARM/README.txt b/llvm/lib/Target/ARM/README.txt
index def67cfae7277..ff84e07fa084a 100644
--- a/llvm/lib/Target/ARM/README.txt
+++ b/llvm/lib/Target/ARM/README.txt
@@ -697,22 +697,6 @@ target-neutral one.
 
 //===---------------------------------------------------------------------===//
 
-Optimize unnecessary checks for zero with __builtin_clz/ctz.  Those builtins
-are specified to be undefined at zero, so portable code must check for zero
-and handle it as a special case.  That is unnecessary on ARM where those
-operations are implemented in a way that is well-defined for zero.  For
-example:
-
-int f(int x) { return x ? __builtin_clz(x) : sizeof(int)*8; }
-
-should just be implemented with a CLZ instruction.  Since there are other
-targets, e.g., PPC, that share this behavior, it would be best to implement
-this in a target-independent way: we should probably fold that (when using
-"undefined at zero" semantics) to set the "defined at zero" bit and have
-the code generator expand out the right code.
-
-//===---------------------------------------------------------------------===//
-
 Clean up the test/MC/ARM files to have more robust register choices.
 
 R0 should not be used as a register operand in the assembler tests as it's then
diff --git a/llvm/test/CodeGen/ARM/clz.ll b/llvm/test/CodeGen/ARM/clz.ll
index 0f49fbba11845..9e1e9f6ce6daa 100644
--- a/llvm/test/CodeGen/ARM/clz.ll
+++ b/llvm/test/CodeGen/ARM/clz.ll
@@ -1,12 +1,41 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -mtriple=arm-eabi -mattr=+v5t %s -o - | FileCheck %s -check-prefixes=CHECK,INLINE
 ; RUN: llc -mtriple=arm-eabi %s -o - | FileCheck %s -check-prefixes=CHECK,LIBCALL
 
 declare i32 @llvm.ctlz.i32(i32, i1)
 
-define i32 @test(i32 %x) {
-; CHECK-LABEL: test
-; INLINE: clz r0, r0
-; LIBCALL: b __clzsi2
+define i32 @undef_zero(i32 %x) {
+; INLINE-LABEL: undef_zero:
+; INLINE:       @ %bb.0:
+; INLINE-NEXT:    clz r0, r0
+; INLINE-NEXT:    bx lr
+;
+; LIBCALL-LABEL: undef_zero:
+; LIBCALL:       @ %bb.0:
+; LIBCALL-NEXT:    b __clzsi2
         %tmp.1 = call i32 @llvm.ctlz.i32( i32 %x, i1 true )
         ret i32 %tmp.1
 }
+
+define i32 @no_undef_zero(i32 %x) {
+; INLINE-LABEL: no_undef_zero:
+; INLINE:       @ %bb.0:
+; INLINE-NEXT:    clz r0, r0
+; INLINE-NEXT:    bx lr
+;
+; LIBCALL-LABEL: no_undef_zero:
+; LIBCALL:       @ %bb.0:
+; LIBCALL-NEXT:    cmp r0, #0
+; LIBCALL-NEXT:    moveq r0, #32
+; LIBCALL-NEXT:    moveq pc, lr
+; LIBCALL-NEXT:  .LBB1_1: @ %cond.false
+; LIBCALL-NEXT:    .save {r11, lr}
+; LIBCALL-NEXT:    push {r11, lr}
+; LIBCALL-NEXT:    bl __clzsi2
+; LIBCALL-NEXT:    pop {r11, lr}
+; LIBCALL-NEXT:    mov pc, lr
+        %tmp.1 = call i32 @llvm.ctlz.i32( i32 %x, i1 false )
+        ret i32 %tmp.1
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(For reference, see af1b48b .)
Please also fix isCheapToSpeculateCttz.
| 
 Done! | 
| Maybe also add a RUN line to llvm/test/CodeGen/ARM/cttz.ll? | 
…or hasV5TOps and no Thumb 1 This is so that we don't expand to include unneeded 0 checks. Also fix the logic error in LegalizerInfo so it is NOT legal on Thumb1 in Fast-ISEL. Finally, Remove the README entry regarding this issue.
| @efriedma-quic Done! | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
(You could maybe make an argument that we shouldn't speculate ctz without rbit... but it close enough that it probably doesn't matter much either way.)
| Thank you. @efriedma-quic i don't have merge permissions. Can you please do it? | 
This is so that we don't expand to include unneeded 0 checks.
Also fix the logic error in LegalizerInfo so it is NOT legal on Thumb1 in Fast-ISEL.
Finally, Remove the README entry regarding this issue.