AMDGPU: Use "countMaxActiveBits() <= 5" to define uint5Bits #115543

changpeng · 2024-11-08T21:00:48Z

countMaxTrailingOnes() is not correct. This patch follows the suggestion from #115372.

countMaxTrailingOnes() is not correct. This patch follows the suggestion from llvm#115372.

llvmbot · 2024-11-08T21:01:24Z

@llvm/pr-subscribers-backend-amdgpu

Author: Changpeng Fang (changpeng)

Changes

countMaxTrailingOnes() is not correct. This patch follows the suggestion from #115372.

Full diff: https://github.com/llvm/llvm-project/pull/115543.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIInstructions.td (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/extract-lowbits.ll (+25)

diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index 0658e030ffa5d6..755cbb7fb65492 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -3554,7 +3554,7 @@ def : AMDGPUPat <
 >;
 
 def uint5Bits : PatLeaf<(i32 VGPR_32:$width), [{
-  return CurDAG->computeKnownBits(SDValue(N, 0)).countMaxTrailingOnes() <= 5;
+  return CurDAG->computeKnownBits(SDValue(N, 0)).countMaxActiveBits() <= 5;
 }]>;
 
 // x << (bitwidth - y) >> (bitwidth - y)
diff --git a/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll b/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll
index 3de8db2c6a448e..0e5a68773a6ba8 100644
--- a/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll
+++ b/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll
@@ -163,6 +163,31 @@ define i32 @bzhi32_d0(i32 %val, i32 %numlowbits) nounwind {
   ret i32 %masked
 }
 
+define i32 @bzhi32_d0_even(i32 %val, i32 %numlowbits) nounwind {
+; SI-LABEL: bzhi32_d0_even:
+; SI:       ; %bb.0:
+; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; SI-NEXT:    v_lshlrev_b32_e32 v1, 1, v1
+; SI-NEXT:    v_sub_i32_e32 v1, vcc, 32, v1
+; SI-NEXT:    v_lshlrev_b32_e32 v0, v1, v0
+; SI-NEXT:    v_lshrrev_b32_e32 v0, v1, v0
+; SI-NEXT:    s_setpc_b64 s[30:31]
+;
+; VI-LABEL: bzhi32_d0_even:
+; VI:       ; %bb.0:
+; VI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT:    v_lshlrev_b32_e32 v1, 1, v1
+; VI-NEXT:    v_sub_u32_e32 v1, vcc, 32, v1
+; VI-NEXT:    v_lshlrev_b32_e32 v0, v1, v0
+; VI-NEXT:    v_lshrrev_b32_e32 v0, v1, v0
+; VI-NEXT:    s_setpc_b64 s[30:31]
+  %times2 = shl i32 %numlowbits, 1
+  %numhighbits = sub i32 32, %times2
+  %highbitscleared = shl i32 %val, %numhighbits
+  %masked = lshr i32 %highbitscleared, %numhighbits
+  ret i32 %masked
+}
+
 define i32 @bzhi32_d1_indexzext(i32 %val, i8 %numlowbits) nounwind {
 ; SI-LABEL: bzhi32_d1_indexzext:
 ; SI:       ; %bb.0:

arsenm · 2024-11-08T21:07:31Z

I think the whole original patch should be reverted

changpeng · 2024-11-08T21:13:02Z

I think the whole original patch should be reverted

What is the original patch? WE should not do pattern match for x << (32 -y) >> (32 - y) to bfe x, 0, y?

arsenm · 2024-11-13T17:34:29Z

I think the whole original patch should be reverted

What is the original patch? WE should not do pattern match for x << (32 -y) >> (32 - y) to bfe x, 0, y?

Yes

…" (#116091) Based on the suggestion from #115543, we should not do the pattern matching from x << (32-y) >> (32-y) to "bfe x, 0, y" at all. This reverts commits a2bacf8 and bdf8e30.

AMDGPU: Use "countMaxActiveBits() <= 5" to define uint5Bits

a2bacf8

countMaxTrailingOnes() is not correct. This patch follows the suggestion from llvm#115372.

llvmbot added the backend:AMDGPU label Nov 8, 2024

changpeng mentioned this pull request Nov 8, 2024

AMDGPU: Don't avoid clamp of bit shift in BFE pattern #115372

Merged

changpeng merged commit db6f476 into llvm:main Nov 8, 2024
7 of 9 checks passed

changpeng deleted the bfe branch November 8, 2024 21:03

changpeng mentioned this pull request Nov 13, 2024

Revert "AMDGPU: Don't avoid clamp of bit shift in BFE pattern (#115372)" #116091

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AMDGPU: Use "countMaxActiveBits() <= 5" to define uint5Bits #115543

AMDGPU: Use "countMaxActiveBits() <= 5" to define uint5Bits #115543

changpeng commented Nov 8, 2024

llvmbot commented Nov 8, 2024

arsenm commented Nov 8, 2024

changpeng commented Nov 8, 2024

arsenm commented Nov 13, 2024

AMDGPU: Use "countMaxActiveBits() <= 5" to define uint5Bits #115543

AMDGPU: Use "countMaxActiveBits() <= 5" to define uint5Bits #115543

Conversation

changpeng commented Nov 8, 2024

llvmbot commented Nov 8, 2024

arsenm commented Nov 8, 2024

changpeng commented Nov 8, 2024

arsenm commented Nov 13, 2024