[AMDGPU] Fix interaction between WQM and llvm.amdgcn.init.exec #93680

jayfoad · 2024-05-29T13:15:42Z

Whole quad mode requires inserting a copy of the initial EXEC mask. In a
function that also uses llvm.amdgcn.init.exec, insert the COPY after
initializing EXEC.

jayfoad · 2024-05-29T13:16:59Z

Please see individual commits. If the overall approach seems good, I plan to split them into separate PRs.

piotrAMD · 2024-05-29T14:50:17Z

The approach looks good to me. Given the failing case, it makes sense not to expand SI_INIT_EXEC (whose EXEC value can be arbitrary) until si-wqm.


llvmbot · 2024-06-07T09:21:35Z

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

Whole quad mode requires inserting a copy of the initial EXEC mask. In a
function that also uses llvm.amdgcn.init.exec, insert the COPY after
initializing EXEC.

Full diff: https://github.com/llvm/llvm-project/pull/93680.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp (+20-9)
(modified) llvm/test/CodeGen/AMDGPU/wqm.ll (+46)

diff --git a/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp b/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
index 5b4c44302fa62..913942dda19d9 100644
--- a/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
+++ b/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
@@ -225,7 +225,7 @@ class SIWholeQuadMode : public MachineFunctionPass {
   void lowerCopyInstrs();
   void lowerKillInstrs(bool IsWQM);
   void lowerInitExec(MachineInstr &MI);
-  void lowerInitExecInstrs();
+  MachineBasicBlock::iterator lowerInitExecInstrs(MachineBasicBlock &Entry);
 
 public:
   static char ID;
@@ -1648,9 +1648,23 @@ void SIWholeQuadMode::lowerInitExec(MachineInstr &MI) {
   LIS->createAndComputeVirtRegInterval(CountReg);
 }
 
-void SIWholeQuadMode::lowerInitExecInstrs() {
-  for (MachineInstr *MI : InitExecInstrs)
+/// Lower INIT_EXEC instructions. Return a suitable insert point in \p Entry
+/// for instructions that depend on EXEC.
+MachineBasicBlock::iterator
+SIWholeQuadMode::lowerInitExecInstrs(MachineBasicBlock &Entry) {
+  MachineBasicBlock::iterator InsertPt = Entry.getFirstNonPHI();
+
+  for (MachineInstr *MI : InitExecInstrs) {
+    // Try to handle undefined cases gracefully:
+    // - multiple INIT_EXEC instructions
+    // - INIT_EXEC instructions not in the entry block
+    if (MI->getParent() == &Entry)
+      InsertPt = std::next(MI->getIterator());
+
     lowerInitExec(*MI);
+  }
+
+  return InsertPt;
 }
 
 bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) {
@@ -1701,19 +1715,16 @@ bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) {
 
   LiveMaskReg = Exec;
 
+  MachineBasicBlock &Entry = MF.front();
+  MachineBasicBlock::iterator EntryMI = lowerInitExecInstrs(Entry);
+
   // Shader is simple does not need any state changes or any complex lowering
   if (!(GlobalFlags & (StateWQM | StateStrict)) && LowerToCopyInstrs.empty() &&
       LowerToMovInstrs.empty() && KillInstrs.empty()) {
-    lowerInitExecInstrs();
     lowerLiveMaskQueries();
     return !InitExecInstrs.empty() || !LiveMaskQueries.empty();
   }
 
-  lowerInitExecInstrs();
-
-  MachineBasicBlock &Entry = MF.front();
-  MachineBasicBlock::iterator EntryMI = Entry.getFirstNonPHI();
-
   // Store a copy of the original live mask when required
   if (NeedsLiveMask || (GlobalFlags & StateWQM)) {
     LiveMaskReg = MRI->createVirtualRegister(TRI->getBoolRC());
diff --git a/llvm/test/CodeGen/AMDGPU/wqm.ll b/llvm/test/CodeGen/AMDGPU/wqm.ll
index 6fcf5067b0225..3bf6c104a0254 100644
--- a/llvm/test/CodeGen/AMDGPU/wqm.ll
+++ b/llvm/test/CodeGen/AMDGPU/wqm.ll
@@ -3395,6 +3395,52 @@ main_body:
   ret void
 }
 
+; Test the interaction between wqm and llvm.amdgcn.init.exec.
+define amdgpu_gs void @wqm_init_exec() {
+; GFX9-W64-LABEL: wqm_init_exec:
+; GFX9-W64:       ; %bb.0: ; %bb
+; GFX9-W64-NEXT:    s_mov_b64 exec, -1
+; GFX9-W64-NEXT:    s_mov_b32 s0, 0
+; GFX9-W64-NEXT:    v_mov_b32_e32 v0, 0
+; GFX9-W64-NEXT:    s_mov_b32 s1, s0
+; GFX9-W64-NEXT:    s_mov_b32 s2, s0
+; GFX9-W64-NEXT:    s_mov_b32 s3, s0
+; GFX9-W64-NEXT:    v_mov_b32_e32 v1, v0
+; GFX9-W64-NEXT:    v_mov_b32_e32 v2, v0
+; GFX9-W64-NEXT:    v_mov_b32_e32 v3, v0
+; GFX9-W64-NEXT:    buffer_store_dwordx4 v[0:3], off, s[0:3], 0
+; GFX9-W64-NEXT:    s_wqm_b64 exec, exec
+; GFX9-W64-NEXT:    ; kill: def $sgpr0 killed $sgpr0 killed $exec
+; GFX9-W64-NEXT:    v_mov_b32_e32 v1, s0
+; GFX9-W64-NEXT:    ds_write_b32 v0, v1
+; GFX9-W64-NEXT:    s_endpgm
+;
+; GFX10-W32-LABEL: wqm_init_exec:
+; GFX10-W32:       ; %bb.0: ; %bb
+; GFX10-W32-NEXT:    s_mov_b32 exec_lo, -1
+; GFX10-W32-NEXT:    s_mov_b32 s1, exec_lo
+; GFX10-W32-NEXT:    v_mov_b32_e32 v0, 0
+; GFX10-W32-NEXT:    s_mov_b32 s0, 0
+; GFX10-W32-NEXT:    s_wqm_b32 exec_lo, exec_lo
+; GFX10-W32-NEXT:    s_mov_b32 s2, s0
+; GFX10-W32-NEXT:    s_and_b32 exec_lo, exec_lo, s1
+; GFX10-W32-NEXT:    v_mov_b32_e32 v1, v0
+; GFX10-W32-NEXT:    v_mov_b32_e32 v2, v0
+; GFX10-W32-NEXT:    v_mov_b32_e32 v3, v0
+; GFX10-W32-NEXT:    v_mov_b32_e32 v4, s0
+; GFX10-W32-NEXT:    s_mov_b32 s1, s0
+; GFX10-W32-NEXT:    s_mov_b32 s3, s0
+; GFX10-W32-NEXT:    buffer_store_dwordx4 v[0:3], off, s[0:3], 0
+; GFX10-W32-NEXT:    ds_write_b32 v0, v4
+; GFX10-W32-NEXT:    s_endpgm
+bb:
+  call void @llvm.amdgcn.init.exec(i64 -1)
+  call void @llvm.amdgcn.raw.buffer.store.v4f32(<4 x float> zeroinitializer, <4 x i32> zeroinitializer, i32 0, i32 0, i32 0)
+  %i = call i32 @llvm.amdgcn.wqm.i32(i32 0)
+  store i32 %i, i32 addrspace(3)* null, align 4
+  ret void
+}
+
 declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #1
 declare void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float>, i32, i32, <8 x i32>, i32, i32) #1

jayfoad · 2024-06-07T09:22:14Z

Rebased on #94452. Please take a look!

piotrAMD

LGTM

jayfoad requested a review from perlfu May 29, 2024 13:16

jayfoad requested a review from piotrAMD May 29, 2024 13:22

jayfoad mentioned this pull request Jun 5, 2024

[AMDGPU] Move INIT_EXEC lowering from SILowerControlFlow to SIWholeQuadMode #94452

Merged

jayfoad added 2 commits June 7, 2024 10:16

[AMDGPU] New test for WQM and llvm.amdgcn.init.exec

56d41fa

[AMDGPU] Fix interaction between WQM and llvm.amdgcn.init.exec

0e942f6

Whole quad mode requires inserting a copy of the initial EXEC mask. In a function that also uses llvm.amdgcn.init.exec, insert the COPY after initializing EXEC.

jayfoad force-pushed the siwqm-init-exec branch from d996f0a to 0e942f6 Compare June 7, 2024 09:20

jayfoad marked this pull request as ready for review June 7, 2024 09:21

llvmbot added the backend:AMDGPU label Jun 7, 2024

piotrAMD approved these changes Jun 7, 2024


jayfoad merged commit df6750e into llvm:main Jun 7, 2024
8 of 9 checks passed

jayfoad deleted the siwqm-init-exec branch June 7, 2024 12:23

HerrCai0907 mentioned this pull request Jun 13, 2024

tidy #95384

Closed
