[AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 #105550

jayfoad · 2024-08-21T16:40:10Z

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.

llvmbot · 2024-08-21T16:43:47Z

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Full diff: https://github.com/llvm/llvm-project/pull/105550.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir (+5-5)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 4262e7b5d9c25..eafe20be17d5b 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -2390,7 +2390,7 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
   }
   if (!ST->hasVscnt() && HasVMemStore && !HasVMemLoad && UsesVgprLoadedOutside)
     return true;
-  return HasVMemLoad && UsesVgprLoadedOutside;
+  return HasVMemLoad && UsesVgprLoadedOutside && ST->hasVmemWriteVgprInOrder();
 }
 
 bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
index bdef55ab956a0..0ddd2aa285b26 100644
--- a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
@@ -295,7 +295,7 @@ body:             |
 # GFX12-LABEL: waitcnt_vm_loop2
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -342,7 +342,7 @@ body:             |
 # GFX12-LABEL: waitcnt_vm_loop2_store
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -499,9 +499,9 @@ body:             |
 # GFX12-LABEL: waitcnt_vm_loop2_reginterval
 # GFX12-LABEL: bb.0:
 # GFX12: GLOBAL_LOAD_DWORDX4
-# GFX12: S_WAIT_LOADCNT 0
-# GFX12-LABEL: bb.1:
 # GFX12-NOT: S_WAIT_LOADCNT 0
+# GFX12-LABEL: bb.1:
+# GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
 name:            waitcnt_vm_loop2_reginterval
 body:             |
@@ -600,7 +600,7 @@ body:             |
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
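
The one-line change to `shouldFlushVmCnt` can be read in isolation as a boolean predicate. The following is only a sketch of that final decision, not the pass itself: `LoopFacts`, `TargetFacts` and `shouldFlushSketch` are hypothetical names, the loop scan that computes the three loop flags in the real function is omitted, and the target settings below are assumptions chosen for illustration. Only the flag names and the two subtarget queries come from the diff.

```cpp
// Hypothetical stand-ins for the locals computed earlier in
// SIInsertWaitcnts::shouldFlushVmCnt (names taken from the diff).
struct LoopFacts {
  bool HasVMemLoad;
  bool HasVMemStore;
  bool UsesVgprLoadedOutside;
};

// Hypothetical stand-ins for the two subtarget queries in the diff.
struct TargetFacts {
  bool HasVscnt;             // models ST->hasVscnt()
  bool VmemWriteVgprInOrder; // models ST->hasVmemWriteVgprInOrder()
};

// Models only the two return statements shown in the diff.
constexpr bool shouldFlushSketch(LoopFacts L, TargetFacts T) {
  return
      // First case, unchanged by this PR.
      (!T.HasVscnt && L.HasVMemStore && !L.HasVMemLoad &&
       L.UsesVgprLoadedOutside) ||
      // Second case, now additionally gated on in-order VGPR writes.
      (L.HasVMemLoad && L.UsesVgprLoadedOutside && T.VmemWriteVgprInOrder);
}

// Assumed settings: a target whose VMEM results land in order, and a
// GFX12-like target where they may land out of order.
constexpr TargetFacts InOrderTarget{true, true};
constexpr TargetFacts OutOfOrderTarget{true, false};

// A loop that contains a VMEM load and uses a VGPR loaded outside it.
constexpr LoopFacts LoadLoop{true, false, true};
```

Under these assumptions, `shouldFlushSketch(LoadLoop, InOrderTarget)` still requests the flush, while `shouldFlushSketch(LoadLoop, OutOfOrderTarget)` does not. This matches the updated GFX12 checks above, where `S_WAIT_LOADCNT 0` moves from `bb.0` into `bb.1`.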

…m#105550) When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order. (cherry picked from commit fa2dccb)

This was referenced Aug 21, 2024

[AMDGPU] Add GFX12 test coverage for vmcnt flushing in loop headers #105548

Merged

[AMDGPU] GFX12 VMEM loads can write VGPR results out of order #105549

Merged

jayfoad marked this pull request as ready for review August 21, 2024 16:43

llvmbot added the backend:AMDGPU label Aug 21, 2024

jayfoad requested review from arsenm, jmmartinez, nhaehnle, bsaleil, Pierre-vh, mjbedy and stepthomas August 22, 2024 08:59

jayfoad force-pushed the users/foad/vmem-write-vgpr-in-order_split branch from 9a2103d to c3cbf18 Compare August 22, 2024 10:43

Base automatically changed from users/foad/vmem-write-vgpr-in-order_split to main August 22, 2024 10:46

jayfoad force-pushed the users/foad/vmem-write-vgpr-in-order_split_split branch from e53f758 to 283d345 Compare August 22, 2024 10:48

arsenm approved these changes Aug 22, 2024

jayfoad force-pushed the users/foad/vmem-write-vgpr-in-order_split_split branch from 283d345 to ba06857 Compare August 23, 2024 08:30

jayfoad merged commit fa2dccb into main Aug 23, 2024
6 of 8 checks passed

jayfoad deleted the users/foad/vmem-write-vgpr-in-order_split_split branch August 23, 2024 09:31
