-
Notifications
You must be signed in to change notification settings - Fork 12.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 #105550
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
@llvm/pr-subscribers-backend-amdgpu Author: Jay Foad (jayfoad) ChangesWhen a loop contains a VMEM load whose result is only used outside the Full diff: https://github.com/llvm/llvm-project/pull/105550.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 4262e7b5d9c25..eafe20be17d5b 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -2390,7 +2390,7 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
}
if (!ST->hasVscnt() && HasVMemStore && !HasVMemLoad && UsesVgprLoadedOutside)
return true;
- return HasVMemLoad && UsesVgprLoadedOutside;
+ return HasVMemLoad && UsesVgprLoadedOutside && ST->hasVmemWriteVgprInOrder();
}
bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
index bdef55ab956a0..0ddd2aa285b26 100644
--- a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
@@ -295,7 +295,7 @@ body: |
# GFX12-LABEL: waitcnt_vm_loop2
# GFX12-LABEL: bb.0:
# GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
# GFX12-LABEL: bb.1:
# GFX12: S_WAIT_LOADCNT 0
# GFX12-LABEL: bb.2:
@@ -342,7 +342,7 @@ body: |
# GFX12-LABEL: waitcnt_vm_loop2_store
# GFX12-LABEL: bb.0:
# GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
# GFX12-LABEL: bb.1:
# GFX12: S_WAIT_LOADCNT 0
# GFX12-LABEL: bb.2:
@@ -499,9 +499,9 @@ body: |
# GFX12-LABEL: waitcnt_vm_loop2_reginterval
# GFX12-LABEL: bb.0:
# GFX12: GLOBAL_LOAD_DWORDX4
-# GFX12: S_WAIT_LOADCNT 0
-# GFX12-LABEL: bb.1:
# GFX12-NOT: S_WAIT_LOADCNT 0
+# GFX12-LABEL: bb.1:
+# GFX12: S_WAIT_LOADCNT 0
# GFX12-LABEL: bb.2:
name: waitcnt_vm_loop2_reginterval
body: |
@@ -600,7 +600,7 @@ body: |
# GFX12-LABEL: bb.0:
# GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
# GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
# GFX12-LABEL: bb.1:
# GFX12: S_WAIT_LOADCNT 0
# GFX12-LABEL: bb.2:
|
9a2103d
to
c3cbf18
Compare
e53f758
to
283d345
Compare
When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order.
283d345
to
ba06857
Compare
…m#105550) When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order. (cherry picked from commit fa2dccb)
…m#105550) When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order. (cherry picked from commit fa2dccb)
…m#105550) When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order.
…m#105550) When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order. (cherry picked from commit fa2dccb)
…m#105550) When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order.
When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.