-
Notifications
You must be signed in to change notification settings - Fork 12.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMDGPU: Move attributor into optimization pipeline #83131
Conversation
Removing it from the codegen pipeline induces a lot of test churn because llc is no longer optimizing out implicit arguments to kernels. Mostly mechanical, but there are some creative test updates. I preferred to take the changes as-is in tests where the ABI isn't relevant. In cases where it's more relevant, or the optimize out logic was too ingrained in the test, I pre-run the optimization. Some cases manually add attributes to disable inputs.
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) ChangesRemoving it from the codegen pipeline induces a lot of test churn because llc is no longer optimizing out implicit arguments to kernels. Mostly mechanical, but there are some creative test updates. I preferred to take the changes as-is in tests where the ABI isn't relevant. In cases where it's more relevant, or the optimize out logic was too ingrained in the test, I pre-run the optimization. Some cases manually add attributes to disable inputs. Patch is 17.97 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83131.diff 534 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 0d830df1f1f1df..3373942a7782b6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -781,6 +781,14 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(
PM.addPass(createCGSCCToFunctionPassAdaptor(std::move(FPM)));
});
+
+ // FIXME: Why is AMDGPUAttributor not in CGSCC?
+ PB.registerOptimizerLastEPCallback(
+ [this](ModulePassManager &MPM, OptimizationLevel Level) {
+ if (Level != OptimizationLevel::O0) {
+ MPM.addPass(AMDGPUAttributorPass(*this));
+ }
+ });
}
int64_t AMDGPUTargetMachine::getNullPointerValue(unsigned AddrSpace) {
@@ -1043,11 +1051,6 @@ void AMDGPUPassConfig::addIRPasses() {
addPass(createAMDGPULowerModuleLDSLegacyPass(&TM));
}
- // AMDGPUAttributor infers lack of llvm.amdgcn.lds.kernel.id calls, so run
- // after their introduction
- if (TM.getOptLevel() > CodeGenOptLevel::None)
- addPass(createAMDGPUAttributorLegacyPass());
-
if (TM.getOptLevel() > CodeGenOptLevel::None)
addPass(createInferAddressSpacesPass());
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
index 4f106bf0dfb114..b22b6609c68673 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -678,6 +678,12 @@ void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF,
break;
}
}
+
+ // FIXME: We can spill incoming arguments and restore at the end of the
+ // prolog.
+ if (!ScratchWaveOffsetReg)
+ report_fatal_error(
+ "could not find temporary scratch offset register in prolog");
} else {
ScratchWaveOffsetReg = PreloadedScratchWaveOffsetReg;
}
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/addsubu64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/addsubu64.ll
index a38b6e3263882c..359c1e53de99e3 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/addsubu64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/addsubu64.ll
@@ -6,8 +6,8 @@ define amdgpu_kernel void @s_add_u64(ptr addrspace(1) %out, i64 %a, i64 %b) {
; GFX11-LABEL: s_add_u64:
; GFX11: ; %bb.0: ; %entry
; GFX11-NEXT: s_clause 0x1
-; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
-; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34
+; GFX11-NEXT: s_load_b128 s[4:7], s[2:3], 0x24
+; GFX11-NEXT: s_load_b64 s[0:1], s[2:3], 0x34
; GFX11-NEXT: v_mov_b32_e32 v2, 0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: s_add_u32 s0, s6, s0
@@ -22,8 +22,8 @@ define amdgpu_kernel void @s_add_u64(ptr addrspace(1) %out, i64 %a, i64 %b) {
; GFX12-LABEL: s_add_u64:
; GFX12: ; %bb.0: ; %entry
; GFX12-NEXT: s_clause 0x1
-; GFX12-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
-; GFX12-NEXT: s_load_b64 s[0:1], s[0:1], 0x34
+; GFX12-NEXT: s_load_b128 s[4:7], s[2:3], 0x24
+; GFX12-NEXT: s_load_b64 s[0:1], s[2:3], 0x34
; GFX12-NEXT: v_mov_b32_e32 v2, 0
; GFX12-NEXT: s_wait_kmcnt 0x0
; GFX12-NEXT: s_add_nc_u64 s[0:1], s[6:7], s[0:1]
@@ -58,8 +58,8 @@ define amdgpu_kernel void @s_sub_u64(ptr addrspace(1) %out, i64 %a, i64 %b) {
; GFX11-LABEL: s_sub_u64:
; GFX11: ; %bb.0: ; %entry
; GFX11-NEXT: s_clause 0x1
-; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
-; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x34
+; GFX11-NEXT: s_load_b128 s[4:7], s[2:3], 0x24
+; GFX11-NEXT: s_load_b64 s[0:1], s[2:3], 0x34
; GFX11-NEXT: v_mov_b32_e32 v2, 0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: s_sub_u32 s0, s6, s0
@@ -74,8 +74,8 @@ define amdgpu_kernel void @s_sub_u64(ptr addrspace(1) %out, i64 %a, i64 %b) {
; GFX12-LABEL: s_sub_u64:
; GFX12: ; %bb.0: ; %entry
; GFX12-NEXT: s_clause 0x1
-; GFX12-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
-; GFX12-NEXT: s_load_b64 s[0:1], s[0:1], 0x34
+; GFX12-NEXT: s_load_b128 s[4:7], s[2:3], 0x24
+; GFX12-NEXT: s_load_b64 s[0:1], s[2:3], 0x34
; GFX12-NEXT: v_mov_b32_e32 v2, 0
; GFX12-NEXT: s_wait_kmcnt 0x0
; GFX12-NEXT: s_sub_nc_u64 s[0:1], s[6:7], s[0:1]
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_udec_wrap.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_udec_wrap.ll
index b04bc04ab22691..705bcbddf227a6 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_udec_wrap.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_udec_wrap.ll
@@ -16,8 +16,8 @@ declare i32 @llvm.amdgcn.workitem.id.x() #0
define amdgpu_kernel void @lds_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr addrspace(3) %ptr) #1 {
; CI-LABEL: lds_atomic_dec_ret_i32:
; CI: ; %bb.0:
-; CI-NEXT: s_load_dword s2, s[4:5], 0x2
-; CI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
+; CI-NEXT: s_load_dword s2, s[6:7], 0x2
+; CI-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
; CI-NEXT: v_mov_b32_e32 v0, 42
; CI-NEXT: s_mov_b32 m0, -1
; CI-NEXT: s_waitcnt lgkmcnt(0)
@@ -31,8 +31,8 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr add
;
; VI-LABEL: lds_atomic_dec_ret_i32:
; VI: ; %bb.0:
-; VI-NEXT: s_load_dword s2, s[4:5], 0x8
-; VI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
+; VI-NEXT: s_load_dword s2, s[6:7], 0x8
+; VI-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
; VI-NEXT: v_mov_b32_e32 v0, 42
; VI-NEXT: s_mov_b32 m0, -1
; VI-NEXT: s_waitcnt lgkmcnt(0)
@@ -46,8 +46,8 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr add
;
; GFX9-LABEL: lds_atomic_dec_ret_i32:
; GFX9: ; %bb.0:
-; GFX9-NEXT: s_load_dword s2, s[4:5], 0x8
-; GFX9-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
+; GFX9-NEXT: s_load_dword s2, s[6:7], 0x8
+; GFX9-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
; GFX9-NEXT: v_mov_b32_e32 v1, 42
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s2
@@ -59,11 +59,11 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr add
;
; GFX10-LABEL: lds_atomic_dec_ret_i32:
; GFX10: ; %bb.0:
-; GFX10-NEXT: s_load_dword s0, s[4:5], 0x8
+; GFX10-NEXT: s_load_dword s0, s[6:7], 0x8
; GFX10-NEXT: v_mov_b32_e32 v1, 42
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: v_mov_b32_e32 v0, s0
-; GFX10-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
+; GFX10-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: ds_dec_rtn_u32 v0, v0, v1
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
@@ -74,11 +74,11 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr add
;
; GFX11-LABEL: lds_atomic_dec_ret_i32:
; GFX11: ; %bb.0:
-; GFX11-NEXT: s_clause 0x1
-; GFX11-NEXT: s_load_b32 s2, s[0:1], 0x8
-; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
+; GFX11-NEXT: s_load_b32 s0, s[2:3], 0x8
+; GFX11-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-NEXT: v_dual_mov_b32 v1, 42 :: v_dual_mov_b32 v0, s0
+; GFX11-NEXT: s_load_b64 s[0:1], s[2:3], 0x0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
-; GFX11-NEXT: v_dual_mov_b32 v1, 42 :: v_dual_mov_b32 v0, s2
; GFX11-NEXT: ds_dec_rtn_u32 v0, v0, v1
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: buffer_gl0_inv
@@ -95,8 +95,8 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr add
define amdgpu_kernel void @lds_atomic_dec_ret_i32_offset(ptr addrspace(1) %out, ptr addrspace(3) %ptr) #1 {
; CI-LABEL: lds_atomic_dec_ret_i32_offset:
; CI: ; %bb.0:
-; CI-NEXT: s_load_dword s2, s[4:5], 0x2
-; CI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
+; CI-NEXT: s_load_dword s2, s[6:7], 0x2
+; CI-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
; CI-NEXT: v_mov_b32_e32 v0, 42
; CI-NEXT: s_mov_b32 m0, -1
; CI-NEXT: s_waitcnt lgkmcnt(0)
@@ -110,8 +110,8 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32_offset(ptr addrspace(1) %out,
;
; VI-LABEL: lds_atomic_dec_ret_i32_offset:
; VI: ; %bb.0:
-; VI-NEXT: s_load_dword s2, s[4:5], 0x8
-; VI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
+; VI-NEXT: s_load_dword s2, s[6:7], 0x8
+; VI-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
; VI-NEXT: v_mov_b32_e32 v0, 42
; VI-NEXT: s_mov_b32 m0, -1
; VI-NEXT: s_waitcnt lgkmcnt(0)
@@ -125,8 +125,8 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32_offset(ptr addrspace(1) %out,
;
; GFX9-LABEL: lds_atomic_dec_ret_i32_offset:
; GFX9: ; %bb.0:
-; GFX9-NEXT: s_load_dword s2, s[4:5], 0x8
-; GFX9-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
+; GFX9-NEXT: s_load_dword s2, s[6:7], 0x8
+; GFX9-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
; GFX9-NEXT: v_mov_b32_e32 v0, 42
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s2
@@ -138,11 +138,11 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32_offset(ptr addrspace(1) %out,
;
; GFX10-LABEL: lds_atomic_dec_ret_i32_offset:
; GFX10: ; %bb.0:
-; GFX10-NEXT: s_load_dword s0, s[4:5], 0x8
+; GFX10-NEXT: s_load_dword s0, s[6:7], 0x8
; GFX10-NEXT: v_mov_b32_e32 v0, 42
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: v_mov_b32_e32 v1, s0
-; GFX10-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
+; GFX10-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: ds_dec_rtn_u32 v0, v1, v0 offset:16
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
@@ -153,11 +153,11 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32_offset(ptr addrspace(1) %out,
;
; GFX11-LABEL: lds_atomic_dec_ret_i32_offset:
; GFX11: ; %bb.0:
-; GFX11-NEXT: s_load_b32 s2, s[0:1], 0x8
-; GFX11-NEXT: v_mov_b32_e32 v0, 42
-; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
+; GFX11-NEXT: s_load_b32 s0, s[2:3], 0x8
+; GFX11-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-NEXT: v_dual_mov_b32 v0, 42 :: v_dual_mov_b32 v1, s0
+; GFX11-NEXT: s_load_b64 s[0:1], s[2:3], 0x0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
-; GFX11-NEXT: v_mov_b32_e32 v1, s2
; GFX11-NEXT: ds_dec_rtn_u32 v0, v1, v0 offset:16
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: buffer_gl0_inv
@@ -175,7 +175,7 @@ define amdgpu_kernel void @lds_atomic_dec_ret_i32_offset(ptr addrspace(1) %out,
define amdgpu_kernel void @lds_atomic_dec_noret_i32(ptr addrspace(3) %ptr) #1 {
; CI-LABEL: lds_atomic_dec_noret_i32:
; CI: ; %bb.0:
-; CI-NEXT: s_load_dword s0, s[4:5], 0x0
+; CI-NEXT: s_load_dword s0, s[6:7], 0x0
; CI-NEXT: v_mov_b32_e32 v0, 42
; CI-NEXT: s_mov_b32 m0, -1
; CI-NEXT: s_waitcnt lgkmcnt(0)
@@ -186,7 +186,7 @@ define amdgpu_kernel void @lds_atomic_dec_noret_i32(ptr addrspace(3) %ptr) #1 {
;
; VI-LABEL: lds_atomic_dec_noret_i32:
; VI: ; %bb.0:
-; VI-NEXT: s_load_dword s0, s[4:5], 0x0
+; VI-NEXT: s_load_dword s0, s[6:7], 0x0
; VI-NEXT: v_mov_b32_e32 v0, 42
; VI-NEXT: s_mov_b32 m0, -1
; VI-NEXT: s_waitcnt lgkmcnt(0)
@@ -197,7 +197,7 @@ define amdgpu_kernel void @lds_atomic_dec_noret_i32(ptr addrspace(3) %ptr) #1 {
;
; GFX9-LABEL: lds_atomic_dec_noret_i32:
; GFX9: ; %bb.0:
-; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
+; GFX9-NEXT: s_load_dword s0, s[6:7], 0x0
; GFX9-NEXT: v_mov_b32_e32 v1, 42
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0
@@ -207,7 +207,7 @@ define amdgpu_kernel void @lds_atomic_dec_noret_i32(ptr addrspace(3) %ptr) #1 {
;
; GFX10-LABEL: lds_atomic_dec_noret_i32:
; GFX10: ; %bb.0:
-; GFX10-NEXT: s_load_dword s0, s[4:5], 0x0
+; GFX10-NEXT: s_load_dword s0, s[6:7], 0x0
; GFX10-NEXT: v_mov_b32_e32 v1, 42
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: v_mov_b32_e32 v0, s0
@@ -218,7 +218,7 @@ define amdgpu_kernel void @lds_atomic_dec_noret_i32(ptr addrspace(3) %ptr) #1 {
;
; GFX11-LABEL: lds_atomic_dec_noret_i32:
; GFX11: ; %bb.0:
-; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x0
+; GFX11-NEXT: s_load_b32 s0, s[2:3], 0x0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: v_dual_mov_b32 v1, 42 :: v_dual_mov_b32 v0, s0
; GFX11-NEXT: ds_dec_u32 v0, v1
@@ -232,7 +232,7 @@ define amdgpu_kernel void @lds_atomic_dec_noret_i32(ptr addrspace(3) %ptr) #1 {
define amdgpu_kernel void @lds_atomic_dec_noret_i32_offset(ptr addrspace(3) %ptr) #1 {
; CI-LABEL: lds_atomic_dec_noret_i32_offset:
; CI: ; %bb.0:
-; CI-NEXT: s_load_dword s0, s[4:5], 0x0
+; CI-NEXT: s_load_dword s0, s[6:7], 0x0
; CI-NEXT: v_mov_b32_e32 v0, 42
; CI-NEXT: s_mov_b32 m0, -1
; CI-NEXT: s_waitcnt lgkmcnt(0)
@@ -243,7 +243,7 @@ define amdgpu_kernel void @lds_atomic_dec_noret_i32_offset(ptr addrspace(3) %ptr
;
; VI-LABEL: lds_atomic_dec_noret_i32_offset:
; VI: ; %bb.0:
-; VI-NEXT: s_load_dword s0, s[4:5], 0x0
+; VI-NEXT: s_load_dword s0, s[6:7], 0x0
; VI-NEXT: v_mov_b32_e32 v0, 42
; VI-NEXT: s_mov_b32 m0, -1
; VI-NEXT: s_waitcnt lgkmcnt(0)
@@ -254,7 +254,7 @@ define amdgpu_kernel void @lds_atomic_dec_noret_i32_offset(ptr addrspace(3) %ptr
;
; GFX9-LABEL: lds_atomic_dec_noret_i32_offset:
; GFX9: ; %bb.0:
-; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
+; GFX9-NEXT: s_load_dword s0, s[6:7], 0x0
; GFX9-NEXT: v_mov_b32_e32 v0, 42
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s0
@@ -264,7 +264,7 @@ define amdgpu_kernel void @lds_atomic_dec_noret_i32_offset(ptr addrspace(3) %ptr
;
; GFX10-LABEL: lds_atomic_dec_noret_i32_offset:
; GFX10: ; %bb.0:
-; GFX10-NEXT: s_load_dword s0, s[4:5], 0x0
+; GFX10-NEXT: s_load_dword s0, s[6:7], 0x0
; GFX10-NEXT: v_mov_b32_e32 v0, 42
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: v_mov_b32_e32 v1, s0
@@ -275,7 +275,7 @@ define amdgpu_kernel void @lds_atomic_dec_noret_i32_offset(ptr addrspace(3) %ptr
;
; GFX11-LABEL: lds_atomic_dec_noret_i32_offset:
; GFX11: ; %bb.0:
-; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x0
+; GFX11-NEXT: s_load_b32 s0, s[2:3], 0x0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: v_dual_mov_b32 v0, 42 :: v_dual_mov_b32 v1, s0
; GFX11-NEXT: ds_dec_u32 v1, v0 offset:16
@@ -290,7 +290,7 @@ define amdgpu_kernel void @lds_atomic_dec_noret_i32_offset(ptr addrspace(3) %ptr
define amdgpu_kernel void @global_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr addrspace(1) %ptr) #1 {
; CI-LABEL: global_atomic_dec_ret_i32:
; CI: ; %bb.0:
-; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; CI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v0, s2
@@ -305,7 +305,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr
;
; VI-LABEL: global_atomic_dec_ret_i32:
; VI: ; %bb.0:
-; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; VI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; VI-NEXT: v_mov_b32_e32 v2, 42
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v0, s2
@@ -320,7 +320,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr
;
; GFX9-LABEL: global_atomic_dec_ret_i32:
; GFX9: ; %bb.0:
-; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX9-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; GFX9-NEXT: v_mov_b32_e32 v0, 42
; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
@@ -332,7 +332,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr
;
; GFX10-LABEL: global_atomic_dec_ret_i32:
; GFX10: ; %bb.0:
-; GFX10-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; GFX10-NEXT: v_mov_b32_e32 v0, 42
; GFX10-NEXT: v_mov_b32_e32 v1, 0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
@@ -345,7 +345,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr
;
; GFX11-LABEL: global_atomic_dec_ret_i32:
; GFX11: ; %bb.0:
-; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
+; GFX11-NEXT: s_load_b128 s[0:3], s[2:3], 0x0
; GFX11-NEXT: v_dual_mov_b32 v0, 42 :: v_dual_mov_b32 v1, 0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: global_atomic_dec_u32 v0, v1, v0, s[2:3] glc
@@ -364,7 +364,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32(ptr addrspace(1) %out, ptr
define amdgpu_kernel void @global_atomic_dec_ret_i32_offset(ptr addrspace(1) %out, ptr addrspace(1) %ptr) #1 {
; CI-LABEL: global_atomic_dec_ret_i32_offset:
; CI: ; %bb.0:
-; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; CI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: s_add_u32 s2, s2, 16
@@ -381,7 +381,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset(ptr addrspace(1) %ou
;
; VI-LABEL: global_atomic_dec_ret_i32_offset:
; VI: ; %bb.0:
-; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; VI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; VI-NEXT: v_mov_b32_e32 v2, 42
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: s_add_u32 s2, s2, 16
@@ -398,7 +398,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset(ptr addrspace(1) %ou
;
; GFX9-LABEL: global_atomic_dec_ret_i32_offset:
; GFX9: ; %bb.0:
-; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX9-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; GFX9-NEXT: v_mov_b32_e32 v0, 42
; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
@@ -410,7 +410,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset(ptr addrspace(1) %ou
;
; GFX10-LABEL: global_atomic_dec_ret_i32_offset:
; GFX10: ; %bb.0:
-; GFX10-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; GFX10-NEXT: v_mov_b32_e32 v0, 42
; GFX10-NEXT: v_mov_b32_e32 v1, 0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
@@ -423,7 +423,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset(ptr addrspace(1) %ou
;
; GFX11-LABEL: global_atomic_dec_ret_i32_offset:
; GFX11: ; %bb.0:
-; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
+; GFX11-NEXT: s_load_b128 s[0:3], s[2:3], 0x0
; GFX11-NEXT: v_dual_mov_b32 v0, 42 :: v_dual_mov_b32 v1, 0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: global_atomic_dec_u32 v0, v1, v0, s[2:3] offset:16 glc
@@ -443,7 +443,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset(ptr addrspace(1) %ou
define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_system(ptr addrspace(1) %out, ptr addrspace(1) %ptr) #1 {
; CI-LABEL: global_atomic_dec_ret_i32_offset_system:
; CI: ; %bb.0:
-; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; CI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; CI-NEXT: v_mov_b32_e32 v2, 42
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: s_add_u32 s2, s2, 16
@@ -460,7 +460,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_system(ptr addrspace
;
; VI-LABEL: global_atomic_dec_ret_i32_offset_system:
; VI: ; %bb.0:
-; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; VI-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; VI-NEXT: v_mov_b32_e32 v2, 42
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: s_add_u32 s2, s2, 16
@@ -477,7 +477,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_system(ptr addrspace
;
; GFX9-LABEL: global_atomic_dec_ret_i32_offset_system:
; GFX9: ; %bb.0:
-; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX9-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; GFX9-NEXT: v_mov_b32_e32 v0, 42
; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
@@ -489,7 +489,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_system(ptr addrspace
;
; GFX10-LABEL: global_atomic_dec_ret_i32_offset_system:
; GFX10: ; %bb.0:
-; GFX10-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-NEXT: s_load_dwordx4 s[0:3], s[6:7], 0x0
; GFX10-NEXT: v_mov_b32_e32 v0, 42
; GFX10-NEXT: v_mov_b32_e32 v1, 0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
@@ -502,7 +502,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset_system(ptr addrspace
;
; GFX11-LABEL: global_atomic_dec_ret_i32_offset_system:
; GFX11: ; %bb.0:
-; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
+; GFX11-NEXT: s_load_b128 s[0:3], s[2:3], 0x0
; GFX11-NEXT: v_dual_mov_b32 v0, 42 :: v_dual_mov_b32 v1, 0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: global_atomic_dec_u32 v0, v1, v0, s[2:3] offset:16 glc
@@ -522,7 +522,7 @@ define amdgpu_kernel void @global_atomic_dec_ret_i32_offset...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The non-test related changes look fairly straightforward. LGTM.
|
||
// FIXME: We can spill incoming arguments and restore at the end of the | ||
// prolog. | ||
if (!ScratchWaveOffsetReg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of. One test hit the assert due to the sgpr limiting attribute
The main question, is it required for correctness now or not? I.e. can we use -O0 w/o the attributor? If not required for -O0 then LGTM. |
No, it never was. The "required" stuff was in the other pass which was deleted already |
This is stuck due to OpenCL blender failing in PSDB (which is almost certainly a result of perturbing anything, such that something else already broken is exposed) |
Explicitly mark the unused implicit arguments in the test, since this should be sensitive to the number of free user SGPRs. This is in preparation for llvm#83131.
Explicitly mark the unused implicit arguments in the test, since this should be sensitive to the number of free user SGPRs. This is in preparation for #83131.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/144/builds/2221 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/1923 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/12/builds/1814 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/3/builds/1447 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/1764 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/1794 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/46/builds/1587 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/133/builds/1521 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/1798 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/154/builds/1345 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/65/builds/1354 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/11/builds/1668 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/23/builds/966 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/161/builds/575 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/193/builds/940 Here is the relevant piece of the build log for the reference:
|
Removing it from the codegen pipeline induces a lot of test churn because llc is no longer optimizing out implicit arguments to kernels. Mostly mechanical, but there are some creative test updates. I preferred to take the changes as-is in tests where the ABI isn't relevant. In cases where it's more relevant, or the optimize out logic was too ingrained in the test, I pre-run the optimization. Some cases manually add attributes to disable inputs.
@arsenm are you aware that there is a test failure from this change that is still failing now about 16 hours later? Can you please take a look and revert if you need time to investigate so that we can get the bots back to green? Some failing bots: |
I think we should go ahead and revert this change given that it's been almost an entire day and the builders are still broken. |
…and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851) This reverts commits 677cc15 and 78bc1b6. The test CodeGenHIP/default-attributes.hip is failing on multiple bots even after the attempted fix including the following: - https://lab.llvm.org/buildbot/#/builders/3/builds/1473 - https://lab.llvm.org/buildbot/#/builders/65/builds/1380 - https://lab.llvm.org/buildbot/#/builders/161/builds/595 - https://lab.llvm.org/buildbot/#/builders/154/builds/1372 - https://lab.llvm.org/buildbot/#/builders/133/builds/1547 - https://lab.llvm.org/buildbot/#/builders/81/builds/755 - https://lab.llvm.org/buildbot/#/builders/40/builds/570 - https://lab.llvm.org/buildbot/#/builders/13/builds/748 - https://lab.llvm.org/buildbot/#/builders/12/builds/1845 - https://lab.llvm.org/buildbot/#/builders/11/builds/1695 - https://lab.llvm.org/buildbot/#/builders/190/builds/1829 - https://lab.llvm.org/buildbot/#/builders/193/builds/962 - https://lab.llvm.org/buildbot/#/builders/23/builds/991 - https://lab.llvm.org/buildbot/#/builders/144/builds/2256 - https://lab.llvm.org/buildbot/#/builders/46/builds/1614 These bots have been broken for a day, so reverting to get everything back to green.
Hey @arsenm this broke all AMDGPU OpenMP Offload buildbots (e.g., https://lab.llvm.org/buildbot/#/builders/30). Any chance you can fix these issues? |
Can you attach before/after IR and just XFAIL them for now? |
In a recent change to the [LLVM AMD backend](llvm/llvm-project#83131), we moved the `AMDGPUAttributor` pass into the optimization pipeline (as opposed to the codegen pipeline). Since this is a pass specific for `AMD` targets, we want to pass the `TargetMachine` when building the pipeline, i.e., during the call to `optimize_module`. Failure to do so will result in an increase of number of registers used. Also, we spoke with our LLVM backend team, and they advised to always pass the `TargetMachine` when building the LLVM optimization pipeline. This PR is addressing this issue, in the following way: - I added optional parameters to the `optimize_module` funciton (similar to those passed to `translate_to_asm`) - if those params are passed in, then we will create the `TargetMachine` and pass it to the `PassBuilder` - Otherwise the `TargetMachine` will still be `nullptr` (as it was before) Please note that, as it stands now, this change will only effect the AMD backend.
In a recent change to the [LLVM AMD backend](llvm/llvm-project#83131), we moved the `AMDGPUAttributor` pass into the optimization pipeline (as opposed to the codegen pipeline). Since this is a pass specific for `AMD` targets, we want to pass the `TargetMachine` when building the pipeline, i.e., during the call to `optimize_module`. Failure to do so will result in an increase of number of registers used. Also, we spoke with our LLVM backend team, and they advised to always pass the `TargetMachine` when building the LLVM optimization pipeline. This PR is addressing this issue, in the following way: - I added optional parameters to the `optimize_module` funciton (similar to those passed to `translate_to_asm`) - if those params are passed in, then we will create the `TargetMachine` and pass it to the `PassBuilder` - Otherwise the `TargetMachine` will still be `nullptr` (as it was before) Please note that, as it stands now, this change will only effect the AMD backend.
…lvm#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (llvm#98851)" This reverts commit b1bcb7c. Change-Id: Ia262230003989ed152f82ea475364b42d2592090
In a recent change to the [LLVM AMD backend](llvm/llvm-project#83131), we moved the `AMDGPUAttributor` pass into the optimization pipeline (as opposed to the codegen pipeline). Since this is a pass specific for `AMD` targets, we want to pass the `TargetMachine` when building the pipeline, i.e., during the call to `optimize_module`. Failure to do so will result in an increase of number of registers used. Also, we spoke with our LLVM backend team, and they advised to always pass the `TargetMachine` when building the LLVM optimization pipeline. This PR is addressing this issue, in the following way: - I added optional parameters to the `optimize_module` funciton (similar to those passed to `translate_to_asm`) - if those params are passed in, then we will create the `TargetMachine` and pass it to the `PassBuilder` - Otherwise the `TargetMachine` will still be `nullptr` (as it was before) Please note that, as it stands now, this change will only effect the AMD backend.
Removing it from the codegen pipeline induces a lot of test churn because llc is no longer optimizing out implicit arguments to kernels.
Mostly mechanical, but there are some creative test updates. I preferred to take the changes as-is in tests where the ABI isn't relevant. In cases where it's more relevant, or the optimize out logic was too ingrained in the test, I pre-run the optimization. Some cases manually add attributes to disable inputs.