[StackSlotColoring] Ignore non-spill objects in RemoveDeadStores. #80242
Conversation
The stack slot coloring pass is concerned with optimizing spill slots. If it makes any change, a pass is made over the function to remove stack stores that use the same register and stack slot as an immediately preceding load. The register check is too simple for constant registers like the AArch64 and RISC-V zero registers. This register can be used as the result of a load when we want to discard the result but still have the memory access performed, as with a volatile or atomic load. If the code sees a load into the zero register followed by a store of the zero register to the same stack slot, the pass mistakenly believes the store isn't needed. Since the main stack coloring optimization is only concerned with spill slots, it seems reasonable that RemoveDeadStores should only be concerned with spills. Since we never generate a reload of x0, this avoids the issue seen on RISC-V. The test case concept is adapted from pr30821.mir for X86; that test had to be updated to mark the stack slot as a spill slot. Fixes llvm#80052.
@llvm/pr-subscribers-backend-x86
Author: Craig Topper (topperc)
Full diff: https://github.com/llvm/llvm-project/pull/80242.diff (3 files affected)
diff --git a/llvm/lib/CodeGen/StackSlotColoring.cpp b/llvm/lib/CodeGen/StackSlotColoring.cpp
index c180f4d8f0365..6d3fc740b292a 100644
--- a/llvm/lib/CodeGen/StackSlotColoring.cpp
+++ b/llvm/lib/CodeGen/StackSlotColoring.cpp
@@ -480,7 +480,7 @@ bool StackSlotColoring::RemoveDeadStores(MachineBasicBlock* MBB) {
if (!(StoreReg = TII->isStoreToStackSlot(*NextMI, SecondSS, StoreSize)))
continue;
if (FirstSS != SecondSS || LoadReg != StoreReg || FirstSS == -1 ||
- LoadSize != StoreSize)
+ LoadSize != StoreSize || !MFI->isSpillSlotObjectIndex(FirstSS))
continue;
++NumDead;
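For context, the hunk above sits inside a loop in RemoveDeadStores that pairs each load from a stack slot with the immediately following store back to it. A condensed paraphrase of that loop follows (a sketch, not the verbatim source; debug-instruction skipping and the actual instruction erasure are elided):

// Inside StackSlotColoring::RemoveDeadStores, run once per basic block.
for (MachineInstr &MI : *MBB) {
  int FirstSS, SecondSS;
  unsigned LoadSize = 0, StoreSize = 0;
  // Ask the target whether MI reloads a register from a stack slot.
  unsigned LoadReg = TII->isLoadFromStackSlot(MI, FirstSS, LoadSize);
  if (!LoadReg)
    continue;
  MachineInstr *NextMI = MI.getNextNode();
  if (!NextMI)
    break;
  unsigned StoreReg = TII->isStoreToStackSlot(*NextMI, SecondSS, StoreSize);
  if (!StoreReg)
    continue;
  // The store is dead only if it writes the just-loaded value back to the
  // same slot, and, with this patch, only if that slot is a spill slot, so
  // a discarded load into the zero register can no longer fool the check.
  if (FirstSS != SecondSS || LoadReg != StoreReg || FirstSS == -1 ||
      LoadSize != StoreSize || !MFI->isSpillSlotObjectIndex(FirstSS))
    continue;
  ++NumDead;
  // ...erase NextMI, and MI too if the loaded value has no other use...
}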
diff --git a/llvm/test/CodeGen/RISCV/pr80052.mir b/llvm/test/CodeGen/RISCV/pr80052.mir
new file mode 100644
index 0000000000000..8006697c3bf27
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/pr80052.mir
@@ -0,0 +1,125 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+# RUN: llc %s -mtriple=riscv64 -run-pass=greedy,virtregrewriter,stack-slot-coloring -o - | FileCheck %s
+
+--- |
+ define void @foo() {
+ entry:
+ ; Dummy test just to hold the alloca
+ %alpha = alloca i32, i32 0, align 4
+ ret void
+ }
+
+...
+---
+name: foo
+alignment: 4
+tracksRegLiveness: true
+frameInfo:
+ maxAlignment: 4
+ localFrameSize: 4
+stack:
+ - { id: 0, name: alpha, size: 1, alignment: 4, local-offset: -4 }
+machineFunctionInfo:
+ varArgsFrameIndex: 0
+ varArgsSaveSize: 0
+body: |
+ bb.0.entry:
+ ; To trick stack-slot-colouring to run its dead-store-elimination phase,
+ ; which is at fault, we need the register allocator to run, and spill in two
+ ; places that can have their slots merged. Achieve this by volatile-loading
+ ; data into all allocatable GPRs except $x31. Then, volatile storing them
+ ; later, leaving regalloc only $x31 to play with in the middle.
+ $x1 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x5 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x6 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x7 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x8 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x9 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x10 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x11 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x12 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x13 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x14 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x15 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x16 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x17 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x18 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x19 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x20 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x21 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x22 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x23 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x24 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x25 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x26 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x27 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x28 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x29 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ $x30 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+
+ ; Force the first spill.
+ %0:gpr = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ %1:gpr = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ SW %1, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW %0, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+
+ ; CHECK: renamable $x31 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ ; CHECK-NEXT: SD killed renamable $x31, %stack.1, 0 :: (store (s64) into %stack.1)
+ ; CHECK-NEXT: renamable $x31 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ ; CHECK-NEXT: SW killed renamable $x31, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ ; CHECK-NEXT: renamable $x31 = LD %stack.1, 0 :: (load (s64) from %stack.1)
+ ; CHECK-NEXT: SW killed renamable $x31, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+
+ ; stack-slot-coloring doesn't know that a write to $x0 is discarded.
+ dead $x0 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ ; This stores 0 rather than the result of the preceding load since $x0
+ ; is special.
+ ; We don't want this store to be deleted.
+ SW $x0, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+
+ ; CHECK-NEXT: dead $x0 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ ; CHECK-NEXT: SW $x0, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+
+ ; Force a second spill
+ %2:gpr = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ %3:gpr = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ SW %3, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW %2, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+
+ ; CHECK-NEXT: renamable $x31 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ ; CHECK-NEXT: SD killed renamable $x31, %stack.1, 0 :: (store (s64) into %stack.1)
+ ; CHECK-NEXT: renamable $x31 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
+ ; CHECK-NEXT: SW killed renamable $x31, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ ; CHECK-NEXT: renamable $x31 = LD %stack.1, 0 :: (load (s64) from %stack.1)
+ ; CHECK-NEXT: SW killed renamable $x31, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+
+ SW $x1, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x5, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x6, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x7, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x8, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x9, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x10, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x11, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x12, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x13, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x14, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x15, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x16, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x17, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x18, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x19, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x20, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x21, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x22, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x23, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x24, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x25, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x26, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x27, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x28, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x29, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ SW $x30, %stack.0.alpha, 0 :: (volatile store (s32) into %ir.alpha)
+ PseudoRET
+
+...
diff --git a/llvm/test/CodeGen/X86/pr30821.mir b/llvm/test/CodeGen/X86/pr30821.mir
index 9591d45e7baff..992ef8bbe55f0 100644
--- a/llvm/test/CodeGen/X86/pr30821.mir
+++ b/llvm/test/CodeGen/X86/pr30821.mir
@@ -51,7 +51,7 @@ stack:
- { id: 1, name: foxtrot, type: default, offset: 0, size: 16, alignment: 16,
stack-id: default, callee-saved-register: '', callee-saved-restored: true,
debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
- - { id: 2, name: india, type: default, offset: 0, size: 16, alignment: 16,
+ - { id: 2, name: india, type: spill-slot, offset: 0, size: 16, alignment: 16,
stack-id: default, callee-saved-register: '', callee-saved-restored: true,
debug-info-variable: '', debug-info-expression: '',
debug-info-location: '' }
llvm/test/CodeGen/RISCV/pr80052.mir
; places that can have their slots merged. Achieve this by volatile-loading
; data into all allocatable GPRs except $x31. Then, volatile storing them
; later, leaving regalloc only $x31 to play with in the middle.
$x1 = LW %stack.0.alpha, 0 :: (volatile load (s32) from %ir.alpha)
you can replace all the MMO IR references with %stack.0 and drop the IR section
LGTM, but I'd like a second opinion as I'm not particularly familiar with this part of the code.
Shouldn't isLoadFromStackSlot fail if the target register is the zero register? I mean, it's technically a load, and the source is a stack slot, but saying it loads a value into x0 is nonsense. More generally, looking at the documentation for isLoadFromStackSlot and isStoreToStackSlot, I think they should perform all the checks necessary to make this transform legal.
Craig and I had chatted about this a bit yesterday. This had been my initial position as well. However, he pointed out that we had other calls of these routines which check the spill slot frame index property after returning. So there's definitely room for a cleanup here, but getting a functional fix in as a starting place seemed reasonable. This approach isn't introducing a new style issue, it's replicating an existing one. :(
I assume you're not actually suggesting it's correct for isLoadFromStackSlot to return x0? I can't see any reasonable way to argue it's a useful result. The fix here only makes it less likely for that to cause issues (since there isn't really any reason for the compiler to generate a no-op load from a spill slot). We could say more generally that isLoadFromStackSlot/isStoreToStackSlot implementations are generally buggy, so we want to be conservative about how we actually use the results. But that doesn't seem like it should be the first fix here.
I think the X0 thing here is almost a red herring. This transform would remove a volatile load/store pair in user-written code which referenced a stack object. That seems suspect. It just doesn't run unless we happen to merge a stack slot, so it doesn't happen much in practice. This transform running on anything which is not a spill slot is, I think, a bug. Worth noting is that no other target has the frame index restriction inside the isLoadFromStackSlot routine either.
So I think we have several overlapping points here, and a bunch of overlapping ways to constrain the transform that would fix the issue. I'm not against taking this fix as-is, but I also don't really want to leave isLoadFromStackSlot in its current state because it's sort of a trap.
What restriction are you talking about? Every target checks for a frame index operand; the routine returns a frame index. As far as I can tell, no target checks any properties of that frame index. (The AArch64 thing about not targeting xzr mentioned in the issue is not about isLoadFromStackSlot itself.)
Agreed. There's also some other cleanup here that would be good, like migrating the interface to Register. I'm just arguing for fixing the known bug before we worry about the foot-gun in general. :)
I'd specifically meant isSpillSlotObjectIndex, sorry if that was not clear. I'd meant "restriction on which type of FI", not "restriction to being an FI".
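For illustration, here is a sketch of the alternative floated at the start of this thread: having the target's isLoadFromStackSlot decline to report loads whose result is discarded into x0. This is hypothetical, not what the patch does; it assumes the unsigned-returning interface in use at the time, and the opcode list is abbreviated:

// Hypothetical variant of RISCVInstrInfo::isLoadFromStackSlot that refuses
// to describe a load into the constant zero register.
unsigned RISCVInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
                                             int &FrameIndex) const {
  switch (MI.getOpcode()) {
  default:
    return 0; // Not a plain scalar load.
  case RISCV::LB:
  case RISCV::LBU:
  case RISCV::LH:
  case RISCV::LHU:
  case RISCV::LW:
  case RISCV::LWU:
  case RISCV::LD:
    break;
  }
  // Loads from a stack slot have the form: LW rd, frame-index, 0.
  if (MI.getOperand(1).isFI() && MI.getOperand(2).isImm() &&
      MI.getOperand(2).getImm() == 0) {
    // A write to x0 is discarded, so reporting this as "loads into x0"
    // would mislead callers such as RemoveDeadStores.
    if (MI.getOperand(0).getReg() == RISCV::X0)
      return 0;
    FrameIndex = MI.getOperand(1).getIndex();
    return MI.getOperand(0).getReg();
  }
  return 0;
}

Whether that filtering belongs in the target hook or in callers like RemoveDeadStores is exactly the trade-off discussed above.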
LGTM