Conversation

@mgudim
Contributor

mgudim commented Nov 20, 2025

(1) Define CSRSavedLocation::Kind and use it in the code. This makes the code more readable and allows extending it to new kinds. For example, I soon want to add a "scalable offset from a given register" kind.

(2) Store the contents in a union. This should reduce memory usage.
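
For illustration, a minimal sketch of the shape this gives the class, reusing the factory names quoted in the review below; the kind values, field names, and everything else are assumptions, not the PR's actual code:

```cpp
// Hypothetical sketch of a tagged-union CSR saved-location description;
// the actual PR diff may differ.
#include <cstdint>

struct CSRSavedLocation {
  enum class Kind { Invalid, Register, CFAOffset };

  static CSRSavedLocation createRegister(unsigned Reg) {
    CSRSavedLocation Loc;
    Loc.K = Kind::Register;
    Loc.Reg = Reg;
    return Loc;
  }

  static CSRSavedLocation createCFAOffset(int64_t Offset) {
    CSRSavedLocation Loc;
    Loc.K = Kind::CFAOffset;
    Loc.Offset = Offset;
    return Loc;
  }

  Kind getKind() const { return K; }
  bool isValid() const { return K != Kind::Invalid; }

private:
  Kind K = Kind::Invalid;
  // The union stores only one payload at a time, which is where the
  // memory-usage reduction in (2) comes from.
  union {
    unsigned Reg;   // valid only when K == Kind::Register
    int64_t Offset; // valid only when K == Kind::CFAOffset
  };
};
```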

@github-actions

github-actions bot commented Nov 20, 2025

🐧 Linux x64 Test Results

  • 167264 tests passed
  • 2955 tests skipped
  • 1 test failed

Failed Tests

(click on a test name to see its output)

MLIR

MLIR.Dialect/XeGPU/propagate-layout-subgroup.mlir (Likely Already Failing)
This test is already failing at the base commit.
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/mlir-opt -xevm-attach-target='chip=pvc' -xegpu-propagate-layout="layout-kind=subgroup" -split-input-file /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/mlir-opt -xevm-attach-target=chip=pvc -xegpu-propagate-layout=layout-kind=subgroup -split-input-file /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
# .---command stderr------------
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir:10:17: error: CHECK-SAME: expected string not found in input
# |  // CHECK-SAME: {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}
# |                 ^
# | <stdin>:5:90: note: scanning from here
# |  %1 = xegpu.load_nd %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> -> vector<256x128xf32>
# |                                                                                          ^
# | <stdin>:6:17: note: possible intended match here
# |  xegpu.store_nd %1, %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : vector<256x128xf32>, !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>>
# |                 ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir:36:17: error: CHECK-SAME: expected string not found in input
# |  // CHECK-SAME: {layout_result_0 = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>} :
# |                 ^
# | <stdin>:18:112: note: scanning from here
# |  %2 = xegpu.load_nd %0[0, 0] <{layout = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> -> vector<256x128xf32>
# |                                                                                                                ^
# | <stdin>:19:35: note: possible intended match here
# |  %3 = vector.transpose %2, [1, 0] {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>} : vector<256x128xf32> to vector<128x256xf32>
# |                                   ^
# | 
# | Input file: <stdin>
# | Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |            1: module { 
# |            2:  gpu.module @test [#xevm.target<chip = "pvc">] { 
# |            3:  func.func @store_nd(%arg0: memref<256x128xf32>) { 
# |            4:  %0 = xegpu.create_nd_tdesc %arg0 : memref<256x128xf32> -> !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> 
# |            5:  %1 = xegpu.load_nd %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> -> vector<256x128xf32> 
# | same:10'0                                                                                              X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |            6:  xegpu.store_nd %1, %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : vector<256x128xf32>, !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | same:10'1                     ?                                                                                                                                                                                        possible intended match
# |            7:  return 
# | same:10'0     ~~~~~~~~
# |            8:  } 
# | same:10'0     ~~~
# |            9:  } 
# | same:10'0     ~~~
# |           10: } 
# | same:10'0     ~~
# |           11:  
# | same:10'0     ~
# |           12: // ----- 
# | same:10'0     ~~~~~~~~~
# |           13: module { 
# | same:10'0     ~~~~~~~~~
# |           14:  gpu.module @test [#xevm.target<chip = "pvc">] { 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           15:  func.func @vector_transpose(%arg0: memref<256x128xf32>, %arg1: memref<128x256xf32>) { 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           16:  %0 = xegpu.create_nd_tdesc %arg0 : memref<256x128xf32> -> !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> 
# |           17:  %1 = xegpu.create_nd_tdesc %arg1 : memref<128x256xf32> -> !xegpu.tensor_desc<128x256xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>> 
# |           18:  %2 = xegpu.load_nd %0[0, 0] <{layout = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> -> vector<256x128xf32> 
# | same:36'0                                                                                                                    X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |           19:  %3 = vector.transpose %2, [1, 0] {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>} : vector<256x128xf32> to vector<128x256xf32> 
# | same:36'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | same:36'1                                       ?                                                                                                                                       possible intended match
# |           20:  xegpu.store_nd %3, %1[0, 0] <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>}> : vector<128x256xf32>, !xegpu.tensor_desc<128x256xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>> 
# | same:36'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           21:  return 
# | same:36'0     ~~~~~~~~
# |           22:  } 
# | same:36'0     ~~~
# |           23:  } 
# | same:36'0     ~~~
# |           24: } 
# | same:36'0     ~~
# |           25:  
# | same:36'0     ~
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

@preames
Collaborator

preames commented Dec 4, 2025

I pulled out a subset of this into #170721 with the goal of getting some of the API changes in, and then returning to the enum/union bits in this change.

Mikhail Gudim added 3 commits December 10, 2025 08:21
(1) Define `CSRSavedLocation::Kind` and use it in the code. This makes
the code more readable and allows extending it to new kinds. For
example, I soon want to add a "scalable offset from a given register"
kind.

(2) Store the contents in a union. This should reduce memory usage.
mgudim force-pushed the csrsavedloc branch 2 times, most recently from c1ffa81 to 51edcce on December 10, 2025 16:55
@github-actions

github-actions bot commented Dec 10, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@preames
Collaborator

preames left a comment


Can you point to an example where the Kind will simplify code? On its own, I'm not a fan of the remaining parts of this change. I tried to dig through your other open reviews, but didn't find the motivating usage.

  CSRLoc = CSRSavedLocation::createRegister(*CSRReg);
else if (CSROffset)
  CSRLoc = CSRSavedLocation::createCFAOffset(*CSROffset);
if (CSRLoc.isValid()) {
Collaborator

I would prefer we not add the Invalid state. The prior code didn't need it; your switch-based code shouldn't either. Is there a strong reason this needs to exist?

Contributor Author

It will be needed here: https://github.com/llvm/llvm-project/pull/168531/files#diff-54d9d06f60ce6c927ea0f2c1380a50bdf93c689d1781186966ef2234660e47c9R421

Unless we want to collect all callee-saved registers in a vector before we come to this line. But... we also need CFIs for exception handling (or something else), and I suspect it may require emitting CFIs for non-callee-saved registers.

Contributor Author

EDIT: I don't think we'll ever need to care about CFIs for registers which are not returned by getCalleeSavedRegs, so we can make CFIInstrInserter only track those. That should save some memory too. I'll create a separate PR for that.

@mgudim
Contributor Author

mgudim commented Dec 11, 2025

The point of this is to prepare for future work where Kind can be "scalable offset from CFA" (from the commit message).

Can you point to an example where the Kind will simplify code?
https://github.com/llvm/llvm-project/pull/168869/files#diff-54d9d06f60ce6c927ea0f2c1380a50bdf93c689d1781186966ef2234660e47c9L406

@github-actions

github-actions bot commented Dec 16, 2025

🪟 Windows x64 Test Results

  • 128746 tests passed
  • 2826 tests skipped
  • 1 test failed

Failed Tests

(click on a test name to see its output)

MLIR

MLIR.Dialect/XeGPU/propagate-layout-subgroup.mlir
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
c:\_work\llvm-project\llvm-project\build\bin\mlir-opt.exe -xevm-attach-target='chip=pvc' -xegpu-propagate-layout="layout-kind=subgroup" -split-input-file C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir | c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\mlir-opt.exe' -xevm-attach-target=chip=pvc -xegpu-propagate-layout=layout-kind=subgroup -split-input-file 'C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir'
# note: command had no output on stdout or stderr
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe' 'C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir'
# .---command stderr------------
# | C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir:10:17: error: CHECK-SAME: expected string not found in input
# |  // CHECK-SAME: {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}
# |                 ^
# | <stdin>:5:90: note: scanning from here
# |  %1 = xegpu.load_nd %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> -> vector<256x128xf32>
# |                                                                                          ^
# | <stdin>:6:17: note: possible intended match here
# |  xegpu.store_nd %1, %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : vector<256x128xf32>, !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>>
# |                 ^
# | C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir:36:17: error: CHECK-SAME: expected string not found in input
# |  // CHECK-SAME: {layout_result_0 = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>} :
# |                 ^
# | <stdin>:18:112: note: scanning from here
# |  %2 = xegpu.load_nd %0[0, 0] <{layout = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> -> vector<256x128xf32>
# |                                                                                                                ^
# | <stdin>:19:35: note: possible intended match here
# |  %3 = vector.transpose %2, [1, 0] {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>} : vector<256x128xf32> to vector<128x256xf32>
# |                                   ^
# | 
# | Input file: <stdin>
# | Check file: C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |            1: module { 
# |            2:  gpu.module @test [#xevm.target<chip = "pvc">] { 
# |            3:  func.func @store_nd(%arg0: memref<256x128xf32>) { 
# |            4:  %0 = xegpu.create_nd_tdesc %arg0 : memref<256x128xf32> -> !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> 
# |            5:  %1 = xegpu.load_nd %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> -> vector<256x128xf32> 
# | same:10'0                                                                                              X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |            6:  xegpu.store_nd %1, %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : vector<256x128xf32>, !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | same:10'1                     ?                                                                                                                                                                                        possible intended match
# |            7:  return 
# | same:10'0     ~~~~~~~~
# |            8:  } 
# | same:10'0     ~~~
# |            9:  } 
# | same:10'0     ~~~
# |           10: } 
# | same:10'0     ~~
# |           11:  
# | same:10'0     ~
# |           12: // ----- 
# | same:10'0     ~~~~~~~~~
# |           13: module { 
# | same:10'0     ~~~~~~~~~
# |           14:  gpu.module @test [#xevm.target<chip = "pvc">] { 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           15:  func.func @vector_transpose(%arg0: memref<256x128xf32>, %arg1: memref<128x256xf32>) { 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           16:  %0 = xegpu.create_nd_tdesc %arg0 : memref<256x128xf32> -> !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> 
# |           17:  %1 = xegpu.create_nd_tdesc %arg1 : memref<128x256xf32> -> !xegpu.tensor_desc<128x256xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>> 
# |           18:  %2 = xegpu.load_nd %0[0, 0] <{layout = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> -> vector<256x128xf32> 
# | same:36'0                                                                                                                    X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |           19:  %3 = vector.transpose %2, [1, 0] {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>} : vector<256x128xf32> to vector<128x256xf32> 
# | same:36'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | same:36'1                                       ?                                                                                                                                       possible intended match
# |           20:  xegpu.store_nd %3, %1[0, 0] <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>}> : vector<128x256xf32>, !xegpu.tensor_desc<128x256xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>> 
# | same:36'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           21:  return 
# | same:36'0     ~~~~~~~~
# |           22:  } 
# | same:36'0     ~~~
# |           23:  } 
# | same:36'0     ~~~
# |           24: } 
# | same:36'0     ~~
# |           25:  
# | same:36'0     ~
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

@MaskRay
Member

MaskRay commented Dec 18, 2025

The CFIInstrInserter pass in LLVM can insert .cfi_def_cfa_*, .cfi_offset, and .cfi_restore to adjust the CFA and callee-saved registers. The CFIFixup pass in LLVM can insert .cfi_remember_state and .cfi_restore_state. CFIFixup-generated information is more space-efficient and is therefore preferred. Have you considered adopting CFIFixup instead?

@mgudim
Contributor Author

mgudim commented Dec 18, 2025

@MaskRay Thanks for taking a look!

Have you considered adopting CFIFixup instead?

Yes. From the comments in CFIFixup, it looks like it relies on several assumptions that may not hold if we go the "save-csr-early" path (i.e., saving/restoring a CSR is just a regular spill). I haven't looked at the exact algorithm in CFIFixup, so I don't know whether these assumptions would be hard to overcome; I just went with CFIInstrInserter instead.

...can insert .cfi_remember_state and .cfi_restore_state. CFIFixup-generated information is more space-efficient and is therefore preferred

I think we can also implement this optimization in CFIInstrInserter. It could be something like this: once all additional (needed) CFIs are inserted, we traverse the blocks once again in layout order, looking for pairs of points where the CFI state is the same. Then, if legal and profitable, we can insert a .cfi_remember_state / .cfi_restore_state pair, as sketched below.
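
A minimal sketch of that layout-order scan, assuming a comparable per-point state; all names are hypothetical (this is not CFIInstrInserter's actual interface), and the legality/profitability checks are elided:

```cpp
// Sketch: find pairs of program points where the same CFI state recurs,
// as candidates for a remember/restore pair.
#include <map>
#include <utility>
#include <vector>

struct CFIState {
  int Id; // stand-in for the CFA definition + per-register locations
  bool operator<(const CFIState &O) const { return Id < O.Id; }
};

struct ProgramPoint {
  int Block, Index; // a position in layout order
};

void pairRememberRestore(
    const std::vector<std::pair<ProgramPoint, CFIState>> &Points) {
  // First point (in layout order) at which each state was in effect.
  std::map<CFIState, ProgramPoint> FirstSeen;
  for (const auto &[P, S] : Points) {
    auto [It, Inserted] = FirstSeen.try_emplace(S, P);
    if (!Inserted) {
      // S already held at It->second. If legal and profitable, emit
      // .cfi_remember_state there and a single .cfi_restore_state at P
      // instead of re-emitting the directives that re-establish S.
    }
  }
}
```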

But since this is an optimization, I think we can postpone it until all the core functionality works. I vaguely remember that I measured the size of the CFI sections and it was significantly larger, so this will be important.

P.S. Actually, now that I think about it, it is unclear whether the algorithm in CFIInstrInserter is correct when CFI instructions are inside loops. For example, if a BB is its own predecessor, we can't really calculate the incoming state in one traversal; something like iteration until a fixed point is needed. I'll think more about this!

EDIT: if a block is its own predecessor, then the CFI state on entry to and exit from that block has to be the same, so that's not a good example.

@mgudim
Contributor Author

mgudim commented Dec 18, 2025

@MaskRay

The current algorithm can be modified to work with loops, and for that we actually DO NEED the "Invalid" state (@preames).
I'll write up the explanation soon.

@mgudim
Contributor Author

mgudim commented Dec 19, 2025

Consider the following example:

entry:
...
BEQ %C, %T, %F

T:
$x5 = $x18
B %end

F:
SD $x18, %stack.0


end:
$x18 = LD %stack.0

After PEI, CFIs will be generated:

entry:
...
BEQ %C, %T, %F

T:
$x5 = $x18
.cfi_register $x18, $x5
B %end

F:
SD $x18, %stack.0
.cfi_offset $x18, 8


end:
$x18 = LD %stack.0
.cfi_register $x18, $x18

Here we see that two different CFI states for $x18 reach the beginning of %end. That does not mean the CFIs are inconsistent; it means we can choose either of the two. Actually, here we can make a smart choice to minimize the number of inserted CFIs. For the algorithm in CFIInstrInserter, this means that we can initialize the incoming CFI state of a basic block with the outgoing state of ANY of its predecessors, UNLESS that state is undefined (invalid). Consider this example:

entry:
SD $x18, %stack.0
.cfi_offset $x18, 8

loop:
$x5 = $x18
.cfi_register $x18, $x5
...
BEQ %C, %loop, %end

end:
ret

When we need to initialize the incoming CFI state for %loop, we can only choose the state coming from %entry, because the state of the other predecessor (namely %loop itself) is uninitialized.
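
A minimal sketch of that initialization rule (hypothetical types and fields, not CFIInstrInserter's actual data structures):

```cpp
// Sketch: take the incoming CFI state from any predecessor whose outgoing
// state is already defined; skip predecessors that are still Invalid
// (e.g. the loop's own back edge on the first visit).
#include <vector>

struct CFIState {
  bool Valid = false; // the "Invalid" state under discussion
  // ... CFA and per-register saved locations ...
};

struct BlockInfo {
  std::vector<BlockInfo *> Preds;
  CFIState Incoming; // starts Invalid
  CFIState Outgoing; // starts Invalid
};

void initIncomingState(BlockInfo &BB) {
  for (BlockInfo *Pred : BB.Preds) {
    if (Pred->Outgoing.Valid) {
      // Any defined predecessor state is a legal choice (see the %end
      // example above), so take the first one found. For %loop, only
      // %entry qualifies: %loop's own outgoing state is still Invalid.
      BB.Incoming = Pred->Outgoing;
      return;
    }
  }
}
```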

@preames I hope this justifies why we need the Invalid type.
