Conversation

@mgudim
Contributor

mgudim commented Nov 20, 2025

(1) Define CSRSavedLocation::Kind and use it in the code. This makes the code more readable and allows extending it to new kinds. For example, I soon want to add a "scalable offset from a given register" kind.

(2) Store the contents in a union. This should reduce memory usage.
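
For illustration, a minimal sketch of the shape this gives the class, reusing the factory names quoted in the review below; the kind values, field names, and everything else are assumptions, not the PR's actual code:

```cpp
// Hypothetical sketch of a tagged-union CSR saved-location description;
// the actual PR diff may differ.
#include <cstdint>

struct CSRSavedLocation {
  enum class Kind { Invalid, Register, CFAOffset };

  static CSRSavedLocation createRegister(unsigned Reg) {
    CSRSavedLocation Loc;
    Loc.K = Kind::Register;
    Loc.Reg = Reg;
    return Loc;
  }

  static CSRSavedLocation createCFAOffset(int64_t Offset) {
    CSRSavedLocation Loc;
    Loc.K = Kind::CFAOffset;
    Loc.Offset = Offset;
    return Loc;
  }

  Kind getKind() const { return K; }
  bool isValid() const { return K != Kind::Invalid; }

private:
  Kind K = Kind::Invalid;
  // The union stores only one payload at a time, which is where the
  // memory-usage reduction in (2) comes from.
  union {
    unsigned Reg;   // valid only when K == Kind::Register
    int64_t Offset; // valid only when K == Kind::CFAOffset
  };
};
```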

@github-actions

github-actions bot commented Nov 20, 2025

🐧 Linux x64 Test Results

  • 167264 tests passed
  • 2955 tests skipped
  • 1 test failed

Failed Tests

(click on a test name to see its output)

MLIR

MLIR.Dialect/XeGPU/propagate-layout-subgroup.mlir (Likely Already Failing)
This test is already failing at the base commit.
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/mlir-opt -xevm-attach-target='chip=pvc' -xegpu-propagate-layout="layout-kind=subgroup" -split-input-file /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/mlir-opt -xevm-attach-target=chip=pvc -xegpu-propagate-layout=layout-kind=subgroup -split-input-file /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
# .---command stderr------------
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir:10:17: error: CHECK-SAME: expected string not found in input
# |  // CHECK-SAME: {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}
# |                 ^
# | <stdin>:5:90: note: scanning from here
# |  %1 = xegpu.load_nd %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> -> vector<256x128xf32>
# |                                                                                          ^
# | <stdin>:6:17: note: possible intended match here
# |  xegpu.store_nd %1, %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : vector<256x128xf32>, !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>>
# |                 ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir:36:17: error: CHECK-SAME: expected string not found in input
# |  // CHECK-SAME: {layout_result_0 = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>} :
# |                 ^
# | <stdin>:18:112: note: scanning from here
# |  %2 = xegpu.load_nd %0[0, 0] <{layout = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> -> vector<256x128xf32>
# |                                                                                                                ^
# | <stdin>:19:35: note: possible intended match here
# |  %3 = vector.transpose %2, [1, 0] {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>} : vector<256x128xf32> to vector<128x256xf32>
# |                                   ^
# | 
# | Input file: <stdin>
# | Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |            1: module { 
# |            2:  gpu.module @test [#xevm.target<chip = "pvc">] { 
# |            3:  func.func @store_nd(%arg0: memref<256x128xf32>) { 
# |            4:  %0 = xegpu.create_nd_tdesc %arg0 : memref<256x128xf32> -> !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> 
# |            5:  %1 = xegpu.load_nd %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> -> vector<256x128xf32> 
# | same:10'0                                                                                              X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |            6:  xegpu.store_nd %1, %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : vector<256x128xf32>, !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | same:10'1                     ?                                                                                                                                                                                        possible intended match
# |            7:  return 
# | same:10'0     ~~~~~~~~
# |            8:  } 
# | same:10'0     ~~~
# |            9:  } 
# | same:10'0     ~~~
# |           10: } 
# | same:10'0     ~~
# |           11:  
# | same:10'0     ~
# |           12: // ----- 
# | same:10'0     ~~~~~~~~~
# |           13: module { 
# | same:10'0     ~~~~~~~~~
# |           14:  gpu.module @test [#xevm.target<chip = "pvc">] { 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           15:  func.func @vector_transpose(%arg0: memref<256x128xf32>, %arg1: memref<128x256xf32>) { 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           16:  %0 = xegpu.create_nd_tdesc %arg0 : memref<256x128xf32> -> !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> 
# |           17:  %1 = xegpu.create_nd_tdesc %arg1 : memref<128x256xf32> -> !xegpu.tensor_desc<128x256xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>> 
# |           18:  %2 = xegpu.load_nd %0[0, 0] <{layout = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> -> vector<256x128xf32> 
# | same:36'0                                                                                                                    X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |           19:  %3 = vector.transpose %2, [1, 0] {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>} : vector<256x128xf32> to vector<128x256xf32> 
# | same:36'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | same:36'1                                       ?                                                                                                                                       possible intended match
# |           20:  xegpu.store_nd %3, %1[0, 0] <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>}> : vector<128x256xf32>, !xegpu.tensor_desc<128x256xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>> 
# | same:36'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           21:  return 
# | same:36'0     ~~~~~~~~
# |           22:  } 
# | same:36'0     ~~~
# |           23:  } 
# | same:36'0     ~~~
# |           24: } 
# | same:36'0     ~~
# |           25:  
# | same:36'0     ~
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

@preames
Collaborator

preames commented Dec 4, 2025

I pulled out a subset of this into #170721 with the goal of getting some of the API changes in, and then returning to the enum/union bits in this change.

Mikhail Gudim added 3 commits December 10, 2025 08:21
(1) Define `CSRSavedLocation::Kind` and use it in the code. This makes
the code more readable and allows extending it to new kinds. For
example, I soon want to add a "scalable offset from a given register"
kind.

(2) Store the contents in a union. This should reduce memory usage.
mgudim force-pushed the csrsavedloc branch 2 times, most recently from c1ffa81 to 51edcce on December 10, 2025 16:55
@github-actions

github-actions bot commented Dec 10, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@preames
Collaborator

preames left a comment


Can you point to an example where the Kind will simplify code? On its own, I'm not a fan of the remaining parts of this change. I tried to dig through your other open reviews, but didn't find the motivating usage.

  CSRLoc = CSRSavedLocation::createRegister(*CSRReg);
else if (CSROffset)
  CSRLoc = CSRSavedLocation::createCFAOffset(*CSROffset);
if (CSRLoc.isValid()) {
Collaborator

I would prefer we not add the Invalid state. The prior code didn't need it; your switch-based code shouldn't either. Is there a strong reason this needs to exist?

Contributor Author

It will be needed here: https://github.com/llvm/llvm-project/pull/168531/files#diff-54d9d06f60ce6c927ea0f2c1380a50bdf93c689d1781186966ef2234660e47c9R421

Unless we want to collect all callee-saved registers in a vector before we come to this line. But... we also need CFIs for exception handling (or something else), and I suspect it may require emitting CFIs for non-callee-saved registers.

Contributor Author

EDIT: I don't think we'll ever need to care about CFIs for registers which are not returned by getCalleeSavedRegs, so we can make CFIInstrInserter only track those. That should save some memory too. I'll create a separate PR for that.

@mgudim
Contributor Author

mgudim commented Dec 11, 2025

The point of this is to prepare for future work where Kind can be "scalable offset from CFA" (from the commit message).

Can you point to an example where the Kind will simplify code?
https://github.com/llvm/llvm-project/pull/168869/files#diff-54d9d06f60ce6c927ea0f2c1380a50bdf93c689d1781186966ef2234660e47c9L406

@github-actions

github-actions bot commented Dec 16, 2025

🪟 Windows x64 Test Results

  • 128746 tests passed
  • 2826 tests skipped
  • 1 test failed

Failed Tests

(click on a test name to see its output)

MLIR

MLIR.Dialect/XeGPU/propagate-layout-subgroup.mlir
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
c:\_work\llvm-project\llvm-project\build\bin\mlir-opt.exe -xevm-attach-target='chip=pvc' -xegpu-propagate-layout="layout-kind=subgroup" -split-input-file C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir | c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\mlir-opt.exe' -xevm-attach-target=chip=pvc -xegpu-propagate-layout=layout-kind=subgroup -split-input-file 'C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir'
# note: command had no output on stdout or stderr
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe' 'C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir'
# .---command stderr------------
# | C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir:10:17: error: CHECK-SAME: expected string not found in input
# |  // CHECK-SAME: {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}
# |                 ^
# | <stdin>:5:90: note: scanning from here
# |  %1 = xegpu.load_nd %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> -> vector<256x128xf32>
# |                                                                                          ^
# | <stdin>:6:17: note: possible intended match here
# |  xegpu.store_nd %1, %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : vector<256x128xf32>, !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>>
# |                 ^
# | C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir:36:17: error: CHECK-SAME: expected string not found in input
# |  // CHECK-SAME: {layout_result_0 = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>} :
# |                 ^
# | <stdin>:18:112: note: scanning from here
# |  %2 = xegpu.load_nd %0[0, 0] <{layout = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> -> vector<256x128xf32>
# |                                                                                                                ^
# | <stdin>:19:35: note: possible intended match here
# |  %3 = vector.transpose %2, [1, 0] {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>} : vector<256x128xf32> to vector<128x256xf32>
# |                                   ^
# | 
# | Input file: <stdin>
# | Check file: C:\_work\llvm-project\llvm-project\mlir\test\Dialect\XeGPU\propagate-layout-subgroup.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |            1: module { 
# |            2:  gpu.module @test [#xevm.target<chip = "pvc">] { 
# |            3:  func.func @store_nd(%arg0: memref<256x128xf32>) { 
# |            4:  %0 = xegpu.create_nd_tdesc %arg0 : memref<256x128xf32> -> !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> 
# |            5:  %1 = xegpu.load_nd %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> -> vector<256x128xf32> 
# | same:10'0                                                                                              X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |            6:  xegpu.store_nd %1, %0 <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>}> : vector<256x128xf32>, !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32]>> 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | same:10'1                     ?                                                                                                                                                                                        possible intended match
# |            7:  return 
# | same:10'0     ~~~~~~~~
# |            8:  } 
# | same:10'0     ~~~
# |            9:  } 
# | same:10'0     ~~~
# |           10: } 
# | same:10'0     ~~
# |           11:  
# | same:10'0     ~
# |           12: // ----- 
# | same:10'0     ~~~~~~~~~
# |           13: module { 
# | same:10'0     ~~~~~~~~~
# |           14:  gpu.module @test [#xevm.target<chip = "pvc">] { 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           15:  func.func @vector_transpose(%arg0: memref<256x128xf32>, %arg1: memref<128x256xf32>) { 
# | same:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           16:  %0 = xegpu.create_nd_tdesc %arg0 : memref<256x128xf32> -> !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> 
# |           17:  %1 = xegpu.create_nd_tdesc %arg1 : memref<128x256xf32> -> !xegpu.tensor_desc<128x256xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>> 
# |           18:  %2 = xegpu.load_nd %0[0, 0] <{layout = #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>}> : !xegpu.tensor_desc<256x128xf32, #xegpu.layout<sg_layout = [4, 8], sg_data = [64, 32], order = [0, 1]>> -> vector<256x128xf32> 
# | same:36'0                                                                                                                    X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |           19:  %3 = vector.transpose %2, [1, 0] {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>} : vector<256x128xf32> to vector<128x256xf32> 
# | same:36'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | same:36'1                                       ?                                                                                                                                       possible intended match
# |           20:  xegpu.store_nd %3, %1[0, 0] <{layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>}> : vector<128x256xf32>, !xegpu.tensor_desc<128x256xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 64], order = [1, 0]>> 
# | same:36'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           21:  return 
# | same:36'0     ~~~~~~~~
# |           22:  } 
# | same:36'0     ~~~
# |           23:  } 
# | same:36'0     ~~~
# |           24: } 
# | same:36'0     ~~
# |           25:  
# | same:36'0     ~
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

@MaskRay
Member

MaskRay commented Dec 18, 2025

The CFIInstrInserter pass in LLVM can insert .cfi_def_cfa_*, .cfi_offset, and .cfi_restore to adjust the CFA and callee-saved registers. The CFIFixup pass in LLVM can insert .cfi_remember_state and .cfi_restore_state. CFIFixup-generated information is more space-efficient and is therefore preferred. Have you considered adopting CFIFixup instead?

@mgudim
Contributor Author

mgudim commented Dec 18, 2025

@MaskRay Thanks for taking a look!

Have you considered adopting CFIFixup instead?

Yes. From the comments in CFIFixup, it looks like it relies on several assumptions that may not hold if we go the "save-csr-early" path (i.e., saving/restoring a CSR is just a regular spill). I haven't looked at the exact algorithm in CFIFixup, so I don't know whether these assumptions would be hard to overcome; I just went with CFIInstrInserter instead.

...can insert .cfi_remember_state and .cfi_restore_state. CFIFixup-generated information is more space-efficient and is therefore preferred

I think we can also implement this optimization in CFIInstrInserter. It could be something like this: once all additional (needed) CFIs are inserted, we traverse the blocks once again in layout order, looking for pairs of points where the CFI state is the same. Then, if legal and profitable, we can insert a .cfi_remember_state / .cfi_restore_state pair, as sketched below.
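
A minimal sketch of that layout-order scan, assuming a comparable per-point state; all names are hypothetical (this is not CFIInstrInserter's actual interface), and the legality/profitability checks are elided:

```cpp
// Sketch: find pairs of program points where the same CFI state recurs,
// as candidates for a remember/restore pair.
#include <map>
#include <utility>
#include <vector>

struct CFIState {
  int Id; // stand-in for the CFA definition + per-register locations
  bool operator<(const CFIState &O) const { return Id < O.Id; }
};

struct ProgramPoint {
  int Block, Index; // a position in layout order
};

void pairRememberRestore(
    const std::vector<std::pair<ProgramPoint, CFIState>> &Points) {
  // First point (in layout order) at which each state was in effect.
  std::map<CFIState, ProgramPoint> FirstSeen;
  for (const auto &[P, S] : Points) {
    auto [It, Inserted] = FirstSeen.try_emplace(S, P);
    if (!Inserted) {
      // S already held at It->second. If legal and profitable, emit
      // .cfi_remember_state there and a single .cfi_restore_state at P
      // instead of re-emitting the directives that re-establish S.
    }
  }
}
```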

But since this is an optimization, I think we can postpone it until all the core functionality works. I vaguely remember that I measured the size of the CFI sections and it was significantly larger, so this will be important.

P.S. Actually, now that I think about it, it is unclear whether the algorithm in CFIInstrInserter is correct when CFI instructions are inside loops. For example, if a BB is its own predecessor, we can't really calculate the incoming state in one traversal; something like iteration until a fixed point is needed. I'll think more about this!

EDIT: if a block is its own predecessor, then the CFI state on entry to and exit from that block has to be the same, so that's not a good example.

@mgudim
Contributor Author

mgudim commented Dec 18, 2025

@MaskRay

The current algorithm can be modified to work with loops, and for that we actually DO NEED the "Invalid" state (@preames).
I'll write up the explanation soon.

@mgudim
Contributor Author

mgudim commented Dec 19, 2025

Consider the following example:

entry:
...
BEQ %C, %T, %F

T:
$x5 = $x18
B %end

F:
SD $x18, %stack.0


end:
$x18 = LD %stack.0

After PEI, CFIs will be generated:

entry:
...
BEQ %C, %T, %F

T:
$x5 = $x18
.cfi_register $x18, $x5
B %end

F:
SD $x18, %stack.0
.cfi_offset $x18, 8


end:
$x18 = LD %stack.0
.cfi_register $x18, $x18

Here we see that two different CFI states for $x18 reach the beginning of %end. That does not mean the CFIs are inconsistent; it means we can choose either of the two. Actually, here we can make a smart choice to minimize the number of inserted CFIs. For the algorithm in CFIInstrInserter, this means that we can initialize the incoming CFI state of a basic block with the outgoing state of ANY of its predecessors, UNLESS that state is undefined (invalid). Consider this example:

entry:
SD $x18, %stack.0
.cfi_offset $x18, 8

loop:
$x5 = $x18
.cfi_register $x18, $x5
...
BEQ %C, %loop, %end

end:
ret

When we need to initialize the incoming CFI state for %loop, we can only choose the state coming from %entry, because the state of the other predecessor (namely %loop itself) is uninitialized.
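
A minimal sketch of that initialization rule (hypothetical types and fields, not CFIInstrInserter's actual data structures):

```cpp
// Sketch: take the incoming CFI state from any predecessor whose outgoing
// state is already defined; skip predecessors that are still Invalid
// (e.g. the loop's own back edge on the first visit).
#include <vector>

struct CFIState {
  bool Valid = false; // the "Invalid" state under discussion
  // ... CFA and per-register saved locations ...
};

struct BlockInfo {
  std::vector<BlockInfo *> Preds;
  CFIState Incoming; // starts Invalid
  CFIState Outgoing; // starts Invalid
};

void initIncomingState(BlockInfo &BB) {
  for (BlockInfo *Pred : BB.Preds) {
    if (Pred->Outgoing.Valid) {
      // Any defined predecessor state is a legal choice (see the %end
      // example above), so take the first one found. For %loop, only
      // %entry qualifies: %loop's own outgoing state is still Invalid.
      BB.Incoming = Pred->Outgoing;
      return;
    }
  }
}
```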

@preames I hope this justifies why we need the Invalid type.
