Skip to content

Add late optimization pass for riscv #117060

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

mikhailramalho
Copy link
Member

@mikhailramalho mikhailramalho commented Nov 20, 2024

This patch adds a late optimization pass (riscv only) to remove blocks with a single instruction, a beqz or bnez that compares the zero register.

The code here is based on HexagonEarlyIfConversion.

A few passes work together to generate the unoptimized code: Virtual Register Rewriter, Post-RA pseudo instruction, Branch Probability Basic Block Placement, and Machine Copy Propagation Pass produce:

bb.0 (%ir-block.2):
  successors: %bb.2(0x40000000), %bb.1(0x40000000); %bb.2(50.00%), %bb.1(50.00%)
  renamable $x11 = ANDI killed renamable $x11, 1
  BNE killed renamable $x11, $x0, %bb.2 

bb.4:
; predecessors: %bb.1, %bb.2
  renamable $x10 = LUI 912096
  renamable $x10 = ADDIW killed renamable $x10, -1330
  PseudoRET implicit $x10

bb.1:
; predecessors: %bb.0
  successors: %bb.5(0x40000000), %bb.4(0x40000000); %bb.5(50.00%), %bb.4(50.00%)
  BEQ $x0, $x0, %bb.4

The BNE can jump straight to bb.4, since bb.1 only has one instruction and always jumps to bb.4.

A couple of passes can remove this branch if they are present early (e.g., CodeGen Prepare and early tail duplication can remove a block with an unconditional branch), but given that BEQ $x0, $x0, %bb.4 only appears after the Machine Copy Propagation Pass, we end up generating unoptimized code.

This patch fixes it by adding the new optimization pass right after the Machine Copy Propagation Pass.

TODO:

  • All tests pass but there is a crash in SPEC dof_tools.c
  • Better naming? I plan to implement some peephole optimizations but changing this one is clearly not a peephole optimization
  • the branch probability is wrong on the new block:
bb.1 (%ir-block.3):
; predecessors: %bb.0
  successors: %bb.4(0x80000000), %bb.3(0x00000000); %bb.4(100.00%), %bb.3(0.00%) ; <---- it should be 50%
  liveins: $x10
  renamable $x10 = XORI killed renamable $x10, -1
  renamable $x10 = BEXTI killed renamable $x10, 13
  BNE killed renamable $x10, $x0, %bb.4
  • Commit the test (without the optimization) once this PR is stable enough, so we can see what this PR is fixing

Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff b49c4af186a6de8f201ed6a4c326ebf822d4fd84 f75f6a174387487d99d1c98a44bdbd56fec6fc6b --extensions cpp,h -- llvm/lib/Target/RISCV/RISCVLatePeephole.cpp llvm/lib/Target/RISCV/RISCV.h llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp b/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp
index 11add97aa8..d30cbdf68b 100644
--- a/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp
+++ b/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp
@@ -59,7 +59,8 @@ char RISCVLatePeepholeOpt::ID = 0;
 INITIALIZE_PASS(RISCVLatePeepholeOpt, "riscv-late-peephole",
                 RISCV_LATE_PEEPHOLE_NAME, false, false)
 
-void RISCVLatePeepholeOpt::removeBlock(MachineBasicBlock *B, MachineBasicBlock *NewB) {
+void RISCVLatePeepholeOpt::removeBlock(MachineBasicBlock *B,
+                                       MachineBasicBlock *NewB) {
   LLVM_DEBUG(dbgs() << "Removing block '#'" << B->getNumber() << "\n");
 
   // Transfer the immediate dominator information from B to its descendants.


void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<MachineDominatorTreeWrapperPass>();
AU.addPreserved<MachineDominatorTreeWrapperPass>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we don't preserve the dominator tree? Do any passes use it after this?

@preames
Copy link
Collaborator

preames commented Nov 22, 2024

A couple of random thoughts on possible fixes. There's aren't exclusive, it's possible we can/should do multiple or none of these.

If we have zicond, the whole problem goes away. There may also be an arithmetic lowering available for this particular select - I haven't closely looked.

The problematic branch is created by MachineBlockPlacement when duplicating the tail. It seems odd to me that it appears to make no effort to simplify after duplicating. This would seem to unblock additional transformations.

I can't find any target which implements this "trivially evaluated cond branch -> unconditional branch" in analyzeBranch, but I also can't find any reason why you can't do this if AllowModify=true.

@topperc
Copy link
Collaborator

topperc commented Nov 22, 2024

I can't find any target which implements this "trivially evaluated cond branch -> unconditional branch" in analyzeBranch, but I also can't find any reason why you can't do this if AllowModify=true.

Are you suggesting analyzeBranch should look through copies to figure this out?

@preames
Copy link
Collaborator

preames commented Nov 22, 2024

I can't find any target which implements this "trivially evaluated cond branch -> unconditional branch" in analyzeBranch, but I also can't find any reason why you can't do this if AllowModify=true.

Are you suggesting analyzeBranch should look through copies to figure this out?

No, I was suggesting that after copy propagation that analyzeBranch could perform at least the conversion to the unconditional branch. Though honestly, recognizing the ADDI XN, x0, 0 idiom might not be crazy either.


Register DstReg = T1I->getOperand(0).getReg();
Register SrcReg = T1I->getOperand(1).getReg();
if (DstReg != SrcReg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we keep this as a separate late peephole, I'd suggest splitting this into two transforms:

  1. Transform one does the bne zero, zero -> jump conversion.
  2. Transform two does the jump threading of an edge through a single basis block.

Doing it that way should make the code easier to read, and test. It'll also be more powerful.

Note that we probably won't keep the separate pass, so I wouldn't necessarily spend time implementing this just yet.

@mikhailramalho
Copy link
Member Author

Yesterday I noticed RISCVInstrInfo::optimizeCondBranch, which has code to detect ADDI x0, imm. Maybe we can use it in the future to handle this case of bez/bnez zero simplification.

Currently, it works only for virtual regs, and it's designed to simplify some cases of blt/bge, but it should be straightforward to change it

michaelmaitland added a commit to michaelmaitland/llvm-project that referenced this pull request Mar 18, 2025
…ructions

This is an alternative to llvm#117060, and is stacked on llvm#131684. Marking
@mikhailramalho as Co-Author here because I got the idea of a late peephole pass
and the test case from llvm#117060.

I use a late pass because we introduce the optimizable branches so late in the
pipeline.

Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>
michaelmaitland added a commit to michaelmaitland/llvm-project that referenced this pull request Mar 18, 2025
…ructions

This is an alternative to llvm#117060, and is stacked on llvm#131684. Marking
@mikhailramalho as Co-Author here because I got the idea of a late peephole pass
and the test case from llvm#117060.

I use a late pass because we introduce the optimizable branches so late in the
pipeline.

Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>
michaelmaitland added a commit to michaelmaitland/llvm-project that referenced this pull request Mar 18, 2025
…ructions

This is an alternative to llvm#117060, and is stacked on llvm#131684. Marking
@mikhailramalho as Co-Author here because I got the idea of a late peephole pass
and the test case from llvm#117060.

I use a late pass because we introduce the optimizable branches so late in the
pipeline.

Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>
@@ -552,6 +553,7 @@ void RISCVPassConfig::addPreEmitPass() {
EnableRISCVCopyPropagation)
addPass(createMachineCopyPropagationPass(true));
addPass(&BranchRelaxationPassID);
addPass(createRISCVLatePeepholeOptPass());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should be messing with branches after the branch relaxation pass

mikhailramalho added a commit to mikhailramalho/llvm-project that referenced this pull request Mar 26, 2025
This patch is an alternative to PRs llvm#117060, llvm#131684, llvm#131728.

The patch adds a late optimization pass that replaces conditional
branches that can be statically evaluated with an unconditinal branch.

Adding michael as a co-author as most of code that evaluates the
condition comes from llvm#131684.
mikhailramalho added a commit that referenced this pull request Mar 27, 2025
This patch is an alternative to PRs #117060, #131684, #131728.

The patch adds a late optimization pass that replaces conditional
branches that can be statically evaluated with an unconditinal branch.

Adding Michael as a co-author as most of the code that evaluates the
condition comes from #131684.

Co-authored-by: Michael Maitland michaeltmaitland@gmail.com
@mikhailramalho mikhailramalho deleted the riscv-late-opt branch March 27, 2025 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants