[RISCV] Add late optimization pass for RISC-V to optimize branch instructions #131728

Closed

Conversation

michaelmaitland
Contributor

@michaelmaitland michaelmaitland commented Mar 18, 2025

This is an alternative to #117060, and is stacked on #131684. Marking
@mikhailramalho as co-author here because I got the idea of a late peephole pass
and the test case from #117060.

We use a late pass because we introduce the optimizable branches so late in the
pipeline.

Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>

There are two changes here.

The first change teaches analyzeBranch to evaluate a conditional branch
followed by an unconditional branch such that the conditional branch can be
evaluated statically. We often see comparisons against zero or one, since
SelectionDAG frequently uses i1 for the condition, so we handle that specific
case. Only EQ and NE are handled for now; support for more condition codes and
for arbitrary constants can be added in the future.

The second change passes AllowModify=false to analyzeBranch in the
tail merging code. We do this because that code performs some clever tricks
on the branch instructions which it restores later. Now that we actually
optimize branches in analyzeBranch, we have to be careful not to disturb
the canonical form the tail merging code expects.
@llvmbot
Member

llvmbot commented Mar 18, 2025

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-risc-v

Author: Michael Maitland (michaelmaitland)

Changes

This is an alternative to #117060, and is stacked on #131684. Marking
@mikhailramalho as co-author here because I got the idea of a late peephole pass
and the test case from #117060.

I use a late pass because we introduce the optimizable branches so late in the
pipeline.

Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>


Patch is 32.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131728.diff

14 Files Affected:

  • (modified) llvm/lib/CodeGen/BranchFolding.cpp (+4-3)
  • (modified) llvm/lib/Target/RISCV/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/RISCV/RISCV.h (+3)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+110)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.h (+20)
  • (added) llvm/lib/Target/RISCV/RISCVLatePeephole.cpp (+87)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+2)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll (-20)
  • (modified) llvm/test/CodeGen/RISCV/O0-pipeline.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/branch_zero.ll (+6-16)
  • (modified) llvm/test/CodeGen/RISCV/push-pop-opt-crash.ll (+31-31)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll (+29-37)
  • (added) llvm/test/CodeGen/RISCV/simplify-condbr.ll (+179)
diff --git a/llvm/lib/CodeGen/BranchFolding.cpp b/llvm/lib/CodeGen/BranchFolding.cpp
index 5218e39b88222..b4e0b4bf4585e 100644
--- a/llvm/lib/CodeGen/BranchFolding.cpp
+++ b/llvm/lib/CodeGen/BranchFolding.cpp
@@ -467,7 +467,7 @@ static void FixTail(MachineBasicBlock *CurMBB, MachineBasicBlock *SuccBB,
   DebugLoc dl = CurMBB->findBranchDebugLoc();
   if (!dl)
     dl = BranchDL;
-  if (I != MF->end() && !TII->analyzeBranch(*CurMBB, TBB, FBB, Cond, true)) {
+  if (I != MF->end() && !TII->analyzeBranch(*CurMBB, TBB, FBB, Cond)) {
     MachineBasicBlock *NextBB = &*I;
     if (TBB == NextBB && !Cond.empty() && !FBB) {
       if (!TII->reverseBranchCondition(Cond)) {
@@ -1107,7 +1107,7 @@ bool BranchFolder::TailMergeBlocks(MachineFunction &MF) {
 
       MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
       SmallVector<MachineOperand, 4> Cond;
-      if (!TII->analyzeBranch(*PBB, TBB, FBB, Cond, true)) {
+      if (!TII->analyzeBranch(*PBB, TBB, FBB, Cond)) {
         // Failing case: IBB is the target of a cbr, and we cannot reverse the
         // branch.
         SmallVector<MachineOperand, 4> NewCond(Cond);
@@ -1564,7 +1564,8 @@ bool BranchFolder::OptimizeBlock(MachineBasicBlock *MBB) {
     //    Loop: xxx; jcc Out; jmp Loop
     // we want:
     //    Loop: xxx; jncc Loop; jmp Out
-    if (CurTBB && CurFBB && CurFBB == MBB && CurTBB != MBB) {
+    if (CurTBB && CurFBB && CurFBB == MBB && CurTBB != MBB &&
+        !CurCond.empty()) {
       SmallVector<MachineOperand, 4> NewCond(CurCond);
       if (!TII->reverseBranchCondition(NewCond)) {
         DebugLoc Dl = MBB->findBranchDebugLoc();
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index e8d00f4df7c86..3fff898411d7a 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -35,6 +35,7 @@ add_llvm_target(RISCVCodeGen
   RISCVConstantPoolValue.cpp
   RISCVDeadRegisterDefinitions.cpp
   RISCVMakeCompressible.cpp
+  RISCVLatePeephole.cpp
   RISCVExpandAtomicPseudoInsts.cpp
   RISCVExpandPseudoInsts.cpp
   RISCVFoldMemOffset.cpp
diff --git a/llvm/lib/Target/RISCV/RISCV.h b/llvm/lib/Target/RISCV/RISCV.h
index 641e2eb4094f9..cd6045355a9ef 100644
--- a/llvm/lib/Target/RISCV/RISCV.h
+++ b/llvm/lib/Target/RISCV/RISCV.h
@@ -43,6 +43,9 @@ FunctionPass *createRISCVISelDag(RISCVTargetMachine &TM,
 FunctionPass *createRISCVMakeCompressibleOptPass();
 void initializeRISCVMakeCompressibleOptPass(PassRegistry &);
 
+FunctionPass *createRISCVLatePeepholeOptPass();
+void initializeRISCVLatePeepholeOptPass(PassRegistry &);
+
 FunctionPass *createRISCVGatherScatterLoweringPass();
 void initializeRISCVGatherScatterLoweringPass(PassRegistry &);
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 2fdf6bd36e88f..e0d364c4a1306 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -1005,6 +1005,109 @@ RISCVCC::CondCode RISCVCC::getOppositeBranchCondition(RISCVCC::CondCode CC) {
   }
 }
 
+// Return true if MO definitely contains the value one.
+static bool isOne(MachineOperand &MO) {
+  if (MO.isImm() && MO.getImm() == 1)
+    return true;
+
+  if (!MO.isReg() || !MO.getReg().isVirtual())
+    return false;
+
+  MachineRegisterInfo &MRI =
+      MO.getParent()->getParent()->getParent()->getRegInfo();
+  MachineInstr *DefMI = MRI.getUniqueVRegDef(MO.getReg());
+  if (!DefMI)
+    return false;
+
+  // For now, just check the canonical one value.
+  if (DefMI->getOpcode() == RISCV::ADDI &&
+      DefMI->getOperand(1).getReg() == RISCV::X0 &&
+      DefMI->getOperand(2).getImm() == 1)
+    return true;
+
+  return false;
+}
+
+// Return true if MO definitely contains the value zero.
+static bool isZero(MachineOperand &MO) {
+  if (MO.isImm() && MO.getImm() == 0)
+    return true;
+  if (MO.isReg() && MO.getReg() == RISCV::X0)
+    return true;
+  return false;
+}
+
+bool RISCVInstrInfo::trySimplifyCondBr(
+    MachineBasicBlock &MBB, MachineBasicBlock *TBB, MachineBasicBlock *FBB,
+    SmallVectorImpl<MachineOperand> &Cond) const {
+
+  if (!TBB || Cond.size() != 3)
+    return false;
+
+  RISCVCC::CondCode CC = static_cast<RISCVCC::CondCode>(Cond[0].getImm());
+  auto LHS = Cond[1];
+  auto RHS = Cond[2];
+
+  MachineBasicBlock *Folded = nullptr;
+  switch (CC) {
+  default:
+    // TODO: Implement for more CCs
+    return false;
+  case RISCVCC::COND_EQ: {
+    // We can statically evaluate that we take the first branch
+    if ((isZero(LHS) && isZero(RHS)) || (isOne(LHS) && isOne(RHS))) {
+      Folded = TBB;
+      break;
+    }
+    // We can statically evaluate that we take the second branch
+    if ((isZero(LHS) && isOne(RHS)) || (isOne(LHS) && isZero(RHS))) {
+      Folded = FBB;
+      break;
+    }
+    return false;
+  }
+  case RISCVCC::COND_NE: {
+    // We can statically evaluate that we take the first branch
+    if ((isOne(LHS) && isZero(RHS)) || (isZero(LHS) && isOne(RHS))) {
+      Folded = TBB;
+      break;
+    }
+    // We can statically evaluate that we take the second branch
+    if ((isZero(LHS) && isZero(RHS)) || (isOne(LHS) && isOne(RHS))) {
+      Folded = FBB;
+      break;
+    }
+    return false;
+  }
+  }
+
+  // At this point, it's legal to optimize.
+  removeBranch(MBB);
+  Cond.clear();
+
+  // Only need to insert a branch if we're not falling through.
+  if (Folded) {
+    DebugLoc DL = MBB.findBranchDebugLoc();
+    insertBranch(MBB, Folded, nullptr, {}, DL);
+  }
+
+  // Update the successors. Remove them all and add back the correct one.
+  while (!MBB.succ_empty())
+    MBB.removeSuccessor(MBB.succ_end() - 1);
+
+  // If it's a fallthrough, we need to figure out where MBB is going.
+  if (!Folded) {
+    MachineFunction::iterator Fallthrough = ++MBB.getIterator();
+    if (Fallthrough != MBB.getParent()->end())
+      MBB.addSuccessor(&*Fallthrough);
+  } else
+    MBB.addSuccessor(Folded);
+
+  TBB = Folded;
+  FBB = nullptr;
+  return true;
+}
+
 bool RISCVInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
                                    MachineBasicBlock *&TBB,
                                    MachineBasicBlock *&FBB,
@@ -1062,6 +1165,9 @@ bool RISCVInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
   // Handle a single conditional branch.
   if (NumTerminators == 1 && I->getDesc().isConditionalBranch()) {
     parseCondBranch(*I, TBB, Cond);
+    // Try to fold the branch of the conditional branch into the fallthrough.
+    if (AllowModify)
+      trySimplifyCondBr(MBB, TBB, FBB, Cond);
     return false;
   }
 
@@ -1070,6 +1176,10 @@ bool RISCVInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
       I->getDesc().isUnconditionalBranch()) {
     parseCondBranch(*std::prev(I), TBB, Cond);
     FBB = getBranchDestBlock(*I);
+    // Try to fold the branch of the conditional branch into an unconditional
+    // branch.
+    if (AllowModify)
+      trySimplifyCondBr(MBB, TBB, FBB, Cond);
     return false;
   }
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index 656cb38e11297..d00b6f57d10e0 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -306,6 +306,26 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
 
   static bool isLdStSafeToPair(const MachineInstr &LdSt,
                                const TargetRegisterInfo *TRI);
+  /// Return true if the branch represented by the conditional branch with
+  /// components TBB, FBB, and CurCond was folded into an unconditional branch.
+  ///
+  /// If FBB is nullptr, then the input represents a conditional branch with
+  /// a fallthrough.
+  ///
+  /// For example:
+  /// BRCOND EQ 0, 0, BB1
+  /// BR BB2
+  ///
+  /// can be simplified to BR BB1 since 0 == 0 statically. On the other hand,
+  ///
+  ///
+  /// BRCOND EQ 0, 1, BB1
+  /// BR BB2
+  ///
+  /// can be simplified to BR BB2 because 0 != 1 statically.
+  bool trySimplifyCondBr(MachineBasicBlock &MBB, MachineBasicBlock *TBB,
+                         MachineBasicBlock *FBB,
+                         SmallVectorImpl<MachineOperand> &Cond) const;
 
 protected:
   const RISCVSubtarget &STI;
diff --git a/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp b/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp
new file mode 100644
index 0000000000000..5ecee5069d3cd
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp
@@ -0,0 +1,87 @@
+//===-- RISCVLatePeephole.cpp - Late stage peephole optimization ----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// This file provides RISC-V late peephole optimizations
+///
+//===----------------------------------------------------------------------===//
+
+#include "MCTargetDesc/RISCVMCTargetDesc.h"
+#include "RISCV.h"
+#include "RISCVInstrInfo.h"
+#include "RISCVSubtarget.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
+#include "llvm/CodeGen/MachineDominators.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/CodeGen/RegisterScavenging.h"
+#include "llvm/MC/TargetRegistry.h"
+#include "llvm/Support/Debug.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "riscv-late-peephole"
+#define RISCV_LATE_PEEPHOLE_NAME "RISC-V Late Stage Peephole"
+
+namespace {
+
+struct RISCVLatePeepholeOpt : public MachineFunctionPass {
+  static char ID;
+
+  RISCVLatePeepholeOpt() : MachineFunctionPass(ID) {}
+
+  StringRef getPassName() const override { return RISCV_LATE_PEEPHOLE_NAME; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.addRequired<MachineDominatorTreeWrapperPass>();
+    AU.addPreserved<MachineDominatorTreeWrapperPass>();
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+  bool runOnMachineFunction(MachineFunction &Fn) override;
+
+private:
+  bool optimizeBlock(MachineBasicBlock &MBB);
+
+  const RISCVInstrInfo *TII = nullptr;
+};
+} // namespace
+
+char RISCVLatePeepholeOpt::ID = 0;
+INITIALIZE_PASS(RISCVLatePeepholeOpt, "riscv-late-peephole",
+                RISCV_LATE_PEEPHOLE_NAME, false, false)
+
+bool RISCVLatePeepholeOpt::optimizeBlock(MachineBasicBlock &MBB) {
+
+  // Use trySimplifyCondBr directly to know whether the optimization occured.
+  MachineBasicBlock *TBB, *FBB;
+  SmallVector<MachineOperand, 4> Cond;
+  if (!TII->analyzeBranch(MBB, TBB, FBB, Cond, false))
+    return TII->trySimplifyCondBr(MBB, TBB, FBB, Cond);
+
+  return false;
+}
+
+bool RISCVLatePeepholeOpt::runOnMachineFunction(MachineFunction &MF) {
+  if (skipFunction(MF.getFunction()))
+    return false;
+
+  TII = MF.getSubtarget<RISCVSubtarget>().getInstrInfo();
+
+  bool MadeChange = false;
+
+  for (MachineBasicBlock &MBB : MF)
+    MadeChange |= optimizeBlock(MBB);
+
+  return MadeChange;
+}
+
+/// Returns an instance of the RISC-V Late Peephole Optimization pass.
+FunctionPass *llvm::createRISCVLatePeepholeOptPass() {
+  return new RISCVLatePeepholeOpt();
+}
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index f78e5f8147d98..a283bd02bf8fa 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -128,6 +128,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTarget() {
   initializeKCFIPass(*PR);
   initializeRISCVDeadRegisterDefinitionsPass(*PR);
   initializeRISCVMakeCompressibleOptPass(*PR);
+  initializeRISCVLatePeepholeOptPass(*PR);
   initializeRISCVGatherScatterLoweringPass(*PR);
   initializeRISCVCodeGenPreparePass(*PR);
   initializeRISCVPostRAExpandPseudoPass(*PR);
@@ -567,6 +568,7 @@ void RISCVPassConfig::addPreEmitPass() {
     addPass(createMachineCopyPropagationPass(true));
   addPass(&BranchRelaxationPassID);
   addPass(createRISCVMakeCompressibleOptPass());
+  addPass(createRISCVLatePeepholeOptPass());
 }
 
 void RISCVPassConfig::addPreEmitPass2() {
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
index 338925059862c..74ec7308cb646 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
@@ -357,11 +357,6 @@ define i64 @ctpop_i64(i64 %a) nounwind {
 define i1 @ctpop_i64_ugt_two(i64 %a) nounwind {
 ; RV32I-LABEL: ctpop_i64_ugt_two:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    beqz zero, .LBB6_2
-; RV32I-NEXT:  # %bb.1:
-; RV32I-NEXT:    sltiu a0, zero, 0
-; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB6_2:
 ; RV32I-NEXT:    srli a2, a0, 1
 ; RV32I-NEXT:    lui a3, 349525
 ; RV32I-NEXT:    lui a4, 209715
@@ -404,11 +399,6 @@ define i1 @ctpop_i64_ugt_two(i64 %a) nounwind {
 ;
 ; RV32ZBB-LABEL: ctpop_i64_ugt_two:
 ; RV32ZBB:       # %bb.0:
-; RV32ZBB-NEXT:    beqz zero, .LBB6_2
-; RV32ZBB-NEXT:  # %bb.1:
-; RV32ZBB-NEXT:    sltiu a0, zero, 0
-; RV32ZBB-NEXT:    ret
-; RV32ZBB-NEXT:  .LBB6_2:
 ; RV32ZBB-NEXT:    cpop a0, a0
 ; RV32ZBB-NEXT:    cpop a1, a1
 ; RV32ZBB-NEXT:    add a0, a1, a0
@@ -422,11 +412,6 @@ define i1 @ctpop_i64_ugt_two(i64 %a) nounwind {
 define i1 @ctpop_i64_ugt_one(i64 %a) nounwind {
 ; RV32I-LABEL: ctpop_i64_ugt_one:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    beqz zero, .LBB7_2
-; RV32I-NEXT:  # %bb.1:
-; RV32I-NEXT:    snez a0, zero
-; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB7_2:
 ; RV32I-NEXT:    srli a2, a0, 1
 ; RV32I-NEXT:    lui a3, 349525
 ; RV32I-NEXT:    lui a4, 209715
@@ -470,11 +455,6 @@ define i1 @ctpop_i64_ugt_one(i64 %a) nounwind {
 ;
 ; RV32ZBB-LABEL: ctpop_i64_ugt_one:
 ; RV32ZBB:       # %bb.0:
-; RV32ZBB-NEXT:    beqz zero, .LBB7_2
-; RV32ZBB-NEXT:  # %bb.1:
-; RV32ZBB-NEXT:    snez a0, zero
-; RV32ZBB-NEXT:    ret
-; RV32ZBB-NEXT:  .LBB7_2:
 ; RV32ZBB-NEXT:    cpop a0, a0
 ; RV32ZBB-NEXT:    cpop a1, a1
 ; RV32ZBB-NEXT:    add a0, a1, a0
diff --git a/llvm/test/CodeGen/RISCV/O0-pipeline.ll b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
index f93cb65897210..29ec19b7e35a7 100644
--- a/llvm/test/CodeGen/RISCV/O0-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
@@ -64,6 +64,7 @@
 ; CHECK-NEXT:       Implement the 'patchable-function' attribute
 ; CHECK-NEXT:       Branch relaxation pass
 ; CHECK-NEXT:       RISC-V Make Compressible
+; CHECK-NEXT:       RISC-V Late Stage Peephole
 ; CHECK-NEXT:       Contiguously Lay Out Funclets
 ; CHECK-NEXT:       Remove Loads Into Fake Uses
 ; CHECK-NEXT:       StackMap Liveness Analysis
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index 976d1ee003a1f..b3698caf7f0f6 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -196,6 +196,7 @@
 ; CHECK-NEXT:       Machine Copy Propagation Pass
 ; CHECK-NEXT:       Branch relaxation pass
 ; CHECK-NEXT:       RISC-V Make Compressible
+; CHECK-NEXT:       RISC-V Late Stage Peephole
 ; CHECK-NEXT:       Contiguously Lay Out Funclets
 ; CHECK-NEXT:       Remove Loads Into Fake Uses
 ; CHECK-NEXT:       StackMap Liveness Analysis
diff --git a/llvm/test/CodeGen/RISCV/branch_zero.ll b/llvm/test/CodeGen/RISCV/branch_zero.ll
index fd0979977ba3b..2c13c28647516 100644
--- a/llvm/test/CodeGen/RISCV/branch_zero.ll
+++ b/llvm/test/CodeGen/RISCV/branch_zero.ll
@@ -5,16 +5,11 @@
 define void @foo(i16 %finder_idx) {
 ; CHECK-LABEL: foo:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:  .LBB0_1: # %for.body
-; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:  # %bb.1: # %for.body
 ; CHECK-NEXT:    slli a0, a0, 48
-; CHECK-NEXT:    bltz a0, .LBB0_4
+; CHECK-NEXT:    bltz a0, .LBB0_3
 ; CHECK-NEXT:  # %bb.2: # %while.cond.preheader.i
-; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
-; CHECK-NEXT:    li a0, 0
-; CHECK-NEXT:    bnez zero, .LBB0_1
-; CHECK-NEXT:  # %bb.3: # %while.body
-; CHECK-NEXT:  .LBB0_4: # %while.cond1.preheader.i
+; CHECK-NEXT:  .LBB0_3: # %while.cond1.preheader.i
 entry:
   br label %for.body
 
@@ -46,16 +41,11 @@ if.then:
 define void @bar(i16 %finder_idx) {
 ; CHECK-LABEL: bar:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:  .LBB1_1: # %for.body
-; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:  # %bb.1: # %for.body
 ; CHECK-NEXT:    slli a0, a0, 48
-; CHECK-NEXT:    bgez a0, .LBB1_4
+; CHECK-NEXT:    bgez a0, .LBB1_3
 ; CHECK-NEXT:  # %bb.2: # %while.cond.preheader.i
-; CHECK-NEXT:    # in Loop: Header=BB1_1 Depth=1
-; CHECK-NEXT:    li a0, 0
-; CHECK-NEXT:    bnez zero, .LBB1_1
-; CHECK-NEXT:  # %bb.3: # %while.body
-; CHECK-NEXT:  .LBB1_4: # %while.cond1.preheader.i
+; CHECK-NEXT:  .LBB1_3: # %while.cond1.preheader.i
 entry:
   br label %for.body
 
diff --git a/llvm/test/CodeGen/RISCV/push-pop-opt-crash.ll b/llvm/test/CodeGen/RISCV/push-pop-opt-crash.ll
index 1e72529b17f59..00689c3136517 100644
--- a/llvm/test/CodeGen/RISCV/push-pop-opt-crash.ll
+++ b/llvm/test/CodeGen/RISCV/push-pop-opt-crash.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -mattr=+zcmp -verify-machineinstrs  \
 ; RUN: -mtriple=riscv32 -target-abi=ilp32 < %s \
 ; RUN: | FileCheck %s -check-prefixes=RV32IZCMP
@@ -11,40 +12,39 @@
 
 declare dso_local void @f1() local_unnamed_addr
 declare dso_local void @f2() local_unnamed_addr
-define  dso_local void @f0() local_unnamed_addr {
+define  dso_local void @f0(i1 %c) local_unnamed_addr {
 ; RV32IZCMP-LABEL: f0:
-; RV32IZCMP: 	.cfi_startproc
-; RV32IZCMP-NEXT: # %bb.0:                                # %entry
-; RV32IZCMP-NEXT: 	bnez	zero, .LBB0_2
-; RV32IZCMP-NEXT: # %bb.1:                                # %if.T
-; RV32IZCMP-NEXT: 	cm.push	{ra}, -16
-; RV32IZCMP-NEXT: 	.cfi_def_cfa_offset 16
-; RV32IZCMP-NEXT: 	.cfi_offset ra, -4
-; RV32IZCMP-NEXT: 	call	f1
-; RV32IZCMP-NEXT: 	cm.pop	{ra}, 16
-; RV32IZCMP-NEXT:     .cfi_restore ra
-; RV32IZCMP-NEXT:     .cfi_def_cfa_offset 0
-; RV32IZCMP-NEXT: .LBB0_2:                                # %if.F
-; RV32IZCMP-NEXT: 	tail	f2
-; RV32IZCMP-NEXT: .Lfunc_end0:
-
+; RV32IZCMP:       # %bb.0: # %entry
+; RV32IZCMP-NEXT:    andi a0, a0, 1
+; RV32IZCMP-NEXT:    beqz a0, .LBB0_2
+; RV32IZCMP-NEXT:  # %bb.1: # %if.T
+; RV32IZCMP-NEXT:    cm.push {ra}, -16
+; RV32IZCMP-NEXT:    .cfi_def_cfa_offset 16
+; RV32IZCMP-NEXT:    .cfi_offset ra, -4
+; RV32IZCMP-NEXT:    call f1
+; RV32IZCMP-NEXT:    cm.pop {ra}, 16
+; RV32IZCMP-NEXT:    .cfi_restore ra
+; RV32IZCMP-NEXT:    .cfi_def_cfa_offset 0
+; RV32IZCMP-NEXT:  .LBB0_2: # %if.F
+; RV32IZCMP-NEXT:    tail f2
+;
 ; RV64IZCMP-LABEL: f0:
-; RV64IZCMP: 	.cfi_startproc
-; RV64IZCMP-NEXT: # %bb.0:                                # %entry
-; RV64IZCMP-NEXT: 	bnez	zero, .LBB0_2
-; RV64IZCMP-NEXT: # %bb.1:                                # %if.T
-; RV64IZCMP-NEXT: 	cm.push	{ra}, -16
-; RV64IZCMP-NEXT: 	.cfi_def_cfa_offset 16
-; RV64IZCMP-NEXT: 	.cfi_offset ra, -8
-; RV64IZCMP-NEXT: 	call	f1
-; RV64IZCMP-NEXT: 	cm.pop	{ra}, 16
-; RV64IZCMP-NEXT:     .cfi_restore ra
-; RV64IZCMP-NEXT:     .cfi_def_cfa_offset 0
-; RV64IZCMP-NEXT: .LBB0_2:                                # %if.F
-; RV64IZCMP-NEXT: 	tail	f2
-; RV64IZCMP-NEXT: .Lfunc_end0:
+; RV64IZCMP:       # %bb.0: # %entry
+; RV64IZCMP-NEXT:    andi a0, a0, 1
+; RV64IZCMP-NEXT:    beqz a0, .LBB0_2
+; RV64IZCMP-NEXT:  # %bb.1: # %if.T
+; RV64IZCMP-NEXT:    cm.push {ra}, -16
+; RV64IZCMP-NEXT:    .cfi_def_cfa_offset 16
+; RV64IZCMP-NEXT:    .cfi_offset ra, -8
+; RV64IZCMP-NEXT:    call f1
+; RV64IZCMP-NEXT:    cm.pop {ra}, 16
+; RV64IZCMP-NEXT:    .cfi_restore ra
+; RV64IZCMP-NEXT:    .cfi_def_cfa_offset 0
+; RV64IZCMP-NEXT:  .LBB0_2: # %if.F
+; RV64IZCMP-NEXT:    tail f2
+
 entry:
-  br i1 poison, label %if.T, label %if.F
+  br i1 %c, label %if.T, label %if.F
 
 if.T:
   tail call void @f1()
diff --git a/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll b/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
index c35f05be304cc..5251074717c93 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
@@ -14,9 +14,9 @@
 define void @test1(ptr nocapture noundef writeonly %dst, i32 noundef signext %i_dst_stride, ptr nocapture noundef readonly %src1, i32 noundef signext %i_src1_stride, ptr nocapture noundef readonly %src2, i32 noundef signext %i_src2_stride, i32 noundef sig...
[truncated]

@michaelmaitland michaelmaitland force-pushed the analyze-branch-opt-more branch from 35e2715 to e064268 Compare March 18, 2025 04:52
Contributor

@wangpc-pp wangpc-pp left a comment


Can you provide some performance or IC data? It seems the gain is not that large.

@mikhailramalho
Member

Can you provide some performance or IC data? It seems the gain is not that large.

I'll post the results here tomorrow morning

@michaelmaitland
Contributor Author

Can you provide some performance or IC data? It seems the gain is not that large.

I'll post the results here tomorrow morning

Thanks @mikhailramalho. I really appreciate the assistance!

@@ -567,6 +568,7 @@ void RISCVPassConfig::addPreEmitPass() {
addPass(createMachineCopyPropagationPass(true));
addPass(&BranchRelaxationPassID);
addPass(createRISCVMakeCompressibleOptPass());
addPass(createRISCVLatePeepholeOptPass());
Collaborator


I'm not sure we should mess with branches after BranchRelaxationPass

Member


I'm checking my notes here and the pattern that generates the branch, i.e.:

bb.2:
  renamable $x10 = XORI killed renamable $x10, -1
  renamable $x10 = BEXTI killed renamable $x10, 13
  BNE killed renamable $x10, $x0, %bb.5

Seems to be formed by MachineBlockPlacementPass, so maybe we can move this change to the beginning of addPreEmitPass and still be able to optimize it.

Member


I would put it immediately before branch relaxation, unless you're sure that machine copy propagation won't also introduce these kinds of branches.

Contributor Author


@lenary, MCP is definitely introducing these kinds of branches.


bool RISCVLatePeepholeOpt::optimizeBlock(MachineBasicBlock &MBB) {

// Use trySimplifyCondBr directly to know whether the optimization occured.
Collaborator


occurred*

@mikhailramalho
Member

Sorry for the delay, I was seeing some strange numbers, so I ran everything twice: https://lnt.lukelau.me/db_default/v4/nts/338?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=337&submit=Update

  • I wouldn't trust the result of gcc_r, there is a high variance there
  • The regression on xalancbmk_r is strange, I'll have to take a closer look at it

@michaelmaitland
Contributor Author

Sorry for the delay, I was seeing some strange numbers, so I ran everything twice: https://lnt.lukelau.me/db_default/v4/nts/338?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=337&submit=Update

  • I wouldn't trust the result of gcc_r, there is a high variance there
  • The regression on xalancbmk_r is strange, I'll have to take a closer look at it

@mikhailramalho so what I'm hearing is that we should count this as a pretty nice gain in xz? I know xalancbmk usually has high variance too, but this looks quite large.

@topperc
Collaborator

topperc commented Mar 20, 2025

Sorry for the delay, I was seeing some strange numbers, so I ran everything twice: https://lnt.lukelau.me/db_default/v4/nts/338?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=337&submit=Update

  • I wouldn't trust the result of gcc_r, there is a high variance there
  • The regression on xalancbmk_r is strange, I'll have to take a closer look at it

@mikhailramalho so what I'm hearing is that we should count this as a pretty nice gain in xz? I know xalancbmk usually has high variance too, but this looks quite large.

Does #117060 also improve xz?

@mikhailramalho
Member

@mikhailramalho so what I'm hearing is that we should count this as a pretty nice gain in xz? I know xalancbmk usually has high variance too, but this looks quite large.

I'd say we can trust the results of xz and parest, they were consistent across the two full runs I did.

Does #117060 also improve xz?

I didn't test it back then, so I don't know :/

mikhailramalho added a commit to mikhailramalho/llvm-project that referenced this pull request Mar 26, 2025
This patch is an alternative to PRs llvm#117060, llvm#131684, llvm#131728.

The patch adds a late optimization pass that replaces conditional
branches that can be statically evaluated with an unconditional branch.

Adding Michael as a co-author, as most of the code that evaluates the
condition comes from llvm#131684.
mikhailramalho added a commit that referenced this pull request Mar 27, 2025
This patch is an alternative to PRs #117060, #131684, #131728.

The patch adds a late optimization pass that replaces conditional
branches that can be statically evaluated with an unconditional branch.

Adding Michael as a co-author as most of the code that evaluates the
condition comes from #131684.

Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
mikhailramalho added a commit to mikhailramalho/llvm-project that referenced this pull request Apr 7, 2025
This is a follow-up patch to PR llvm#133256.

This patch adds the branch folding pass after the newly added late
optimization pass for riscv, which reduces code size in all SPEC
benchmarks (except libm).

The improvements are: 500.perlbench_r (-3.37%), 544.nab_r (-3.06%),
557.xz_r (-2.82%), 523.xalancbmk_r (-2.64%), 520.omnetpp_r (-2.34%),
531.deepsjeng_r (-2.27%), 502.gcc_r (-2.19%), 526.blender_r (-2.11%),
538.imagick_r (-2.03%), 505.mcf_r (-1.82%), 541.leela_r (-1.74%),
511.povray_r (-1.62%), 510.parest_r (-1.62%), 508.namd_r (-1.57%),
525.x264_r (-1.47%).

Geo mean is -2.07%.

Some caveats:
* On llvm#131728 I mentioned a 7% improvement on execution time of xz, but
  that's no longer the case. I went back and also tried to reproduce the
  result with the code from llvm#131728 and couldn't. Now the results from
  that PR and this one are the same: an overall code size reduction but
  no exec time improvements.
* The root cause of the large number is not yet clear to me. I'm still
  investigating it.