[RISCV] Add late optimization pass for RISC-V to optimize branch instructions #131728

Closed

Conversation

michaelmaitland
Contributor

@michaelmaitland michaelmaitland commented Mar 18, 2025

This is an alternative to #117060, and is stacked on #131684. Marking
@mikhailramalho as co-author here because I got the idea of a late peephole pass
and the test case from #117060.

We use a late pass because we introduce the optimizable branches so late in the
pipeline.

Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>

There are two changes here.

The first change teaches analyzeBranch to evaluate a conditional branch
followed by an unconditional branch such that the conditional branch can be
evaluated statically. We often see comparisons against zero or one, since
SelectionDAG frequently uses i1 for the condition, so we handle that specific
case. Only EQ and NE are handled for now; support for more condition codes and
for arbitrary constants can be added in the future.

The second change passes AllowModify=false to analyzeBranch in the
tail merging code. We do this because that code performs some clever tricks
on the branch instructions which it restores later. Now that we actually
optimize branches in analyzeBranch, we have to be careful not to disturb
the canonical form the tail merging code expects.
@llvmbot
Member

llvmbot commented Mar 18, 2025

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-risc-v

Author: Michael Maitland (michaelmaitland)

Changes

This is an alternative to #117060, and is stacked on #131684. Marking
@mikhailramalho as co-author here because I got the idea of a late peephole pass
and the test case from #117060.

I use a late pass because we introduce the optimizable branches so late in the
pipeline.

Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>


Patch is 32.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131728.diff

14 Files Affected:

  • (modified) llvm/lib/CodeGen/BranchFolding.cpp (+4-3)
  • (modified) llvm/lib/Target/RISCV/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/RISCV/RISCV.h (+3)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+110)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.h (+20)
  • (added) llvm/lib/Target/RISCV/RISCVLatePeephole.cpp (+87)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+2)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll (-20)
  • (modified) llvm/test/CodeGen/RISCV/O0-pipeline.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/branch_zero.ll (+6-16)
  • (modified) llvm/test/CodeGen/RISCV/push-pop-opt-crash.ll (+31-31)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll (+29-37)
  • (added) llvm/test/CodeGen/RISCV/simplify-condbr.ll (+179)
diff --git a/llvm/lib/CodeGen/BranchFolding.cpp b/llvm/lib/CodeGen/BranchFolding.cpp
index 5218e39b88222..b4e0b4bf4585e 100644
--- a/llvm/lib/CodeGen/BranchFolding.cpp
+++ b/llvm/lib/CodeGen/BranchFolding.cpp
@@ -467,7 +467,7 @@ static void FixTail(MachineBasicBlock *CurMBB, MachineBasicBlock *SuccBB,
   DebugLoc dl = CurMBB->findBranchDebugLoc();
   if (!dl)
     dl = BranchDL;
-  if (I != MF->end() && !TII->analyzeBranch(*CurMBB, TBB, FBB, Cond, true)) {
+  if (I != MF->end() && !TII->analyzeBranch(*CurMBB, TBB, FBB, Cond)) {
     MachineBasicBlock *NextBB = &*I;
     if (TBB == NextBB && !Cond.empty() && !FBB) {
       if (!TII->reverseBranchCondition(Cond)) {
@@ -1107,7 +1107,7 @@ bool BranchFolder::TailMergeBlocks(MachineFunction &MF) {
 
       MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
       SmallVector<MachineOperand, 4> Cond;
-      if (!TII->analyzeBranch(*PBB, TBB, FBB, Cond, true)) {
+      if (!TII->analyzeBranch(*PBB, TBB, FBB, Cond)) {
         // Failing case: IBB is the target of a cbr, and we cannot reverse the
         // branch.
         SmallVector<MachineOperand, 4> NewCond(Cond);
@@ -1564,7 +1564,8 @@ bool BranchFolder::OptimizeBlock(MachineBasicBlock *MBB) {
     //    Loop: xxx; jcc Out; jmp Loop
     // we want:
     //    Loop: xxx; jncc Loop; jmp Out
-    if (CurTBB && CurFBB && CurFBB == MBB && CurTBB != MBB) {
+    if (CurTBB && CurFBB && CurFBB == MBB && CurTBB != MBB &&
+        !CurCond.empty()) {
       SmallVector<MachineOperand, 4> NewCond(CurCond);
       if (!TII->reverseBranchCondition(NewCond)) {
         DebugLoc Dl = MBB->findBranchDebugLoc();
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index e8d00f4df7c86..3fff898411d7a 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -35,6 +35,7 @@ add_llvm_target(RISCVCodeGen
   RISCVConstantPoolValue.cpp
   RISCVDeadRegisterDefinitions.cpp
   RISCVMakeCompressible.cpp
+  RISCVLatePeephole.cpp
   RISCVExpandAtomicPseudoInsts.cpp
   RISCVExpandPseudoInsts.cpp
   RISCVFoldMemOffset.cpp
diff --git a/llvm/lib/Target/RISCV/RISCV.h b/llvm/lib/Target/RISCV/RISCV.h
index 641e2eb4094f9..cd6045355a9ef 100644
--- a/llvm/lib/Target/RISCV/RISCV.h
+++ b/llvm/lib/Target/RISCV/RISCV.h
@@ -43,6 +43,9 @@ FunctionPass *createRISCVISelDag(RISCVTargetMachine &TM,
 FunctionPass *createRISCVMakeCompressibleOptPass();
 void initializeRISCVMakeCompressibleOptPass(PassRegistry &);
 
+FunctionPass *createRISCVLatePeepholeOptPass();
+void initializeRISCVLatePeepholeOptPass(PassRegistry &);
+
 FunctionPass *createRISCVGatherScatterLoweringPass();
 void initializeRISCVGatherScatterLoweringPass(PassRegistry &);
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 2fdf6bd36e88f..e0d364c4a1306 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -1005,6 +1005,109 @@ RISCVCC::CondCode RISCVCC::getOppositeBranchCondition(RISCVCC::CondCode CC) {
   }
 }
 
+// Return true if MO definitely contains the value one.
+static bool isOne(MachineOperand &MO) {
+  if (MO.isImm() && MO.getImm() == 1)
+    return true;
+
+  if (!MO.isReg() || !MO.getReg().isVirtual())
+    return false;
+
+  MachineRegisterInfo &MRI =
+      MO.getParent()->getParent()->getParent()->getRegInfo();
+  MachineInstr *DefMI = MRI.getUniqueVRegDef(MO.getReg());
+  if (!DefMI)
+    return false;
+
+  // For now, just check the canonical one value.
+  if (DefMI->getOpcode() == RISCV::ADDI &&
+      DefMI->getOperand(1).getReg() == RISCV::X0 &&
+      DefMI->getOperand(2).getImm() == 1)
+    return true;
+
+  return false;
+}
+
+// Return true if MO definitely contains the value zero.
+static bool isZero(MachineOperand &MO) {
+  if (MO.isImm() && MO.getImm() == 0)
+    return true;
+  if (MO.isReg() && MO.getReg() == RISCV::X0)
+    return true;
+  return false;
+}
+
+bool RISCVInstrInfo::trySimplifyCondBr(
+    MachineBasicBlock &MBB, MachineBasicBlock *TBB, MachineBasicBlock *FBB,
+    SmallVectorImpl<MachineOperand> &Cond) const {
+
+  if (!TBB || Cond.size() != 3)
+    return false;
+
+  RISCVCC::CondCode CC = static_cast<RISCVCC::CondCode>(Cond[0].getImm());
+  auto LHS = Cond[1];
+  auto RHS = Cond[2];
+
+  MachineBasicBlock *Folded = nullptr;
+  switch (CC) {
+  default:
+    // TODO: Implement for more CCs
+    return false;
+  case RISCVCC::COND_EQ: {
+    // We can statically evaluate that we take the first branch
+    if ((isZero(LHS) && isZero(RHS)) || (isOne(LHS) && isOne(RHS))) {
+      Folded = TBB;
+      break;
+    }
+    // We can statically evaluate that we take the second branch
+    if ((isZero(LHS) && isOne(RHS)) || (isOne(LHS) && isZero(RHS))) {
+      Folded = FBB;
+      break;
+    }
+    return false;
+  }
+  case RISCVCC::COND_NE: {
+    // We can statically evaluate that we take the first branch
+    if ((isOne(LHS) && isZero(RHS)) || (isZero(LHS) && isOne(RHS))) {
+      Folded = TBB;
+      break;
+    }
+    // We can statically evaluate that we take the second branch
+    if ((isZero(LHS) && isZero(RHS)) || (isOne(LHS) && isOne(RHS))) {
+      Folded = FBB;
+      break;
+    }
+    return false;
+  }
+  }
+
+  // At this point, it's legal to optimize.
+  removeBranch(MBB);
+  Cond.clear();
+
+  // Only need to insert a branch if we're not falling through.
+  if (Folded) {
+    DebugLoc DL = MBB.findBranchDebugLoc();
+    insertBranch(MBB, Folded, nullptr, {}, DL);
+  }
+
+  // Update the successors. Remove them all and add back the correct one.
+  while (!MBB.succ_empty())
+    MBB.removeSuccessor(MBB.succ_end() - 1);
+
+  // If it's a fallthrough, we need to figure out where MBB is going.
+  if (!Folded) {
+    MachineFunction::iterator Fallthrough = ++MBB.getIterator();
+    if (Fallthrough != MBB.getParent()->end())
+      MBB.addSuccessor(&*Fallthrough);
+  } else
+    MBB.addSuccessor(Folded);
+
+  TBB = Folded;
+  FBB = nullptr;
+  return true;
+}
+
 bool RISCVInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
                                    MachineBasicBlock *&TBB,
                                    MachineBasicBlock *&FBB,
@@ -1062,6 +1165,9 @@ bool RISCVInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
   // Handle a single conditional branch.
   if (NumTerminators == 1 && I->getDesc().isConditionalBranch()) {
     parseCondBranch(*I, TBB, Cond);
+    // Try to fold the branch of the conditional branch into the fallthrough.
+    if (AllowModify)
+      trySimplifyCondBr(MBB, TBB, FBB, Cond);
     return false;
   }
 
@@ -1070,6 +1176,10 @@ bool RISCVInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
       I->getDesc().isUnconditionalBranch()) {
     parseCondBranch(*std::prev(I), TBB, Cond);
     FBB = getBranchDestBlock(*I);
+    // Try to fold the branch of the conditional branch into an unconditional
+    // branch.
+    if (AllowModify)
+      trySimplifyCondBr(MBB, TBB, FBB, Cond);
     return false;
   }
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index 656cb38e11297..d00b6f57d10e0 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -306,6 +306,26 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
 
   static bool isLdStSafeToPair(const MachineInstr &LdSt,
                                const TargetRegisterInfo *TRI);
+  /// Return true if the branch represented by the conditional branch with
+  /// components TBB, FBB, and CurCond was folded into an unconditional branch.
+  ///
+  /// If FBB is nullptr, then the input represents a conditional branch with
+  /// a fallthrough.
+  ///
+  /// For example:
+  /// BRCOND EQ 0, 0, BB1
+  /// BR BB2
+  ///
+  /// can be simplified to BR BB1 since 0 == 0 statically. On the other hand,
+  ///
+  ///
+  /// BRCOND EQ 0, 1, BB1
+  /// BR BB2
+  ///
+  /// can be simplified to BR BB2 because 0 != 1 statically.
+  bool trySimplifyCondBr(MachineBasicBlock &MBB, MachineBasicBlock *TBB,
+                         MachineBasicBlock *FBB,
+                         SmallVectorImpl<MachineOperand> &Cond) const;
 
 protected:
   const RISCVSubtarget &STI;
diff --git a/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp b/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp
new file mode 100644
index 0000000000000..5ecee5069d3cd
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp
@@ -0,0 +1,87 @@
+//===-- RISCVLatePeephole.cpp - Late stage peephole optimization ----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// This file provides RISC-V late peephole optimizations
+///
+//===----------------------------------------------------------------------===//
+
+#include "MCTargetDesc/RISCVMCTargetDesc.h"
+#include "RISCV.h"
+#include "RISCVInstrInfo.h"
+#include "RISCVSubtarget.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
+#include "llvm/CodeGen/MachineDominators.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/CodeGen/RegisterScavenging.h"
+#include "llvm/MC/TargetRegistry.h"
+#include "llvm/Support/Debug.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "riscv-late-peephole"
+#define RISCV_LATE_PEEPHOLE_NAME "RISC-V Late Stage Peephole"
+
+namespace {
+
+struct RISCVLatePeepholeOpt : public MachineFunctionPass {
+  static char ID;
+
+  RISCVLatePeepholeOpt() : MachineFunctionPass(ID) {}
+
+  StringRef getPassName() const override { return RISCV_LATE_PEEPHOLE_NAME; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.addRequired<MachineDominatorTreeWrapperPass>();
+    AU.addPreserved<MachineDominatorTreeWrapperPass>();
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+  bool runOnMachineFunction(MachineFunction &Fn) override;
+
+private:
+  bool optimizeBlock(MachineBasicBlock &MBB);
+
+  const RISCVInstrInfo *TII = nullptr;
+};
+} // namespace
+
+char RISCVLatePeepholeOpt::ID = 0;
+INITIALIZE_PASS(RISCVLatePeepholeOpt, "riscv-late-peephole",
+                RISCV_LATE_PEEPHOLE_NAME, false, false)
+
+bool RISCVLatePeepholeOpt::optimizeBlock(MachineBasicBlock &MBB) {
+
+  // Use trySimplifyCondBr directly to know whether the optimization occured.
+  MachineBasicBlock *TBB, *FBB;
+  SmallVector<MachineOperand, 4> Cond;
+  if (!TII->analyzeBranch(MBB, TBB, FBB, Cond, false))
+    return TII->trySimplifyCondBr(MBB, TBB, FBB, Cond);
+
+  return false;
+}
+
+bool RISCVLatePeepholeOpt::runOnMachineFunction(MachineFunction &MF) {
+  if (skipFunction(MF.getFunction()))
+    return false;
+
+  TII = MF.getSubtarget<RISCVSubtarget>().getInstrInfo();
+
+  bool MadeChange = false;
+
+  for (MachineBasicBlock &MBB : MF)
+    MadeChange |= optimizeBlock(MBB);
+
+  return MadeChange;
+}
+
+/// Returns an instance of the RISC-V Late Peephole Optimization pass.
+FunctionPass *llvm::createRISCVLatePeepholeOptPass() {
+  return new RISCVLatePeepholeOpt();
+}
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index f78e5f8147d98..a283bd02bf8fa 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -128,6 +128,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTarget() {
   initializeKCFIPass(*PR);
   initializeRISCVDeadRegisterDefinitionsPass(*PR);
   initializeRISCVMakeCompressibleOptPass(*PR);
+  initializeRISCVLatePeepholeOptPass(*PR);
   initializeRISCVGatherScatterLoweringPass(*PR);
   initializeRISCVCodeGenPreparePass(*PR);
   initializeRISCVPostRAExpandPseudoPass(*PR);
@@ -567,6 +568,7 @@ void RISCVPassConfig::addPreEmitPass() {
     addPass(createMachineCopyPropagationPass(true));
   addPass(&BranchRelaxationPassID);
   addPass(createRISCVMakeCompressibleOptPass());
+  addPass(createRISCVLatePeepholeOptPass());
 }
 
 void RISCVPassConfig::addPreEmitPass2() {
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
index 338925059862c..74ec7308cb646 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
@@ -357,11 +357,6 @@ define i64 @ctpop_i64(i64 %a) nounwind {
 define i1 @ctpop_i64_ugt_two(i64 %a) nounwind {
 ; RV32I-LABEL: ctpop_i64_ugt_two:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    beqz zero, .LBB6_2
-; RV32I-NEXT:  # %bb.1:
-; RV32I-NEXT:    sltiu a0, zero, 0
-; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB6_2:
 ; RV32I-NEXT:    srli a2, a0, 1
 ; RV32I-NEXT:    lui a3, 349525
 ; RV32I-NEXT:    lui a4, 209715
@@ -404,11 +399,6 @@ define i1 @ctpop_i64_ugt_two(i64 %a) nounwind {
 ;
 ; RV32ZBB-LABEL: ctpop_i64_ugt_two:
 ; RV32ZBB:       # %bb.0:
-; RV32ZBB-NEXT:    beqz zero, .LBB6_2
-; RV32ZBB-NEXT:  # %bb.1:
-; RV32ZBB-NEXT:    sltiu a0, zero, 0
-; RV32ZBB-NEXT:    ret
-; RV32ZBB-NEXT:  .LBB6_2:
 ; RV32ZBB-NEXT:    cpop a0, a0
 ; RV32ZBB-NEXT:    cpop a1, a1
 ; RV32ZBB-NEXT:    add a0, a1, a0
@@ -422,11 +412,6 @@ define i1 @ctpop_i64_ugt_two(i64 %a) nounwind {
 define i1 @ctpop_i64_ugt_one(i64 %a) nounwind {
 ; RV32I-LABEL: ctpop_i64_ugt_one:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    beqz zero, .LBB7_2
-; RV32I-NEXT:  # %bb.1:
-; RV32I-NEXT:    snez a0, zero
-; RV32I-NEXT:    ret
-; RV32I-NEXT:  .LBB7_2:
 ; RV32I-NEXT:    srli a2, a0, 1
 ; RV32I-NEXT:    lui a3, 349525
 ; RV32I-NEXT:    lui a4, 209715
@@ -470,11 +455,6 @@ define i1 @ctpop_i64_ugt_one(i64 %a) nounwind {
 ;
 ; RV32ZBB-LABEL: ctpop_i64_ugt_one:
 ; RV32ZBB:       # %bb.0:
-; RV32ZBB-NEXT:    beqz zero, .LBB7_2
-; RV32ZBB-NEXT:  # %bb.1:
-; RV32ZBB-NEXT:    snez a0, zero
-; RV32ZBB-NEXT:    ret
-; RV32ZBB-NEXT:  .LBB7_2:
 ; RV32ZBB-NEXT:    cpop a0, a0
 ; RV32ZBB-NEXT:    cpop a1, a1
 ; RV32ZBB-NEXT:    add a0, a1, a0
diff --git a/llvm/test/CodeGen/RISCV/O0-pipeline.ll b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
index f93cb65897210..29ec19b7e35a7 100644
--- a/llvm/test/CodeGen/RISCV/O0-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
@@ -64,6 +64,7 @@
 ; CHECK-NEXT:       Implement the 'patchable-function' attribute
 ; CHECK-NEXT:       Branch relaxation pass
 ; CHECK-NEXT:       RISC-V Make Compressible
+; CHECK-NEXT:       RISC-V Late Stage Peephole
 ; CHECK-NEXT:       Contiguously Lay Out Funclets
 ; CHECK-NEXT:       Remove Loads Into Fake Uses
 ; CHECK-NEXT:       StackMap Liveness Analysis
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index 976d1ee003a1f..b3698caf7f0f6 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -196,6 +196,7 @@
 ; CHECK-NEXT:       Machine Copy Propagation Pass
 ; CHECK-NEXT:       Branch relaxation pass
 ; CHECK-NEXT:       RISC-V Make Compressible
+; CHECK-NEXT:       RISC-V Late Stage Peephole
 ; CHECK-NEXT:       Contiguously Lay Out Funclets
 ; CHECK-NEXT:       Remove Loads Into Fake Uses
 ; CHECK-NEXT:       StackMap Liveness Analysis
diff --git a/llvm/test/CodeGen/RISCV/branch_zero.ll b/llvm/test/CodeGen/RISCV/branch_zero.ll
index fd0979977ba3b..2c13c28647516 100644
--- a/llvm/test/CodeGen/RISCV/branch_zero.ll
+++ b/llvm/test/CodeGen/RISCV/branch_zero.ll
@@ -5,16 +5,11 @@
 define void @foo(i16 %finder_idx) {
 ; CHECK-LABEL: foo:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:  .LBB0_1: # %for.body
-; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:  # %bb.1: # %for.body
 ; CHECK-NEXT:    slli a0, a0, 48
-; CHECK-NEXT:    bltz a0, .LBB0_4
+; CHECK-NEXT:    bltz a0, .LBB0_3
 ; CHECK-NEXT:  # %bb.2: # %while.cond.preheader.i
-; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
-; CHECK-NEXT:    li a0, 0
-; CHECK-NEXT:    bnez zero, .LBB0_1
-; CHECK-NEXT:  # %bb.3: # %while.body
-; CHECK-NEXT:  .LBB0_4: # %while.cond1.preheader.i
+; CHECK-NEXT:  .LBB0_3: # %while.cond1.preheader.i
 entry:
   br label %for.body
 
@@ -46,16 +41,11 @@ if.then:
 define void @bar(i16 %finder_idx) {
 ; CHECK-LABEL: bar:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:  .LBB1_1: # %for.body
-; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:  # %bb.1: # %for.body
 ; CHECK-NEXT:    slli a0, a0, 48
-; CHECK-NEXT:    bgez a0, .LBB1_4
+; CHECK-NEXT:    bgez a0, .LBB1_3
 ; CHECK-NEXT:  # %bb.2: # %while.cond.preheader.i
-; CHECK-NEXT:    # in Loop: Header=BB1_1 Depth=1
-; CHECK-NEXT:    li a0, 0
-; CHECK-NEXT:    bnez zero, .LBB1_1
-; CHECK-NEXT:  # %bb.3: # %while.body
-; CHECK-NEXT:  .LBB1_4: # %while.cond1.preheader.i
+; CHECK-NEXT:  .LBB1_3: # %while.cond1.preheader.i
 entry:
   br label %for.body
 
diff --git a/llvm/test/CodeGen/RISCV/push-pop-opt-crash.ll b/llvm/test/CodeGen/RISCV/push-pop-opt-crash.ll
index 1e72529b17f59..00689c3136517 100644
--- a/llvm/test/CodeGen/RISCV/push-pop-opt-crash.ll
+++ b/llvm/test/CodeGen/RISCV/push-pop-opt-crash.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -mattr=+zcmp -verify-machineinstrs  \
 ; RUN: -mtriple=riscv32 -target-abi=ilp32 < %s \
 ; RUN: | FileCheck %s -check-prefixes=RV32IZCMP
@@ -11,40 +12,39 @@
 
 declare dso_local void @f1() local_unnamed_addr
 declare dso_local void @f2() local_unnamed_addr
-define  dso_local void @f0() local_unnamed_addr {
+define  dso_local void @f0(i1 %c) local_unnamed_addr {
 ; RV32IZCMP-LABEL: f0:
-; RV32IZCMP: 	.cfi_startproc
-; RV32IZCMP-NEXT: # %bb.0:                                # %entry
-; RV32IZCMP-NEXT: 	bnez	zero, .LBB0_2
-; RV32IZCMP-NEXT: # %bb.1:                                # %if.T
-; RV32IZCMP-NEXT: 	cm.push	{ra}, -16
-; RV32IZCMP-NEXT: 	.cfi_def_cfa_offset 16
-; RV32IZCMP-NEXT: 	.cfi_offset ra, -4
-; RV32IZCMP-NEXT: 	call	f1
-; RV32IZCMP-NEXT: 	cm.pop	{ra}, 16
-; RV32IZCMP-NEXT:     .cfi_restore ra
-; RV32IZCMP-NEXT:     .cfi_def_cfa_offset 0
-; RV32IZCMP-NEXT: .LBB0_2:                                # %if.F
-; RV32IZCMP-NEXT: 	tail	f2
-; RV32IZCMP-NEXT: .Lfunc_end0:
-
+; RV32IZCMP:       # %bb.0: # %entry
+; RV32IZCMP-NEXT:    andi a0, a0, 1
+; RV32IZCMP-NEXT:    beqz a0, .LBB0_2
+; RV32IZCMP-NEXT:  # %bb.1: # %if.T
+; RV32IZCMP-NEXT:    cm.push {ra}, -16
+; RV32IZCMP-NEXT:    .cfi_def_cfa_offset 16
+; RV32IZCMP-NEXT:    .cfi_offset ra, -4
+; RV32IZCMP-NEXT:    call f1
+; RV32IZCMP-NEXT:    cm.pop {ra}, 16
+; RV32IZCMP-NEXT:    .cfi_restore ra
+; RV32IZCMP-NEXT:    .cfi_def_cfa_offset 0
+; RV32IZCMP-NEXT:  .LBB0_2: # %if.F
+; RV32IZCMP-NEXT:    tail f2
+;
 ; RV64IZCMP-LABEL: f0:
-; RV64IZCMP: 	.cfi_startproc
-; RV64IZCMP-NEXT: # %bb.0:                                # %entry
-; RV64IZCMP-NEXT: 	bnez	zero, .LBB0_2
-; RV64IZCMP-NEXT: # %bb.1:                                # %if.T
-; RV64IZCMP-NEXT: 	cm.push	{ra}, -16
-; RV64IZCMP-NEXT: 	.cfi_def_cfa_offset 16
-; RV64IZCMP-NEXT: 	.cfi_offset ra, -8
-; RV64IZCMP-NEXT: 	call	f1
-; RV64IZCMP-NEXT: 	cm.pop	{ra}, 16
-; RV64IZCMP-NEXT:     .cfi_restore ra
-; RV64IZCMP-NEXT:     .cfi_def_cfa_offset 0
-; RV64IZCMP-NEXT: .LBB0_2:                                # %if.F
-; RV64IZCMP-NEXT: 	tail	f2
-; RV64IZCMP-NEXT: .Lfunc_end0:
+; RV64IZCMP:       # %bb.0: # %entry
+; RV64IZCMP-NEXT:    andi a0, a0, 1
+; RV64IZCMP-NEXT:    beqz a0, .LBB0_2
+; RV64IZCMP-NEXT:  # %bb.1: # %if.T
+; RV64IZCMP-NEXT:    cm.push {ra}, -16
+; RV64IZCMP-NEXT:    .cfi_def_cfa_offset 16
+; RV64IZCMP-NEXT:    .cfi_offset ra, -8
+; RV64IZCMP-NEXT:    call f1
+; RV64IZCMP-NEXT:    cm.pop {ra}, 16
+; RV64IZCMP-NEXT:    .cfi_restore ra
+; RV64IZCMP-NEXT:    .cfi_def_cfa_offset 0
+; RV64IZCMP-NEXT:  .LBB0_2: # %if.F
+; RV64IZCMP-NEXT:    tail f2
+
 entry:
-  br i1 poison, label %if.T, label %if.F
+  br i1 %c, label %if.T, label %if.F
 
 if.T:
   tail call void @f1()
diff --git a/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll b/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
index c35f05be304cc..5251074717c93 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
@@ -14,9 +14,9 @@
 define void @test1(ptr nocapture noundef writeonly %dst, i32 noundef signext %i_dst_stride, ptr nocapture noundef readonly %src1, i32 noundef signext %i_src1_stride, ptr nocapture noundef readonly %src2, i32 noundef signext %i_src2_stride, i32 noundef sig...
[truncated]

@michaelmaitland michaelmaitland force-pushed the analyze-branch-opt-more branch from 35e2715 to e064268 Compare March 18, 2025 04:52
Contributor

@wangpc-pp wangpc-pp left a comment


Can you provide some performance or IC data? It seems the gain is not that large.

@mikhailramalho
Member

Can you provide some performance or IC data? It seems the gain is not that large.

I'll post the results here tomorrow morning

@michaelmaitland
Contributor Author

Can you provide some performance or IC data? It seems the gain is not that large.

I'll post the results here tomorrow morning

Thanks @mikhailramalho. I really appreciate the assistance!

@@ -567,6 +568,7 @@ void RISCVPassConfig::addPreEmitPass() {
addPass(createMachineCopyPropagationPass(true));
addPass(&BranchRelaxationPassID);
addPass(createRISCVMakeCompressibleOptPass());
addPass(createRISCVLatePeepholeOptPass());
Collaborator


I'm not sure we should mess with branches after BranchRelaxationPass

Member


I'm checking my notes here and the pattern that generates the branch, i.e.:

bb.2:
  renamable $x10 = XORI killed renamable $x10, -1
  renamable $x10 = BEXTI killed renamable $x10, 13
  BNE killed renamable $x10, $x0, %bb.5

Seems to be formed by MachineBlockPlacementPass, so maybe we can move this change to the beginning of addPreEmitPass and still be able to optimize it.

Member


I would put it immediately before branch relaxation, unless you're sure that machine copy propagation won't also introduce these kinds of branches.

Contributor Author


@lenary, MCP is definitely introducing these kinds of branches.


bool RISCVLatePeepholeOpt::optimizeBlock(MachineBasicBlock &MBB) {

// Use trySimplifyCondBr directly to know whether the optimization occured.
Collaborator


occurred*

@mikhailramalho
Member

Sorry for the delay, I was seeing some strange numbers, so I ran everything twice: https://lnt.lukelau.me/db_default/v4/nts/338?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=337&submit=Update

  • I wouldn't trust the result of gcc_r, there is a high variance there
  • The regression on xalancbmk_r is strange, I'll have to take a closer look at it

@michaelmaitland
Contributor Author

Sorry for the delay, I was seeing some strange numbers, so I ran everything twice: https://lnt.lukelau.me/db_default/v4/nts/338?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=337&submit=Update

  • I wouldn't trust the result of gcc_r, there is a high variance there
  • The regression on xalancbmk_r is strange, I'll have to take a closer look at it

@mikhailramalho so what I'm hearing is that we should count this as a pretty nice gain in xz? I know xalancbmk usually has high variance too, but this looks quite large.

@topperc
Collaborator

topperc commented Mar 20, 2025

Sorry for the delay, I was seeing some strange numbers, so I ran everything twice: https://lnt.lukelau.me/db_default/v4/nts/338?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=337&submit=Update

  • I wouldn't trust the result of gcc_r, there is a high variance there
  • The regression on xalancbmk_r is strange, I'll have to take a closer look at it

@mikhailramalho so what I'm hearing is that we should count this as a pretty nice gain in xz? I know xalancbmk usually has high variance too, but this looks quite large.

Does #117060 also improve xz?

@mikhailramalho
Member

@mikhailramalho so what I'm hearing is that we should count this as a pretty nice gain in xz? I know xalancbmk usually has high variance too, but this looks quite large.

I'd say we can trust the results of xz and parest, they were consistent across the two full runs I did.

Does #117060 also improve xz?

I didn't test it back then, so I don't know :/

mikhailramalho added a commit to mikhailramalho/llvm-project that referenced this pull request Mar 26, 2025
This patch is an alternative to PRs llvm#117060, llvm#131684, llvm#131728.

The patch adds a late optimization pass that replaces conditional
branches that can be statically evaluated with an unconditional branch.

Adding Michael as a co-author, as most of the code that evaluates the
condition comes from llvm#131684.
mikhailramalho added a commit that referenced this pull request Mar 27, 2025
This patch is an alternative to PRs #117060, #131684, #131728.

The patch adds a late optimization pass that replaces conditional
branches that can be statically evaluated with an unconditional branch.

Adding Michael as a co-author as most of the code that evaluates the
condition comes from #131684.

Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
mikhailramalho added a commit to mikhailramalho/llvm-project that referenced this pull request Apr 7, 2025
This is a follow-up patch to PR llvm#133256.

This patch adds the branch folding pass after the newly added late
optimization pass for riscv, which reduces code size in all SPEC
benchmarks (except libm).

The improvements are: 500.perlbench_r (-3.37%), 544.nab_r (-3.06%),
557.xz_r (-2.82%), 523.xalancbmk_r (-2.64%), 520.omnetpp_r (-2.34%),
531.deepsjeng_r (-2.27%), 502.gcc_r (-2.19%), 526.blender_r (-2.11%),
538.imagick_r (-2.03%), 505.mcf_r (-1.82%), 541.leela_r (-1.74%),
511.povray_r (-1.62%), 510.parest_r (-1.62%), 508.namd_r (-1.57%),
525.x264_r (-1.47%).

Geo mean is -2.07%.

Some caveats:
* On llvm#131728 I mentioned a 7% improvement on execution time of xz, but
  that's no longer the case. I went back and also tried to reproduce the
  result with the code from llvm#131728 and couldn't. Now the results from
  that PR and this one are the same: an overall code size reduction but
  no exec time improvements.
* The root cause of the large number is not yet clear to me. I'm still
  investigating it.