[RISCV] Add branch folding before branch relaxation #134760

Open · wants to merge 7 commits into main

Conversation

mikhailramalho (Member)

This is a follow-up patch to PR #133256.

This patch adds the branch folding pass after the newly added late optimization pass for RISC-V, which reduces code size in all SPEC benchmarks except for lbm.

The improvements are: 500.perlbench_r (-3.37%), 544.nab_r (-3.06%), 557.xz_r (-2.82%), 523.xalancbmk_r (-2.64%), 520.omnetpp_r (-2.34%), 531.deepsjeng_r (-2.27%), 502.gcc_r (-2.19%), 526.blender_r (-2.11%), 538.imagick_r (-2.03%), 505.mcf_r (-1.82%), 541.leela_r (-1.74%), 511.povray_r (-1.62%), 510.parest_r (-1.62%), 508.namd_r (-1.57%), 525.x264_r (-1.47%).

The geo mean is -2.07%.
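
For context, the change itself is a single line in RISCVPassConfig::addPreEmitPass() (the full hunk appears in the patch below):

  if (TM->getOptLevel() >= CodeGenOptLevel::Default)
    addPass(createRISCVLateBranchOptPass());
  addPass(&BranchFolderPassID);   // added by this patch
  addPass(&BranchRelaxationPassID);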

Some caveats:

  • On PR #131728 I mentioned a ~7% improvement in the execution time of xz, but that's no longer the case. I went back and also tried to reproduce the result with the code from #131728 and couldn't. Now, the results from that PR and this one are the same: an overall code size reduction but no exec time improvements. The previous number I reported was likely a measurement error.
  • The root cause of the large code size reduction is not yet clear to me. I'm still investigating it.

Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
@llvmbot (Member) commented Apr 8, 2025

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-transforms

Author: Mikhail R. Gadelha (mikhailramalho)

Changes

This is a follow-up patch to PR #133256.

This patch adds the branch folding pass after the newly added late optimization pass for RISC-V, which reduces code size in all SPEC benchmarks except for lbm.

The improvements are: 500.perlbench_r (-3.37%), 544.nab_r (-3.06%), 557.xz_r (-2.82%), 523.xalancbmk_r (-2.64%), 520.omnetpp_r (-2.34%), 531.deepsjeng_r (-2.27%), 502.gcc_r (-2.19%), 526.blender_r (-2.11%), 538.imagick_r (-2.03%), 505.mcf_r (-1.82%), 541.leela_r (-1.74%), 511.povray_r (-1.62%), 510.parest_r (-1.62%), 508.namd_r (-1.57%), 525.x264_r (-1.47%).

The geo mean is -2.07%.

Some caveats:

  • On PR #131728 I mentioned a ~7% improvement in the execution time of xz, but that's no longer the case. I went back and also tried to reproduce the result with the code from #131728 and couldn't. Now, the results from that PR and this one are the same: an overall code size reduction but no exec time improvements. The previous number I reported was likely a measurement error.
  • The root cause of the large code size reduction is not yet clear to me. I'm still investigating it.

Patch is 1.37 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134760.diff

50 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rotl-rotr.ll (+405-426)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb-zbkb.ll (+18-19)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll (-20)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/shifts.ll (+132-135)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll (+1359-1428)
  • (modified) llvm/test/CodeGen/RISCV/O0-pipeline.ll (+5-1)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+5-1)
  • (modified) llvm/test/CodeGen/RISCV/atomic-rmw-discard.ll (+16-24)
  • (modified) llvm/test/CodeGen/RISCV/atomic-signext.ll (+54-54)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-br-fcmp.ll (-16)
  • (modified) llvm/test/CodeGen/RISCV/bittest.ll (+231-231)
  • (modified) llvm/test/CodeGen/RISCV/branch_zero.ll (+8-10)
  • (modified) llvm/test/CodeGen/RISCV/cmp-bool.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/copyprop.ll (+6-9)
  • (modified) llvm/test/CodeGen/RISCV/csr-first-use-cost.ll (+27-27)
  • (modified) llvm/test/CodeGen/RISCV/double-br-fcmp.ll (-32)
  • (modified) llvm/test/CodeGen/RISCV/double-maximum-minimum.ll (+27-36)
  • (modified) llvm/test/CodeGen/RISCV/float-br-fcmp.ll (-32)
  • (modified) llvm/test/CodeGen/RISCV/float-maximum-minimum.ll (+24-32)
  • (modified) llvm/test/CodeGen/RISCV/forced-atomics.ll (+12-13)
  • (modified) llvm/test/CodeGen/RISCV/fpclamptosat.ll (+176-192)
  • (modified) llvm/test/CodeGen/RISCV/frame-info.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/half-br-fcmp.ll (-64)
  • (modified) llvm/test/CodeGen/RISCV/half-maximum-minimum.ll (+12-16)
  • (modified) llvm/test/CodeGen/RISCV/machine-pipeliner.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll (+3-63)
  • (modified) llvm/test/CodeGen/RISCV/reduce-unnecessary-extension.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/riscv-tail-dup-size.ll (+4-7)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbb.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/copyprop.mir (+3-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/expandload.ll (+997-2991)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+1181-1390)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll (+586-956)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+19-28)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fpclamptosat_vec.ll (+264-348)
  • (modified) llvm/test/CodeGen/RISCV/rvv/pr93587.ll (-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vcpop-shl-zext-opt.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll (+29-37)
  • (modified) llvm/test/CodeGen/RISCV/sadd_sat.ll (+18-24)
  • (modified) llvm/test/CodeGen/RISCV/sadd_sat_plus.ll (+18-24)
  • (modified) llvm/test/CodeGen/RISCV/setcc-logic.ll (+100-100)
  • (modified) llvm/test/CodeGen/RISCV/sext-zext-trunc.ll (+38-38)
  • (modified) llvm/test/CodeGen/RISCV/shifts.ll (+2-3)
  • (modified) llvm/test/CodeGen/RISCV/simplify-condbr.ll (+13-18)
  • (modified) llvm/test/CodeGen/RISCV/ssub_sat.ll (+18-24)
  • (modified) llvm/test/CodeGen/RISCV/ssub_sat_plus.ll (+18-24)
  • (modified) llvm/test/CodeGen/RISCV/xcvbi.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopStrengthReduce/RISCV/lsr-drop-solution.ll (+10-10)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 7fb64be3975d5..63bd0f4c20497 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -570,6 +570,7 @@ void RISCVPassConfig::addPreEmitPass() {
     addPass(createMachineCopyPropagationPass(true));
   if (TM->getOptLevel() >= CodeGenOptLevel::Default)
     addPass(createRISCVLateBranchOptPass());
+  addPass(&BranchFolderPassID);
   addPass(&BranchRelaxationPassID);
   addPass(createRISCVMakeCompressibleOptPass());
 }
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rotl-rotr.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rotl-rotr.ll
index 8a786fc9993d2..da8678f9a9916 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rotl-rotr.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rotl-rotr.ll
@@ -296,44 +296,43 @@ define i64 @rotr_64(i64 %x, i64 %y) nounwind {
 ; RV32I-NEXT:    bltu a5, a4, .LBB3_2
 ; RV32I-NEXT:  # %bb.1:
 ; RV32I-NEXT:    srl a6, a1, a5
-; RV32I-NEXT:    mv a3, a0
-; RV32I-NEXT:    bnez a5, .LBB3_3
-; RV32I-NEXT:    j .LBB3_4
+; RV32I-NEXT:    j .LBB3_3
 ; RV32I-NEXT:  .LBB3_2:
 ; RV32I-NEXT:    srl a3, a0, a2
 ; RV32I-NEXT:    neg a6, a5
 ; RV32I-NEXT:    sll a6, a1, a6
 ; RV32I-NEXT:    or a6, a3, a6
-; RV32I-NEXT:    mv a3, a0
-; RV32I-NEXT:    beqz a5, .LBB3_4
 ; RV32I-NEXT:  .LBB3_3:
+; RV32I-NEXT:    mv a3, a0
+; RV32I-NEXT:    beqz a5, .LBB3_5
+; RV32I-NEXT:  # %bb.4:
 ; RV32I-NEXT:    mv a3, a6
-; RV32I-NEXT:  .LBB3_4:
+; RV32I-NEXT:  .LBB3_5:
 ; RV32I-NEXT:    neg a6, a2
-; RV32I-NEXT:    bltu a5, a4, .LBB3_7
-; RV32I-NEXT:  # %bb.5:
+; RV32I-NEXT:    bltu a5, a4, .LBB3_9
+; RV32I-NEXT:  # %bb.6:
 ; RV32I-NEXT:    li a2, 0
+; RV32I-NEXT:  .LBB3_7:
 ; RV32I-NEXT:    andi a5, a6, 63
-; RV32I-NEXT:    bgeu a5, a4, .LBB3_8
-; RV32I-NEXT:  .LBB3_6:
+; RV32I-NEXT:    bgeu a5, a4, .LBB3_10
+; RV32I-NEXT:  # %bb.8:
 ; RV32I-NEXT:    sll a4, a0, a6
 ; RV32I-NEXT:    neg a7, a5
 ; RV32I-NEXT:    srl a0, a0, a7
 ; RV32I-NEXT:    sll a6, a1, a6
 ; RV32I-NEXT:    or a0, a0, a6
-; RV32I-NEXT:    bnez a5, .LBB3_9
-; RV32I-NEXT:    j .LBB3_10
-; RV32I-NEXT:  .LBB3_7:
+; RV32I-NEXT:    bnez a5, .LBB3_11
+; RV32I-NEXT:    j .LBB3_12
+; RV32I-NEXT:  .LBB3_9:
 ; RV32I-NEXT:    srl a2, a1, a2
-; RV32I-NEXT:    andi a5, a6, 63
-; RV32I-NEXT:    bltu a5, a4, .LBB3_6
-; RV32I-NEXT:  .LBB3_8:
+; RV32I-NEXT:    j .LBB3_7
+; RV32I-NEXT:  .LBB3_10:
 ; RV32I-NEXT:    li a4, 0
 ; RV32I-NEXT:    sll a0, a0, a5
-; RV32I-NEXT:    beqz a5, .LBB3_10
-; RV32I-NEXT:  .LBB3_9:
+; RV32I-NEXT:    beqz a5, .LBB3_12
+; RV32I-NEXT:  .LBB3_11:
 ; RV32I-NEXT:    mv a1, a0
-; RV32I-NEXT:  .LBB3_10:
+; RV32I-NEXT:  .LBB3_12:
 ; RV32I-NEXT:    or a0, a3, a4
 ; RV32I-NEXT:    or a1, a2, a1
 ; RV32I-NEXT:    ret
@@ -353,44 +352,43 @@ define i64 @rotr_64(i64 %x, i64 %y) nounwind {
 ; RV32ZBB-NEXT:    bltu a5, a4, .LBB3_2
 ; RV32ZBB-NEXT:  # %bb.1:
 ; RV32ZBB-NEXT:    srl a6, a1, a5
-; RV32ZBB-NEXT:    mv a3, a0
-; RV32ZBB-NEXT:    bnez a5, .LBB3_3
-; RV32ZBB-NEXT:    j .LBB3_4
+; RV32ZBB-NEXT:    j .LBB3_3
 ; RV32ZBB-NEXT:  .LBB3_2:
 ; RV32ZBB-NEXT:    srl a3, a0, a2
 ; RV32ZBB-NEXT:    neg a6, a5
 ; RV32ZBB-NEXT:    sll a6, a1, a6
 ; RV32ZBB-NEXT:    or a6, a3, a6
-; RV32ZBB-NEXT:    mv a3, a0
-; RV32ZBB-NEXT:    beqz a5, .LBB3_4
 ; RV32ZBB-NEXT:  .LBB3_3:
+; RV32ZBB-NEXT:    mv a3, a0
+; RV32ZBB-NEXT:    beqz a5, .LBB3_5
+; RV32ZBB-NEXT:  # %bb.4:
 ; RV32ZBB-NEXT:    mv a3, a6
-; RV32ZBB-NEXT:  .LBB3_4:
+; RV32ZBB-NEXT:  .LBB3_5:
 ; RV32ZBB-NEXT:    neg a6, a2
-; RV32ZBB-NEXT:    bltu a5, a4, .LBB3_7
-; RV32ZBB-NEXT:  # %bb.5:
+; RV32ZBB-NEXT:    bltu a5, a4, .LBB3_9
+; RV32ZBB-NEXT:  # %bb.6:
 ; RV32ZBB-NEXT:    li a2, 0
+; RV32ZBB-NEXT:  .LBB3_7:
 ; RV32ZBB-NEXT:    andi a5, a6, 63
-; RV32ZBB-NEXT:    bgeu a5, a4, .LBB3_8
-; RV32ZBB-NEXT:  .LBB3_6:
+; RV32ZBB-NEXT:    bgeu a5, a4, .LBB3_10
+; RV32ZBB-NEXT:  # %bb.8:
 ; RV32ZBB-NEXT:    sll a4, a0, a6
 ; RV32ZBB-NEXT:    neg a7, a5
 ; RV32ZBB-NEXT:    srl a0, a0, a7
 ; RV32ZBB-NEXT:    sll a6, a1, a6
 ; RV32ZBB-NEXT:    or a0, a0, a6
-; RV32ZBB-NEXT:    bnez a5, .LBB3_9
-; RV32ZBB-NEXT:    j .LBB3_10
-; RV32ZBB-NEXT:  .LBB3_7:
+; RV32ZBB-NEXT:    bnez a5, .LBB3_11
+; RV32ZBB-NEXT:    j .LBB3_12
+; RV32ZBB-NEXT:  .LBB3_9:
 ; RV32ZBB-NEXT:    srl a2, a1, a2
-; RV32ZBB-NEXT:    andi a5, a6, 63
-; RV32ZBB-NEXT:    bltu a5, a4, .LBB3_6
-; RV32ZBB-NEXT:  .LBB3_8:
+; RV32ZBB-NEXT:    j .LBB3_7
+; RV32ZBB-NEXT:  .LBB3_10:
 ; RV32ZBB-NEXT:    li a4, 0
 ; RV32ZBB-NEXT:    sll a0, a0, a5
-; RV32ZBB-NEXT:    beqz a5, .LBB3_10
-; RV32ZBB-NEXT:  .LBB3_9:
+; RV32ZBB-NEXT:    beqz a5, .LBB3_12
+; RV32ZBB-NEXT:  .LBB3_11:
 ; RV32ZBB-NEXT:    mv a1, a0
-; RV32ZBB-NEXT:  .LBB3_10:
+; RV32ZBB-NEXT:  .LBB3_12:
 ; RV32ZBB-NEXT:    or a0, a3, a4
 ; RV32ZBB-NEXT:    or a1, a2, a1
 ; RV32ZBB-NEXT:    ret
@@ -407,44 +405,43 @@ define i64 @rotr_64(i64 %x, i64 %y) nounwind {
 ; RV32XTHEADBB-NEXT:    bltu a5, a4, .LBB3_2
 ; RV32XTHEADBB-NEXT:  # %bb.1:
 ; RV32XTHEADBB-NEXT:    srl a6, a1, a5
-; RV32XTHEADBB-NEXT:    mv a3, a0
-; RV32XTHEADBB-NEXT:    bnez a5, .LBB3_3
-; RV32XTHEADBB-NEXT:    j .LBB3_4
+; RV32XTHEADBB-NEXT:    j .LBB3_3
 ; RV32XTHEADBB-NEXT:  .LBB3_2:
 ; RV32XTHEADBB-NEXT:    srl a3, a0, a2
 ; RV32XTHEADBB-NEXT:    neg a6, a5
 ; RV32XTHEADBB-NEXT:    sll a6, a1, a6
 ; RV32XTHEADBB-NEXT:    or a6, a3, a6
-; RV32XTHEADBB-NEXT:    mv a3, a0
-; RV32XTHEADBB-NEXT:    beqz a5, .LBB3_4
 ; RV32XTHEADBB-NEXT:  .LBB3_3:
+; RV32XTHEADBB-NEXT:    mv a3, a0
+; RV32XTHEADBB-NEXT:    beqz a5, .LBB3_5
+; RV32XTHEADBB-NEXT:  # %bb.4:
 ; RV32XTHEADBB-NEXT:    mv a3, a6
-; RV32XTHEADBB-NEXT:  .LBB3_4:
+; RV32XTHEADBB-NEXT:  .LBB3_5:
 ; RV32XTHEADBB-NEXT:    neg a6, a2
-; RV32XTHEADBB-NEXT:    bltu a5, a4, .LBB3_7
-; RV32XTHEADBB-NEXT:  # %bb.5:
+; RV32XTHEADBB-NEXT:    bltu a5, a4, .LBB3_9
+; RV32XTHEADBB-NEXT:  # %bb.6:
 ; RV32XTHEADBB-NEXT:    li a2, 0
+; RV32XTHEADBB-NEXT:  .LBB3_7:
 ; RV32XTHEADBB-NEXT:    andi a5, a6, 63
-; RV32XTHEADBB-NEXT:    bgeu a5, a4, .LBB3_8
-; RV32XTHEADBB-NEXT:  .LBB3_6:
+; RV32XTHEADBB-NEXT:    bgeu a5, a4, .LBB3_10
+; RV32XTHEADBB-NEXT:  # %bb.8:
 ; RV32XTHEADBB-NEXT:    sll a4, a0, a6
 ; RV32XTHEADBB-NEXT:    neg a7, a5
 ; RV32XTHEADBB-NEXT:    srl a0, a0, a7
 ; RV32XTHEADBB-NEXT:    sll a6, a1, a6
 ; RV32XTHEADBB-NEXT:    or a0, a0, a6
-; RV32XTHEADBB-NEXT:    bnez a5, .LBB3_9
-; RV32XTHEADBB-NEXT:    j .LBB3_10
-; RV32XTHEADBB-NEXT:  .LBB3_7:
+; RV32XTHEADBB-NEXT:    bnez a5, .LBB3_11
+; RV32XTHEADBB-NEXT:    j .LBB3_12
+; RV32XTHEADBB-NEXT:  .LBB3_9:
 ; RV32XTHEADBB-NEXT:    srl a2, a1, a2
-; RV32XTHEADBB-NEXT:    andi a5, a6, 63
-; RV32XTHEADBB-NEXT:    bltu a5, a4, .LBB3_6
-; RV32XTHEADBB-NEXT:  .LBB3_8:
+; RV32XTHEADBB-NEXT:    j .LBB3_7
+; RV32XTHEADBB-NEXT:  .LBB3_10:
 ; RV32XTHEADBB-NEXT:    li a4, 0
 ; RV32XTHEADBB-NEXT:    sll a0, a0, a5
-; RV32XTHEADBB-NEXT:    beqz a5, .LBB3_10
-; RV32XTHEADBB-NEXT:  .LBB3_9:
+; RV32XTHEADBB-NEXT:    beqz a5, .LBB3_12
+; RV32XTHEADBB-NEXT:  .LBB3_11:
 ; RV32XTHEADBB-NEXT:    mv a1, a0
-; RV32XTHEADBB-NEXT:  .LBB3_10:
+; RV32XTHEADBB-NEXT:  .LBB3_12:
 ; RV32XTHEADBB-NEXT:    or a0, a3, a4
 ; RV32XTHEADBB-NEXT:    or a1, a2, a1
 ; RV32XTHEADBB-NEXT:    ret
@@ -961,43 +958,42 @@ define i64 @rotl_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
 ; RV32I-NEXT:  # %bb.1:
 ; RV32I-NEXT:    li a3, 0
 ; RV32I-NEXT:    sll a7, a0, a6
-; RV32I-NEXT:    mv a5, a1
-; RV32I-NEXT:    bnez a6, .LBB11_3
-; RV32I-NEXT:    j .LBB11_4
+; RV32I-NEXT:    j .LBB11_3
 ; RV32I-NEXT:  .LBB11_2:
 ; RV32I-NEXT:    sll a3, a0, a2
 ; RV32I-NEXT:    neg a5, a6
 ; RV32I-NEXT:    srl a5, a0, a5
 ; RV32I-NEXT:    sll a7, a1, a2
 ; RV32I-NEXT:    or a7, a5, a7
-; RV32I-NEXT:    mv a5, a1
-; RV32I-NEXT:    beqz a6, .LBB11_4
 ; RV32I-NEXT:  .LBB11_3:
+; RV32I-NEXT:    mv a5, a1
+; RV32I-NEXT:    beqz a6, .LBB11_5
+; RV32I-NEXT:  # %bb.4:
 ; RV32I-NEXT:    mv a5, a7
-; RV32I-NEXT:  .LBB11_4:
+; RV32I-NEXT:  .LBB11_5:
 ; RV32I-NEXT:    neg a2, a2
 ; RV32I-NEXT:    andi a6, a2, 63
-; RV32I-NEXT:    bltu a6, a4, .LBB11_6
-; RV32I-NEXT:  # %bb.5:
+; RV32I-NEXT:    bltu a6, a4, .LBB11_7
+; RV32I-NEXT:  # %bb.6:
 ; RV32I-NEXT:    srl a7, a1, a6
-; RV32I-NEXT:    bnez a6, .LBB11_7
-; RV32I-NEXT:    j .LBB11_8
-; RV32I-NEXT:  .LBB11_6:
+; RV32I-NEXT:    bnez a6, .LBB11_8
+; RV32I-NEXT:    j .LBB11_9
+; RV32I-NEXT:  .LBB11_7:
 ; RV32I-NEXT:    srl a7, a0, a2
 ; RV32I-NEXT:    neg t0, a6
 ; RV32I-NEXT:    sll t0, a1, t0
 ; RV32I-NEXT:    or a7, a7, t0
-; RV32I-NEXT:    beqz a6, .LBB11_8
-; RV32I-NEXT:  .LBB11_7:
-; RV32I-NEXT:    mv a0, a7
+; RV32I-NEXT:    beqz a6, .LBB11_9
 ; RV32I-NEXT:  .LBB11_8:
-; RV32I-NEXT:    bltu a6, a4, .LBB11_10
-; RV32I-NEXT:  # %bb.9:
+; RV32I-NEXT:    mv a0, a7
+; RV32I-NEXT:  .LBB11_9:
+; RV32I-NEXT:    bltu a6, a4, .LBB11_11
+; RV32I-NEXT:  # %bb.10:
 ; RV32I-NEXT:    li a1, 0
-; RV32I-NEXT:    j .LBB11_11
-; RV32I-NEXT:  .LBB11_10:
-; RV32I-NEXT:    srl a1, a1, a2
+; RV32I-NEXT:    j .LBB11_12
 ; RV32I-NEXT:  .LBB11_11:
+; RV32I-NEXT:    srl a1, a1, a2
+; RV32I-NEXT:  .LBB11_12:
 ; RV32I-NEXT:    or a0, a3, a0
 ; RV32I-NEXT:    or a1, a5, a1
 ; RV32I-NEXT:    ret
@@ -1018,43 +1014,42 @@ define i64 @rotl_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
 ; RV32ZBB-NEXT:  # %bb.1:
 ; RV32ZBB-NEXT:    li a3, 0
 ; RV32ZBB-NEXT:    sll a7, a0, a6
-; RV32ZBB-NEXT:    mv a5, a1
-; RV32ZBB-NEXT:    bnez a6, .LBB11_3
-; RV32ZBB-NEXT:    j .LBB11_4
+; RV32ZBB-NEXT:    j .LBB11_3
 ; RV32ZBB-NEXT:  .LBB11_2:
 ; RV32ZBB-NEXT:    sll a3, a0, a2
 ; RV32ZBB-NEXT:    neg a5, a6
 ; RV32ZBB-NEXT:    srl a5, a0, a5
 ; RV32ZBB-NEXT:    sll a7, a1, a2
 ; RV32ZBB-NEXT:    or a7, a5, a7
-; RV32ZBB-NEXT:    mv a5, a1
-; RV32ZBB-NEXT:    beqz a6, .LBB11_4
 ; RV32ZBB-NEXT:  .LBB11_3:
+; RV32ZBB-NEXT:    mv a5, a1
+; RV32ZBB-NEXT:    beqz a6, .LBB11_5
+; RV32ZBB-NEXT:  # %bb.4:
 ; RV32ZBB-NEXT:    mv a5, a7
-; RV32ZBB-NEXT:  .LBB11_4:
+; RV32ZBB-NEXT:  .LBB11_5:
 ; RV32ZBB-NEXT:    neg a2, a2
 ; RV32ZBB-NEXT:    andi a6, a2, 63
-; RV32ZBB-NEXT:    bltu a6, a4, .LBB11_6
-; RV32ZBB-NEXT:  # %bb.5:
+; RV32ZBB-NEXT:    bltu a6, a4, .LBB11_7
+; RV32ZBB-NEXT:  # %bb.6:
 ; RV32ZBB-NEXT:    srl a7, a1, a6
-; RV32ZBB-NEXT:    bnez a6, .LBB11_7
-; RV32ZBB-NEXT:    j .LBB11_8
-; RV32ZBB-NEXT:  .LBB11_6:
+; RV32ZBB-NEXT:    bnez a6, .LBB11_8
+; RV32ZBB-NEXT:    j .LBB11_9
+; RV32ZBB-NEXT:  .LBB11_7:
 ; RV32ZBB-NEXT:    srl a7, a0, a2
 ; RV32ZBB-NEXT:    neg t0, a6
 ; RV32ZBB-NEXT:    sll t0, a1, t0
 ; RV32ZBB-NEXT:    or a7, a7, t0
-; RV32ZBB-NEXT:    beqz a6, .LBB11_8
-; RV32ZBB-NEXT:  .LBB11_7:
-; RV32ZBB-NEXT:    mv a0, a7
+; RV32ZBB-NEXT:    beqz a6, .LBB11_9
 ; RV32ZBB-NEXT:  .LBB11_8:
-; RV32ZBB-NEXT:    bltu a6, a4, .LBB11_10
-; RV32ZBB-NEXT:  # %bb.9:
+; RV32ZBB-NEXT:    mv a0, a7
+; RV32ZBB-NEXT:  .LBB11_9:
+; RV32ZBB-NEXT:    bltu a6, a4, .LBB11_11
+; RV32ZBB-NEXT:  # %bb.10:
 ; RV32ZBB-NEXT:    li a1, 0
-; RV32ZBB-NEXT:    j .LBB11_11
-; RV32ZBB-NEXT:  .LBB11_10:
-; RV32ZBB-NEXT:    srl a1, a1, a2
+; RV32ZBB-NEXT:    j .LBB11_12
 ; RV32ZBB-NEXT:  .LBB11_11:
+; RV32ZBB-NEXT:    srl a1, a1, a2
+; RV32ZBB-NEXT:  .LBB11_12:
 ; RV32ZBB-NEXT:    or a0, a3, a0
 ; RV32ZBB-NEXT:    or a1, a5, a1
 ; RV32ZBB-NEXT:    ret
@@ -1075,43 +1070,42 @@ define i64 @rotl_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
 ; RV32XTHEADBB-NEXT:  # %bb.1:
 ; RV32XTHEADBB-NEXT:    li a3, 0
 ; RV32XTHEADBB-NEXT:    sll a7, a0, a6
-; RV32XTHEADBB-NEXT:    mv a5, a1
-; RV32XTHEADBB-NEXT:    bnez a6, .LBB11_3
-; RV32XTHEADBB-NEXT:    j .LBB11_4
+; RV32XTHEADBB-NEXT:    j .LBB11_3
 ; RV32XTHEADBB-NEXT:  .LBB11_2:
 ; RV32XTHEADBB-NEXT:    sll a3, a0, a2
 ; RV32XTHEADBB-NEXT:    neg a5, a6
 ; RV32XTHEADBB-NEXT:    srl a5, a0, a5
 ; RV32XTHEADBB-NEXT:    sll a7, a1, a2
 ; RV32XTHEADBB-NEXT:    or a7, a5, a7
-; RV32XTHEADBB-NEXT:    mv a5, a1
-; RV32XTHEADBB-NEXT:    beqz a6, .LBB11_4
 ; RV32XTHEADBB-NEXT:  .LBB11_3:
+; RV32XTHEADBB-NEXT:    mv a5, a1
+; RV32XTHEADBB-NEXT:    beqz a6, .LBB11_5
+; RV32XTHEADBB-NEXT:  # %bb.4:
 ; RV32XTHEADBB-NEXT:    mv a5, a7
-; RV32XTHEADBB-NEXT:  .LBB11_4:
+; RV32XTHEADBB-NEXT:  .LBB11_5:
 ; RV32XTHEADBB-NEXT:    neg a2, a2
 ; RV32XTHEADBB-NEXT:    andi a6, a2, 63
-; RV32XTHEADBB-NEXT:    bltu a6, a4, .LBB11_6
-; RV32XTHEADBB-NEXT:  # %bb.5:
+; RV32XTHEADBB-NEXT:    bltu a6, a4, .LBB11_7
+; RV32XTHEADBB-NEXT:  # %bb.6:
 ; RV32XTHEADBB-NEXT:    srl a7, a1, a6
-; RV32XTHEADBB-NEXT:    bnez a6, .LBB11_7
-; RV32XTHEADBB-NEXT:    j .LBB11_8
-; RV32XTHEADBB-NEXT:  .LBB11_6:
+; RV32XTHEADBB-NEXT:    bnez a6, .LBB11_8
+; RV32XTHEADBB-NEXT:    j .LBB11_9
+; RV32XTHEADBB-NEXT:  .LBB11_7:
 ; RV32XTHEADBB-NEXT:    srl a7, a0, a2
 ; RV32XTHEADBB-NEXT:    neg t0, a6
 ; RV32XTHEADBB-NEXT:    sll t0, a1, t0
 ; RV32XTHEADBB-NEXT:    or a7, a7, t0
-; RV32XTHEADBB-NEXT:    beqz a6, .LBB11_8
-; RV32XTHEADBB-NEXT:  .LBB11_7:
-; RV32XTHEADBB-NEXT:    mv a0, a7
+; RV32XTHEADBB-NEXT:    beqz a6, .LBB11_9
 ; RV32XTHEADBB-NEXT:  .LBB11_8:
-; RV32XTHEADBB-NEXT:    bltu a6, a4, .LBB11_10
-; RV32XTHEADBB-NEXT:  # %bb.9:
+; RV32XTHEADBB-NEXT:    mv a0, a7
+; RV32XTHEADBB-NEXT:  .LBB11_9:
+; RV32XTHEADBB-NEXT:    bltu a6, a4, .LBB11_11
+; RV32XTHEADBB-NEXT:  # %bb.10:
 ; RV32XTHEADBB-NEXT:    li a1, 0
-; RV32XTHEADBB-NEXT:    j .LBB11_11
-; RV32XTHEADBB-NEXT:  .LBB11_10:
-; RV32XTHEADBB-NEXT:    srl a1, a1, a2
+; RV32XTHEADBB-NEXT:    j .LBB11_12
 ; RV32XTHEADBB-NEXT:  .LBB11_11:
+; RV32XTHEADBB-NEXT:    srl a1, a1, a2
+; RV32XTHEADBB-NEXT:  .LBB11_12:
 ; RV32XTHEADBB-NEXT:    or a0, a3, a0
 ; RV32XTHEADBB-NEXT:    or a1, a5, a1
 ; RV32XTHEADBB-NEXT:    ret
@@ -1406,44 +1400,43 @@ define i64 @rotr_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
 ; RV32I-NEXT:    bltu a4, a5, .LBB14_2
 ; RV32I-NEXT:  # %bb.1:
 ; RV32I-NEXT:    srl a6, a1, a4
-; RV32I-NEXT:    mv a3, a0
-; RV32I-NEXT:    bnez a4, .LBB14_3
-; RV32I-NEXT:    j .LBB14_4
+; RV32I-NEXT:    j .LBB14_3
 ; RV32I-NEXT:  .LBB14_2:
 ; RV32I-NEXT:    srl a3, a0, a2
 ; RV32I-NEXT:    neg a6, a4
 ; RV32I-NEXT:    sll a6, a1, a6
 ; RV32I-NEXT:    or a6, a3, a6
-; RV32I-NEXT:    mv a3, a0
-; RV32I-NEXT:    beqz a4, .LBB14_4
 ; RV32I-NEXT:  .LBB14_3:
+; RV32I-NEXT:    mv a3, a0
+; RV32I-NEXT:    beqz a4, .LBB14_5
+; RV32I-NEXT:  # %bb.4:
 ; RV32I-NEXT:    mv a3, a6
-; RV32I-NEXT:  .LBB14_4:
-; RV32I-NEXT:    bltu a4, a5, .LBB14_6
-; RV32I-NEXT:  # %bb.5:
+; RV32I-NEXT:  .LBB14_5:
+; RV32I-NEXT:    bltu a4, a5, .LBB14_7
+; RV32I-NEXT:  # %bb.6:
 ; RV32I-NEXT:    li a4, 0
-; RV32I-NEXT:    j .LBB14_7
-; RV32I-NEXT:  .LBB14_6:
-; RV32I-NEXT:    srl a4, a1, a2
+; RV32I-NEXT:    j .LBB14_8
 ; RV32I-NEXT:  .LBB14_7:
+; RV32I-NEXT:    srl a4, a1, a2
+; RV32I-NEXT:  .LBB14_8:
 ; RV32I-NEXT:    neg a7, a2
 ; RV32I-NEXT:    andi a6, a7, 63
-; RV32I-NEXT:    bltu a6, a5, .LBB14_9
-; RV32I-NEXT:  # %bb.8:
+; RV32I-NEXT:    bltu a6, a5, .LBB14_10
+; RV32I-NEXT:  # %bb.9:
 ; RV32I-NEXT:    li a2, 0
 ; RV32I-NEXT:    sll a0, a0, a6
-; RV32I-NEXT:    bnez a6, .LBB14_10
-; RV32I-NEXT:    j .LBB14_11
-; RV32I-NEXT:  .LBB14_9:
+; RV32I-NEXT:    bnez a6, .LBB14_11
+; RV32I-NEXT:    j .LBB14_12
+; RV32I-NEXT:  .LBB14_10:
 ; RV32I-NEXT:    sll a2, a0, a7
 ; RV32I-NEXT:    neg a5, a6
 ; RV32I-NEXT:    srl a0, a0, a5
 ; RV32I-NEXT:    sll a5, a1, a7
 ; RV32I-NEXT:    or a0, a0, a5
-; RV32I-NEXT:    beqz a6, .LBB14_11
-; RV32I-NEXT:  .LBB14_10:
-; RV32I-NEXT:    mv a1, a0
+; RV32I-NEXT:    beqz a6, .LBB14_12
 ; RV32I-NEXT:  .LBB14_11:
+; RV32I-NEXT:    mv a1, a0
+; RV32I-NEXT:  .LBB14_12:
 ; RV32I-NEXT:    or a0, a3, a2
 ; RV32I-NEXT:    or a1, a4, a1
 ; RV32I-NEXT:    ret
@@ -1463,44 +1456,43 @@ define i64 @rotr_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
 ; RV32ZBB-NEXT:    bltu a4, a5, .LBB14_2
 ; RV32ZBB-NEXT:  # %bb.1:
 ; RV32ZBB-NEXT:    srl a6, a1, a4
-; RV32ZBB-NEXT:    mv a3, a0
-; RV32ZBB-NEXT:    bnez a4, .LBB14_3
-; RV32ZBB-NEXT:    j .LBB14_4
+; RV32ZBB-NEXT:    j .LBB14_3
 ; RV32ZBB-NEXT:  .LBB14_2:
 ; RV32ZBB-NEXT:    srl a3, a0, a2
 ; RV32ZBB-NEXT:    neg a6, a4
 ; RV32ZBB-NEXT:    sll a6, a1, a6
 ; RV32ZBB-NEXT:    or a6, a3, a6
-; RV32ZBB-NEXT:    mv a3, a0
-; RV32ZBB-NEXT:    beqz a4, .LBB14_4
 ; RV32ZBB-NEXT:  .LBB14_3:
+; RV32ZBB-NEXT:    mv a3, a0
+; RV32ZBB-NEXT:    beqz a4, .LBB14_5
+; RV32ZBB-NEXT:  # %bb.4:
 ; RV32ZBB-NEXT:    mv a3, a6
-; RV32ZBB-NEXT:  .LBB14_4:
-; RV32ZBB-NEXT:    bltu a4, a5, .LBB14_6
-; RV32ZBB-NEXT:  # %bb.5:
+; RV32ZBB-NEXT:  .LBB14_5:
+; RV32ZBB-NEXT:    bltu a4, a5, .LBB14_7
+; RV32ZBB-NEXT:  # %bb.6:
 ; RV32ZBB-NEXT:    li a4, 0
-; RV32ZBB-NEXT:    j .LBB14_7
-; RV32ZBB-NEXT:  .LBB14_6:
-; RV32ZBB-NEXT:    srl a4, a1, a2
+; RV32ZBB-NEXT:    j .LBB14_8
 ; RV32ZBB-NEXT:  .LBB14_7:
+; RV32ZBB-NEXT:    srl a4, a1, a2
+; RV32ZBB-NEXT:  .LBB14_8:
 ; RV32ZBB-NEXT:    neg a7, a2
 ; RV32ZBB-NEXT:    andi a6, a7, 63
-; RV32ZBB-NEXT:    bltu a6, a5, .LBB14_9
-; RV32ZBB-NEXT:  # %bb.8:
+; RV32ZBB-NEXT:    bltu a6, a5, .LBB14_10
+; RV32ZBB-NEXT:  # %bb.9:
 ; RV32ZBB-NEXT:    li a2, 0
 ; RV32ZBB-NEXT:    sll a0, a0, a6
-; RV32ZBB-NEXT:    bnez a6, .LBB14_10
-; RV32ZBB-NEXT:    j .LBB14_11
-; RV32ZBB-NEXT:  .LBB14_9:
+; RV32ZBB-NEXT:    bnez a6, .LBB14_11
+; RV32ZBB-NEXT:    j .LBB14_12
+; RV32ZBB-NEXT:  .LBB14_10:
 ; RV32ZBB-NEXT:    sll a2, a0, a7
 ; RV32ZBB-NEXT:    neg a5, a6
 ; RV32ZBB-NEXT:    srl a0, a0, a5
 ; RV32ZBB-NEXT:    sll a5, a1, a7
 ; RV32ZBB-NEXT:    or a0, a0, a5
-; RV32ZBB-NEXT:    beqz a6, .LBB14_11
-; RV32ZBB-NEXT:  .LBB14_10:
-; RV32ZBB-NEXT:    mv a1, a0
+; RV32ZBB-NEXT:    beqz a6, .LBB14_12
 ; RV32ZBB-NEXT:  .LBB14_11:
+; RV32ZBB-NEXT:    mv a1, a0
+; RV32ZBB-NEXT:  .LBB14_12:
 ; RV32ZBB-NEXT:    or a0, a3, a2
 ; RV32ZBB-NEXT:    or a1, a4, a1
 ; RV32ZBB-NEXT:    ret
@@ -1520,44 +1512,43 @@ define i64 @rotr_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
 ; RV32XTHEADBB-NEXT:    bltu a4, a5, .LBB14_2
 ; RV32XTHEADBB-NEXT:  # %bb.1:
 ; RV32XTHEADBB-NEXT:    srl a6, a1, a4
-; RV32XTHEADBB-NEXT:    mv a3, a0
-; RV32XTHEADBB-NEXT:    bnez a4, .LBB14_3
-; RV32XTHEADBB-NEXT:    j .LBB14_4
+; RV32XTHEADBB-NEXT:    j .LBB14_3
 ; RV32XTHEADBB-NEXT:  .LBB14_2:
 ; RV32XTHEADBB-NEXT:    srl a3, a0, a2
 ; RV32XTHEADBB-NEXT:    neg a6, a4
 ; RV32XTHEADBB-NEXT:    sll a6, a1, a6
 ; RV32XTHEADBB-NEXT:    or a6, a3, a6
-; RV32XTHEADBB-NEXT:    mv a3, a0
-; RV32XTHEADBB-NEXT:    beqz a4, .LBB14_4
 ; RV32XTHEADBB-NEXT:  .LBB14_3:
+; RV32XTHEADBB-NEXT:    mv a3, a0
+; RV32XTHEADBB-NEXT:    beqz a4, .LBB14_5
+; RV32XTHEADBB-NEXT:  # %bb.4:
 ; RV32XTHEADBB-NEXT:    mv a3, a6
-; RV32XTHEADBB-NEXT:  .LBB14_4:
-; RV32XTHEADBB-NEXT:    bltu a4, a5, .LBB14_6
-; RV32XTHEADBB-NEXT:  # %bb.5:
+; RV32XTHEADBB-NEXT:  .LBB14_5:
+; RV32XTHEADBB-NEXT:    bltu a4, a5, .LBB14_7
+; RV32XTHEADBB-NEXT:  # %bb.6:
 ; RV32XTHEADBB-NEXT:    li a4, 0
-; RV32XTHEADBB-NEXT:    j .LBB14_7
-; RV32XTHEADBB-NEXT:  .LBB14_6:
-; RV32XTHEADBB-NEXT:    srl a4, a1, a2
+; RV32XTHEADBB-NEXT:    j .LBB14_8
 ; RV32XTHEADBB-NEXT:  .LBB14_7:
+; RV32XTHEADBB-NEXT:    srl a4, a1, a2
+; RV32XTHEADBB-NEXT:  .LBB14_8:
 ; RV32XTHEADBB-NEXT:    neg a7, a2
 ; RV32XTHEADBB-NEXT:    andi a6, a7, 63
-; RV32XTHEADBB-NEXT:    bltu a6, a5, .LBB14_9
-; RV32XTHEADBB-NEXT:  # %bb.8:
+; RV32XTHEADBB-NEXT:    bltu a6, a5, .LBB14_10
+; RV32XTHEADBB-NEXT:  # %bb.9:
 ; RV32XTHEADBB-NEXT:    li a2, 0
 ; RV32XTHEADBB-NEXT:    sll a0, a0, a6
-; RV32XTHEADBB-NEXT:    bnez a6, .LBB14_10
-; RV32XTHEADBB-NEXT:    j .LBB14_11
-; RV32XTHEADBB-NEXT:  .LBB14_9:
+; RV32XTHEADBB-NEXT:    bnez a6, .LBB14_11
+; RV32XTHEADBB-NEXT:    j .LBB14_12
+; RV32XTHEADBB-NEXT:  .LBB14_10:
 ; RV32XTHEADBB-NEXT:    sll a2, a0, a7
 ; RV32XTHEADBB-NEXT:    neg a5, a6
 ; RV32XTHEADBB-NEXT:    srl a0, a0, a5
 ; RV32XTHEADBB-NEXT:    sll a5, a1, a7
 ; RV32XTHEADBB-NEXT:    or a0, a0, a5
-; RV32XTHEADBB-NEXT:    beqz a6, .LBB14_11
-; RV32XTHEADBB-NEXT:  .LBB14_10:
-; RV32XTHEADBB-NEXT:    mv a1, a0
+; RV32XTHEADBB-NEXT:    beqz a6, .LBB14_12
 ; RV32XTHEADBB-NEXT:  .LBB14_11:
+; RV32XTHEADBB-NEXT:    mv a1, a0
+; RV32XTHEADBB-NEXT:  .LBB14_12:
 ; RV32XTHEADBB-NEXT:    or a0, a3, a2
 ; RV32XTHEADBB-NEXT:    or a1, a4, a1
 ; RV32XTHEADBB-NEXT:    ret
@@ -2061,60 +2052,59 @@ define signext i64 @rotr_64_mask_shared(i64 signext %a, i64 signext %b, i64 sign
 ; RV32I-NEXT:    bltu a5, t0, .LBB19_2
 ; RV32I-NEXT:  # %bb.1:
 ; RV32I-NEXT:    srl t1, a1, a5
-; RV32I-NEXT:    mv a7, a0
-; RV32I-NEXT:    bnez a5, .LBB19_3
-; RV32I-NEXT:    j .LBB19_4
+; RV32I-NEXT:    j .LBB19_3
 ; RV32I-NEXT:  .LBB19_2:
 ; RV32I-NEXT:    srl a7, a0, a4
 ; RV32I-NEXT:    sll t1, ...
[truncated]

@llvmbot (Member) commented Apr 8, 2025

@llvm/pr-subscribers-llvm-globalisel


@mikhailramalho (Member, Author)

The results mentioned on the PR are available here: https://lnt.lukelau.me/db_default/v4/nts/380?compare_to=379

@wangpc-pp (Contributor) left a comment

This looks great! What about the compile-time impact?

@@ -570,6 +570,7 @@ void RISCVPassConfig::addPreEmitPass() {
     addPass(createMachineCopyPropagationPass(true));
   if (TM->getOptLevel() >= CodeGenOptLevel::Default)
     addPass(createRISCVLateBranchOptPass());
+  addPass(&BranchFolderPassID);
Contributor

It should have run already? I'd expect a "late branch opt" pass to avoid introducing new constructs that need cleanup

Member

The late branch optimisation pass is actually making it easier for branch folding to clean up things that copy propagation and other prior passes have introduced.

; PROP-NEXT: li a0, 12
; PROP-NEXT: sw a0, 0(a4)
; PROP-NEXT: ret
; PROP-NEXT: j .LBB0_3
Collaborator

Is this undoing tail duplication?

Contributor

How does this end up happening? I thought this pass ran after tail duplication

Collaborator

One of the things the branch folding pass does is merging common tails. Tail duplication explicitly creates common tails to reduce the number of branch instructions that will be executed. If you run branch folding after tail duplication, it will undo the explicit tail duplication.
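
As a source-level analogy (illustration only: the function names are made up, and the real transforms operate on machine basic blocks, not C++):

void A(); void B(); void T();

// Shape before tail duplication: both arms reach the one shared tail T().
// Tail merging produces this form by factoring out common trailing code.
void sharedTail(bool c) {
  if (c) A(); else B();
  T();
}

// Shape after tail duplication: T() is copied into each arm so neither arm
// needs a jump to a shared tail; this costs code size but removes branches.
// Running branch folding's tail merging afterwards would rewrite this back
// into sharedTail's form, undoing the duplication.
void duplicatedTail(bool c) {
  if (c) { A(); T(); }
  else   { B(); T(); }
}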

@topperc (Collaborator) commented Apr 8, 2025

At least one of these cases is undoing tail duplication. Do you have dynamic instruction count numbers?

@mikhailramalho (Member, Author)

> At least one of these cases is undoing tail duplication. Do you have dynamic instruction count numbers?

Here are the numbers:

Benchmark                       DirA            DirB   Diff (%)
============================================================
500.perlbench_r         165826196167    166357789074      0.32%
502.gcc_r               219399420992    219803323636      0.18%
505.mcf_r               156797058571    157114468160      0.20%
508.namd_r              217322414517    217333387648      0.01%
510.parest_r            143264868963    143516815618      0.18%
511.povray_r             30514506955     30467500922     -0.15%
519.lbm_r                95279677707     95782257905      0.53%
520.omnetpp_r           141462225575    141696999927      0.17%
523.xalancbmk_r         282306179988    284057163151      0.62%
525.x264_r              183580253187    183653095365      0.04%
531.deepsjeng_r         350577788863    351478176840      0.26%
538.imagick_r           238460281809    247684328639      3.87%
541.leela_r             405842310516    406313280179      0.12%
544.nab_r               396838566663    398308105312      0.37%
557.xz_r                124382811159    124798965273      0.33%

DirA is the main branch, commit 65813e0
DirB is this PR

Blender is not there because it timed out on the main branch. I'm re-running it now.

The numbers increased pretty much everywhere. Maybe I should run only the code that removes the dead blocks? I see you can disable tail merging in the branch folding pass.

@@ -9,21 +9,11 @@ define i16 @f() {
; CHECK: # %bb.0: # %BB
; CHECK-NEXT: addi sp, sp, -16
; CHECK-NEXT: .cfi_def_cfa_offset 16
; CHECK-NEXT: j .LBB0_1
; CHECK-NEXT: .LBB0_1: # %BB1
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
; CHECK-NEXT: li a0, 0
; CHECK-NEXT: sd a0, 8(sp) # 8-byte Folded Spill
; CHECK-NEXT: j .LBB0_1
Member Author

@preames I saw you added this test case to check for a live interval crash when it runs in unreachable blocks, but if this patch lands, the unreachable block is gone. Should I try to rework the test? Maybe disable branch folding via the command line?

@mikhailramalho (Member, Author)

Updated numbers with blender:

Benchmark                       DirA            DirB   Diff (%)
============================================================
500.perlbench_r         165826196167    166357789074      0.32%
502.gcc_r               219399420992    219803323636      0.18%
505.mcf_r               156797058571    157114468160      0.20%
508.namd_r              217322414517    217333387648      0.01%
510.parest_r            143264868963    143516815618      0.18%
511.povray_r             30514506955     30467500922     -0.15%
519.lbm_r                95279677707     95782257905      0.53%
520.omnetpp_r           141462225575    141696999927      0.17%
523.xalancbmk_r         282306179988    284057163151      0.62%
525.x264_r              183580253187    183653095365      0.04%
526.blender_r           660276542155    660835589453      0.08%
531.deepsjeng_r         350577788863    351478176840      0.26%
538.imagick_r           238460281809    247684328639      3.87%
541.leela_r             405842310516    406313280179      0.12%
544.nab_r               396838566663    398308105312      0.37%
557.xz_r                124382811159    124798965273      0.33%

…e tail merge

Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
@mikhailramalho (Member, Author)

Folks, I've updated the PR with an option to disable tail merging when calling the branch folding pass.

This required me to create a createBranchFolderPass (similar to PR #128858); I can submit a separate PR for this change if you think it's necessary.
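
A minimal sketch of what the resulting call site in RISCVPassConfig::addPreEmitPass() might look like (the createBranchFolderPass factory is this PR's addition; its exact signature and parameter name here are assumptions, not verbatim code):

  if (TM->getOptLevel() >= CodeGenOptLevel::Default)
    addPass(createRISCVLateBranchOptPass());
  // Branch folding still removes the dead and empty blocks left by earlier
  // passes, but with tail merging disabled it can no longer undo tail
  // duplication (parameter name is illustrative).
  addPass(createBranchFolderPass(/*EnableTailMerge=*/false));
  addPass(&BranchRelaxationPassID);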

I'll run SPEC now and report the results once it's done.

@mikhailramalho (Member, Author)

> This looks great! What about the compile-time impact?

Sorry I missed your message. I'll collect the data and post it soon.

Comment on lines 157 to +159

   bool EnableTailMerge = !MF.getTarget().requiresStructuredCFG() &&
-                         PassConfig->getEnableTailMerge();
+                         PassConfig->getEnableTailMerge() &&
+                         this->EnableTailMerge;
Contributor

This global state should be moved to only change the pass parameter

@@ -90,10 +90,13 @@ namespace {

 /// BranchFolderPass - Wrap branch folder in a machine function pass.
 class BranchFolderLegacy : public MachineFunctionPass {
+  bool EnableTailMerge;
Contributor

Missing new pass manager handling (also should handle print/parse of the pass parameter). Also this should be done as a separate step

Member Author

Isn't the new pass manager part covered by #128858?

Contributor

That added the initial port to the new PM. This PR is now changing the pass arguments in the old PM without the matching new PM change.

Member Author

I see. I'll work on it
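
A minimal sketch of what the matching new-PM side might look like (mirroring the old-PM change; the constructor parameter and its plumbing are assumptions, not the PR's actual code, and print/parse support for the parameter would still be needed on top of this):

class BranchFolderPass : public PassInfoMixin<BranchFolderPass> {
  bool EnableTailMerge;  // carried per pass instance, not via global state

public:
  // Hypothetical parameter, defaulting to the current behavior.
  explicit BranchFolderPass(bool EnableTailMerge = true)
      : EnableTailMerge(EnableTailMerge) {}

  PreservedAnalyses run(MachineFunction &MF,
                        MachineFunctionAnalysisManager &MFAM);
};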

@mikhailramalho (Member, Author)

Results from the latest version: https://lnt.lukelau.me/db_default/v4/nts/437?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=435&submit=Update

It shows an improvement on mcf and omnetpp, but I believe it's the usual variance of these benchmarks, not the result of the patch.

The code size improvement is considerably smaller now, but it's still there.

There is a slight reduction in the dynamic instruction count. Blender is not included since it keeps timing out on qemu for me, on both main and the PR branch:

Benchmark                       DirA            DirB   Diff (%)
============================================================
500.perlbench_r         165826196163    165450563224     -0.23%
502.gcc_r               219399420992    219237589152     -0.07%
505.mcf_r               156797058570    156780899014     -0.01%
508.namd_r              217322414517    217321947075     -0.00%
510.parest_r            143264868783    143229146094     -0.02%
511.povray_r             30514993260     30514218533     -0.00%
519.lbm_r                95279677706     95279677700     -0.00%
520.omnetpp_r           141462225575    141465993056      0.00%
523.xalancbmk_r         282306179987    282303679984     -0.00%
525.x264_r              183580253185    183577348789     -0.00%
531.deepsjeng_r         350577788862    350508921370     -0.02%
538.imagick_r           238460281807    238460282041      0.00%
541.leela_r             405842310517    405842154288     -0.00%
544.nab_r               396838566662    396837264561     -0.00%
557.xz_r                124382811159    124223819665     -0.13%
