[RISCV] Add branch folding before branch relaxation #134760

Conversation
This is a follow-up patch to PR llvm#133256. This patch adds the branch folding pass after the newly added late optimization pass for RISC-V, which reduces code size in all SPEC benchmarks (except libm). The improvements are: 500.perlbench_r (-3.37%), 544.nab_r (-3.06%), 557.xz_r (-2.82%), 523.xalancbmk_r (-2.64%), 520.omnetpp_r (-2.34%), 531.deepsjeng_r (-2.27%), 502.gcc_r (-2.19%), 526.blender_r (-2.11%), 538.imagick_r (-2.03%), 505.mcf_r (-1.82%), 541.leela_r (-1.74%), 511.povray_r (-1.62%), 510.parest_r (-1.62%), 508.namd_r (-1.57%), 525.x264_r (-1.47%). The geo mean is -2.07%.

Some caveats:

* On llvm#131728 I mentioned a 7% improvement on the execution time of xz, but that's no longer the case. I went back and also tried to reproduce the result with the code from llvm#131728 and couldn't. Now the results from that PR and this one are the same: an overall code size reduction but no execution time improvement.
* The root cause of the large code size reduction is not yet clear to me. I'm still investigating it.
Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-transforms

Author: Mikhail R. Gadelha (mikhailramalho)
Patch is 1.37 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134760.diff

50 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 7fb64be3975d5..63bd0f4c20497 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -570,6 +570,7 @@ void RISCVPassConfig::addPreEmitPass() {
addPass(createMachineCopyPropagationPass(true));
if (TM->getOptLevel() >= CodeGenOptLevel::Default)
addPass(createRISCVLateBranchOptPass());
+ addPass(&BranchFolderPassID);
addPass(&BranchRelaxationPassID);
addPass(createRISCVMakeCompressibleOptPass());
}
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rotl-rotr.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rotl-rotr.ll
index 8a786fc9993d2..da8678f9a9916 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rotl-rotr.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rotl-rotr.ll
@@ -296,44 +296,43 @@ define i64 @rotr_64(i64 %x, i64 %y) nounwind {
; RV32I-NEXT: bltu a5, a4, .LBB3_2
; RV32I-NEXT: # %bb.1:
; RV32I-NEXT: srl a6, a1, a5
-; RV32I-NEXT: mv a3, a0
-; RV32I-NEXT: bnez a5, .LBB3_3
-; RV32I-NEXT: j .LBB3_4
+; RV32I-NEXT: j .LBB3_3
; RV32I-NEXT: .LBB3_2:
; RV32I-NEXT: srl a3, a0, a2
; RV32I-NEXT: neg a6, a5
; RV32I-NEXT: sll a6, a1, a6
; RV32I-NEXT: or a6, a3, a6
-; RV32I-NEXT: mv a3, a0
-; RV32I-NEXT: beqz a5, .LBB3_4
; RV32I-NEXT: .LBB3_3:
+; RV32I-NEXT: mv a3, a0
+; RV32I-NEXT: beqz a5, .LBB3_5
+; RV32I-NEXT: # %bb.4:
; RV32I-NEXT: mv a3, a6
-; RV32I-NEXT: .LBB3_4:
+; RV32I-NEXT: .LBB3_5:
; RV32I-NEXT: neg a6, a2
-; RV32I-NEXT: bltu a5, a4, .LBB3_7
-; RV32I-NEXT: # %bb.5:
+; RV32I-NEXT: bltu a5, a4, .LBB3_9
+; RV32I-NEXT: # %bb.6:
; RV32I-NEXT: li a2, 0
+; RV32I-NEXT: .LBB3_7:
; RV32I-NEXT: andi a5, a6, 63
-; RV32I-NEXT: bgeu a5, a4, .LBB3_8
-; RV32I-NEXT: .LBB3_6:
+; RV32I-NEXT: bgeu a5, a4, .LBB3_10
+; RV32I-NEXT: # %bb.8:
; RV32I-NEXT: sll a4, a0, a6
; RV32I-NEXT: neg a7, a5
; RV32I-NEXT: srl a0, a0, a7
; RV32I-NEXT: sll a6, a1, a6
; RV32I-NEXT: or a0, a0, a6
-; RV32I-NEXT: bnez a5, .LBB3_9
-; RV32I-NEXT: j .LBB3_10
-; RV32I-NEXT: .LBB3_7:
+; RV32I-NEXT: bnez a5, .LBB3_11
+; RV32I-NEXT: j .LBB3_12
+; RV32I-NEXT: .LBB3_9:
; RV32I-NEXT: srl a2, a1, a2
-; RV32I-NEXT: andi a5, a6, 63
-; RV32I-NEXT: bltu a5, a4, .LBB3_6
-; RV32I-NEXT: .LBB3_8:
+; RV32I-NEXT: j .LBB3_7
+; RV32I-NEXT: .LBB3_10:
; RV32I-NEXT: li a4, 0
; RV32I-NEXT: sll a0, a0, a5
-; RV32I-NEXT: beqz a5, .LBB3_10
-; RV32I-NEXT: .LBB3_9:
+; RV32I-NEXT: beqz a5, .LBB3_12
+; RV32I-NEXT: .LBB3_11:
; RV32I-NEXT: mv a1, a0
-; RV32I-NEXT: .LBB3_10:
+; RV32I-NEXT: .LBB3_12:
; RV32I-NEXT: or a0, a3, a4
; RV32I-NEXT: or a1, a2, a1
; RV32I-NEXT: ret
@@ -353,44 +352,43 @@ define i64 @rotr_64(i64 %x, i64 %y) nounwind {
; RV32ZBB-NEXT: bltu a5, a4, .LBB3_2
; RV32ZBB-NEXT: # %bb.1:
; RV32ZBB-NEXT: srl a6, a1, a5
-; RV32ZBB-NEXT: mv a3, a0
-; RV32ZBB-NEXT: bnez a5, .LBB3_3
-; RV32ZBB-NEXT: j .LBB3_4
+; RV32ZBB-NEXT: j .LBB3_3
; RV32ZBB-NEXT: .LBB3_2:
; RV32ZBB-NEXT: srl a3, a0, a2
; RV32ZBB-NEXT: neg a6, a5
; RV32ZBB-NEXT: sll a6, a1, a6
; RV32ZBB-NEXT: or a6, a3, a6
-; RV32ZBB-NEXT: mv a3, a0
-; RV32ZBB-NEXT: beqz a5, .LBB3_4
; RV32ZBB-NEXT: .LBB3_3:
+; RV32ZBB-NEXT: mv a3, a0
+; RV32ZBB-NEXT: beqz a5, .LBB3_5
+; RV32ZBB-NEXT: # %bb.4:
; RV32ZBB-NEXT: mv a3, a6
-; RV32ZBB-NEXT: .LBB3_4:
+; RV32ZBB-NEXT: .LBB3_5:
; RV32ZBB-NEXT: neg a6, a2
-; RV32ZBB-NEXT: bltu a5, a4, .LBB3_7
-; RV32ZBB-NEXT: # %bb.5:
+; RV32ZBB-NEXT: bltu a5, a4, .LBB3_9
+; RV32ZBB-NEXT: # %bb.6:
; RV32ZBB-NEXT: li a2, 0
+; RV32ZBB-NEXT: .LBB3_7:
; RV32ZBB-NEXT: andi a5, a6, 63
-; RV32ZBB-NEXT: bgeu a5, a4, .LBB3_8
-; RV32ZBB-NEXT: .LBB3_6:
+; RV32ZBB-NEXT: bgeu a5, a4, .LBB3_10
+; RV32ZBB-NEXT: # %bb.8:
; RV32ZBB-NEXT: sll a4, a0, a6
; RV32ZBB-NEXT: neg a7, a5
; RV32ZBB-NEXT: srl a0, a0, a7
; RV32ZBB-NEXT: sll a6, a1, a6
; RV32ZBB-NEXT: or a0, a0, a6
-; RV32ZBB-NEXT: bnez a5, .LBB3_9
-; RV32ZBB-NEXT: j .LBB3_10
-; RV32ZBB-NEXT: .LBB3_7:
+; RV32ZBB-NEXT: bnez a5, .LBB3_11
+; RV32ZBB-NEXT: j .LBB3_12
+; RV32ZBB-NEXT: .LBB3_9:
; RV32ZBB-NEXT: srl a2, a1, a2
-; RV32ZBB-NEXT: andi a5, a6, 63
-; RV32ZBB-NEXT: bltu a5, a4, .LBB3_6
-; RV32ZBB-NEXT: .LBB3_8:
+; RV32ZBB-NEXT: j .LBB3_7
+; RV32ZBB-NEXT: .LBB3_10:
; RV32ZBB-NEXT: li a4, 0
; RV32ZBB-NEXT: sll a0, a0, a5
-; RV32ZBB-NEXT: beqz a5, .LBB3_10
-; RV32ZBB-NEXT: .LBB3_9:
+; RV32ZBB-NEXT: beqz a5, .LBB3_12
+; RV32ZBB-NEXT: .LBB3_11:
; RV32ZBB-NEXT: mv a1, a0
-; RV32ZBB-NEXT: .LBB3_10:
+; RV32ZBB-NEXT: .LBB3_12:
; RV32ZBB-NEXT: or a0, a3, a4
; RV32ZBB-NEXT: or a1, a2, a1
; RV32ZBB-NEXT: ret
@@ -407,44 +405,43 @@ define i64 @rotr_64(i64 %x, i64 %y) nounwind {
; RV32XTHEADBB-NEXT: bltu a5, a4, .LBB3_2
; RV32XTHEADBB-NEXT: # %bb.1:
; RV32XTHEADBB-NEXT: srl a6, a1, a5
-; RV32XTHEADBB-NEXT: mv a3, a0
-; RV32XTHEADBB-NEXT: bnez a5, .LBB3_3
-; RV32XTHEADBB-NEXT: j .LBB3_4
+; RV32XTHEADBB-NEXT: j .LBB3_3
; RV32XTHEADBB-NEXT: .LBB3_2:
; RV32XTHEADBB-NEXT: srl a3, a0, a2
; RV32XTHEADBB-NEXT: neg a6, a5
; RV32XTHEADBB-NEXT: sll a6, a1, a6
; RV32XTHEADBB-NEXT: or a6, a3, a6
-; RV32XTHEADBB-NEXT: mv a3, a0
-; RV32XTHEADBB-NEXT: beqz a5, .LBB3_4
; RV32XTHEADBB-NEXT: .LBB3_3:
+; RV32XTHEADBB-NEXT: mv a3, a0
+; RV32XTHEADBB-NEXT: beqz a5, .LBB3_5
+; RV32XTHEADBB-NEXT: # %bb.4:
; RV32XTHEADBB-NEXT: mv a3, a6
-; RV32XTHEADBB-NEXT: .LBB3_4:
+; RV32XTHEADBB-NEXT: .LBB3_5:
; RV32XTHEADBB-NEXT: neg a6, a2
-; RV32XTHEADBB-NEXT: bltu a5, a4, .LBB3_7
-; RV32XTHEADBB-NEXT: # %bb.5:
+; RV32XTHEADBB-NEXT: bltu a5, a4, .LBB3_9
+; RV32XTHEADBB-NEXT: # %bb.6:
; RV32XTHEADBB-NEXT: li a2, 0
+; RV32XTHEADBB-NEXT: .LBB3_7:
; RV32XTHEADBB-NEXT: andi a5, a6, 63
-; RV32XTHEADBB-NEXT: bgeu a5, a4, .LBB3_8
-; RV32XTHEADBB-NEXT: .LBB3_6:
+; RV32XTHEADBB-NEXT: bgeu a5, a4, .LBB3_10
+; RV32XTHEADBB-NEXT: # %bb.8:
; RV32XTHEADBB-NEXT: sll a4, a0, a6
; RV32XTHEADBB-NEXT: neg a7, a5
; RV32XTHEADBB-NEXT: srl a0, a0, a7
; RV32XTHEADBB-NEXT: sll a6, a1, a6
; RV32XTHEADBB-NEXT: or a0, a0, a6
-; RV32XTHEADBB-NEXT: bnez a5, .LBB3_9
-; RV32XTHEADBB-NEXT: j .LBB3_10
-; RV32XTHEADBB-NEXT: .LBB3_7:
+; RV32XTHEADBB-NEXT: bnez a5, .LBB3_11
+; RV32XTHEADBB-NEXT: j .LBB3_12
+; RV32XTHEADBB-NEXT: .LBB3_9:
; RV32XTHEADBB-NEXT: srl a2, a1, a2
-; RV32XTHEADBB-NEXT: andi a5, a6, 63
-; RV32XTHEADBB-NEXT: bltu a5, a4, .LBB3_6
-; RV32XTHEADBB-NEXT: .LBB3_8:
+; RV32XTHEADBB-NEXT: j .LBB3_7
+; RV32XTHEADBB-NEXT: .LBB3_10:
; RV32XTHEADBB-NEXT: li a4, 0
; RV32XTHEADBB-NEXT: sll a0, a0, a5
-; RV32XTHEADBB-NEXT: beqz a5, .LBB3_10
-; RV32XTHEADBB-NEXT: .LBB3_9:
+; RV32XTHEADBB-NEXT: beqz a5, .LBB3_12
+; RV32XTHEADBB-NEXT: .LBB3_11:
; RV32XTHEADBB-NEXT: mv a1, a0
-; RV32XTHEADBB-NEXT: .LBB3_10:
+; RV32XTHEADBB-NEXT: .LBB3_12:
; RV32XTHEADBB-NEXT: or a0, a3, a4
; RV32XTHEADBB-NEXT: or a1, a2, a1
; RV32XTHEADBB-NEXT: ret
@@ -961,43 +958,42 @@ define i64 @rotl_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
; RV32I-NEXT: # %bb.1:
; RV32I-NEXT: li a3, 0
; RV32I-NEXT: sll a7, a0, a6
-; RV32I-NEXT: mv a5, a1
-; RV32I-NEXT: bnez a6, .LBB11_3
-; RV32I-NEXT: j .LBB11_4
+; RV32I-NEXT: j .LBB11_3
; RV32I-NEXT: .LBB11_2:
; RV32I-NEXT: sll a3, a0, a2
; RV32I-NEXT: neg a5, a6
; RV32I-NEXT: srl a5, a0, a5
; RV32I-NEXT: sll a7, a1, a2
; RV32I-NEXT: or a7, a5, a7
-; RV32I-NEXT: mv a5, a1
-; RV32I-NEXT: beqz a6, .LBB11_4
; RV32I-NEXT: .LBB11_3:
+; RV32I-NEXT: mv a5, a1
+; RV32I-NEXT: beqz a6, .LBB11_5
+; RV32I-NEXT: # %bb.4:
; RV32I-NEXT: mv a5, a7
-; RV32I-NEXT: .LBB11_4:
+; RV32I-NEXT: .LBB11_5:
; RV32I-NEXT: neg a2, a2
; RV32I-NEXT: andi a6, a2, 63
-; RV32I-NEXT: bltu a6, a4, .LBB11_6
-; RV32I-NEXT: # %bb.5:
+; RV32I-NEXT: bltu a6, a4, .LBB11_7
+; RV32I-NEXT: # %bb.6:
; RV32I-NEXT: srl a7, a1, a6
-; RV32I-NEXT: bnez a6, .LBB11_7
-; RV32I-NEXT: j .LBB11_8
-; RV32I-NEXT: .LBB11_6:
+; RV32I-NEXT: bnez a6, .LBB11_8
+; RV32I-NEXT: j .LBB11_9
+; RV32I-NEXT: .LBB11_7:
; RV32I-NEXT: srl a7, a0, a2
; RV32I-NEXT: neg t0, a6
; RV32I-NEXT: sll t0, a1, t0
; RV32I-NEXT: or a7, a7, t0
-; RV32I-NEXT: beqz a6, .LBB11_8
-; RV32I-NEXT: .LBB11_7:
-; RV32I-NEXT: mv a0, a7
+; RV32I-NEXT: beqz a6, .LBB11_9
; RV32I-NEXT: .LBB11_8:
-; RV32I-NEXT: bltu a6, a4, .LBB11_10
-; RV32I-NEXT: # %bb.9:
+; RV32I-NEXT: mv a0, a7
+; RV32I-NEXT: .LBB11_9:
+; RV32I-NEXT: bltu a6, a4, .LBB11_11
+; RV32I-NEXT: # %bb.10:
; RV32I-NEXT: li a1, 0
-; RV32I-NEXT: j .LBB11_11
-; RV32I-NEXT: .LBB11_10:
-; RV32I-NEXT: srl a1, a1, a2
+; RV32I-NEXT: j .LBB11_12
; RV32I-NEXT: .LBB11_11:
+; RV32I-NEXT: srl a1, a1, a2
+; RV32I-NEXT: .LBB11_12:
; RV32I-NEXT: or a0, a3, a0
; RV32I-NEXT: or a1, a5, a1
; RV32I-NEXT: ret
@@ -1018,43 +1014,42 @@ define i64 @rotl_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
; RV32ZBB-NEXT: # %bb.1:
; RV32ZBB-NEXT: li a3, 0
; RV32ZBB-NEXT: sll a7, a0, a6
-; RV32ZBB-NEXT: mv a5, a1
-; RV32ZBB-NEXT: bnez a6, .LBB11_3
-; RV32ZBB-NEXT: j .LBB11_4
+; RV32ZBB-NEXT: j .LBB11_3
; RV32ZBB-NEXT: .LBB11_2:
; RV32ZBB-NEXT: sll a3, a0, a2
; RV32ZBB-NEXT: neg a5, a6
; RV32ZBB-NEXT: srl a5, a0, a5
; RV32ZBB-NEXT: sll a7, a1, a2
; RV32ZBB-NEXT: or a7, a5, a7
-; RV32ZBB-NEXT: mv a5, a1
-; RV32ZBB-NEXT: beqz a6, .LBB11_4
; RV32ZBB-NEXT: .LBB11_3:
+; RV32ZBB-NEXT: mv a5, a1
+; RV32ZBB-NEXT: beqz a6, .LBB11_5
+; RV32ZBB-NEXT: # %bb.4:
; RV32ZBB-NEXT: mv a5, a7
-; RV32ZBB-NEXT: .LBB11_4:
+; RV32ZBB-NEXT: .LBB11_5:
; RV32ZBB-NEXT: neg a2, a2
; RV32ZBB-NEXT: andi a6, a2, 63
-; RV32ZBB-NEXT: bltu a6, a4, .LBB11_6
-; RV32ZBB-NEXT: # %bb.5:
+; RV32ZBB-NEXT: bltu a6, a4, .LBB11_7
+; RV32ZBB-NEXT: # %bb.6:
; RV32ZBB-NEXT: srl a7, a1, a6
-; RV32ZBB-NEXT: bnez a6, .LBB11_7
-; RV32ZBB-NEXT: j .LBB11_8
-; RV32ZBB-NEXT: .LBB11_6:
+; RV32ZBB-NEXT: bnez a6, .LBB11_8
+; RV32ZBB-NEXT: j .LBB11_9
+; RV32ZBB-NEXT: .LBB11_7:
; RV32ZBB-NEXT: srl a7, a0, a2
; RV32ZBB-NEXT: neg t0, a6
; RV32ZBB-NEXT: sll t0, a1, t0
; RV32ZBB-NEXT: or a7, a7, t0
-; RV32ZBB-NEXT: beqz a6, .LBB11_8
-; RV32ZBB-NEXT: .LBB11_7:
-; RV32ZBB-NEXT: mv a0, a7
+; RV32ZBB-NEXT: beqz a6, .LBB11_9
; RV32ZBB-NEXT: .LBB11_8:
-; RV32ZBB-NEXT: bltu a6, a4, .LBB11_10
-; RV32ZBB-NEXT: # %bb.9:
+; RV32ZBB-NEXT: mv a0, a7
+; RV32ZBB-NEXT: .LBB11_9:
+; RV32ZBB-NEXT: bltu a6, a4, .LBB11_11
+; RV32ZBB-NEXT: # %bb.10:
; RV32ZBB-NEXT: li a1, 0
-; RV32ZBB-NEXT: j .LBB11_11
-; RV32ZBB-NEXT: .LBB11_10:
-; RV32ZBB-NEXT: srl a1, a1, a2
+; RV32ZBB-NEXT: j .LBB11_12
; RV32ZBB-NEXT: .LBB11_11:
+; RV32ZBB-NEXT: srl a1, a1, a2
+; RV32ZBB-NEXT: .LBB11_12:
; RV32ZBB-NEXT: or a0, a3, a0
; RV32ZBB-NEXT: or a1, a5, a1
; RV32ZBB-NEXT: ret
@@ -1075,43 +1070,42 @@ define i64 @rotl_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
; RV32XTHEADBB-NEXT: # %bb.1:
; RV32XTHEADBB-NEXT: li a3, 0
; RV32XTHEADBB-NEXT: sll a7, a0, a6
-; RV32XTHEADBB-NEXT: mv a5, a1
-; RV32XTHEADBB-NEXT: bnez a6, .LBB11_3
-; RV32XTHEADBB-NEXT: j .LBB11_4
+; RV32XTHEADBB-NEXT: j .LBB11_3
; RV32XTHEADBB-NEXT: .LBB11_2:
; RV32XTHEADBB-NEXT: sll a3, a0, a2
; RV32XTHEADBB-NEXT: neg a5, a6
; RV32XTHEADBB-NEXT: srl a5, a0, a5
; RV32XTHEADBB-NEXT: sll a7, a1, a2
; RV32XTHEADBB-NEXT: or a7, a5, a7
-; RV32XTHEADBB-NEXT: mv a5, a1
-; RV32XTHEADBB-NEXT: beqz a6, .LBB11_4
; RV32XTHEADBB-NEXT: .LBB11_3:
+; RV32XTHEADBB-NEXT: mv a5, a1
+; RV32XTHEADBB-NEXT: beqz a6, .LBB11_5
+; RV32XTHEADBB-NEXT: # %bb.4:
; RV32XTHEADBB-NEXT: mv a5, a7
-; RV32XTHEADBB-NEXT: .LBB11_4:
+; RV32XTHEADBB-NEXT: .LBB11_5:
; RV32XTHEADBB-NEXT: neg a2, a2
; RV32XTHEADBB-NEXT: andi a6, a2, 63
-; RV32XTHEADBB-NEXT: bltu a6, a4, .LBB11_6
-; RV32XTHEADBB-NEXT: # %bb.5:
+; RV32XTHEADBB-NEXT: bltu a6, a4, .LBB11_7
+; RV32XTHEADBB-NEXT: # %bb.6:
; RV32XTHEADBB-NEXT: srl a7, a1, a6
-; RV32XTHEADBB-NEXT: bnez a6, .LBB11_7
-; RV32XTHEADBB-NEXT: j .LBB11_8
-; RV32XTHEADBB-NEXT: .LBB11_6:
+; RV32XTHEADBB-NEXT: bnez a6, .LBB11_8
+; RV32XTHEADBB-NEXT: j .LBB11_9
+; RV32XTHEADBB-NEXT: .LBB11_7:
; RV32XTHEADBB-NEXT: srl a7, a0, a2
; RV32XTHEADBB-NEXT: neg t0, a6
; RV32XTHEADBB-NEXT: sll t0, a1, t0
; RV32XTHEADBB-NEXT: or a7, a7, t0
-; RV32XTHEADBB-NEXT: beqz a6, .LBB11_8
-; RV32XTHEADBB-NEXT: .LBB11_7:
-; RV32XTHEADBB-NEXT: mv a0, a7
+; RV32XTHEADBB-NEXT: beqz a6, .LBB11_9
; RV32XTHEADBB-NEXT: .LBB11_8:
-; RV32XTHEADBB-NEXT: bltu a6, a4, .LBB11_10
-; RV32XTHEADBB-NEXT: # %bb.9:
+; RV32XTHEADBB-NEXT: mv a0, a7
+; RV32XTHEADBB-NEXT: .LBB11_9:
+; RV32XTHEADBB-NEXT: bltu a6, a4, .LBB11_11
+; RV32XTHEADBB-NEXT: # %bb.10:
; RV32XTHEADBB-NEXT: li a1, 0
-; RV32XTHEADBB-NEXT: j .LBB11_11
-; RV32XTHEADBB-NEXT: .LBB11_10:
-; RV32XTHEADBB-NEXT: srl a1, a1, a2
+; RV32XTHEADBB-NEXT: j .LBB11_12
; RV32XTHEADBB-NEXT: .LBB11_11:
+; RV32XTHEADBB-NEXT: srl a1, a1, a2
+; RV32XTHEADBB-NEXT: .LBB11_12:
; RV32XTHEADBB-NEXT: or a0, a3, a0
; RV32XTHEADBB-NEXT: or a1, a5, a1
; RV32XTHEADBB-NEXT: ret
@@ -1406,44 +1400,43 @@ define i64 @rotr_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
; RV32I-NEXT: bltu a4, a5, .LBB14_2
; RV32I-NEXT: # %bb.1:
; RV32I-NEXT: srl a6, a1, a4
-; RV32I-NEXT: mv a3, a0
-; RV32I-NEXT: bnez a4, .LBB14_3
-; RV32I-NEXT: j .LBB14_4
+; RV32I-NEXT: j .LBB14_3
; RV32I-NEXT: .LBB14_2:
; RV32I-NEXT: srl a3, a0, a2
; RV32I-NEXT: neg a6, a4
; RV32I-NEXT: sll a6, a1, a6
; RV32I-NEXT: or a6, a3, a6
-; RV32I-NEXT: mv a3, a0
-; RV32I-NEXT: beqz a4, .LBB14_4
; RV32I-NEXT: .LBB14_3:
+; RV32I-NEXT: mv a3, a0
+; RV32I-NEXT: beqz a4, .LBB14_5
+; RV32I-NEXT: # %bb.4:
; RV32I-NEXT: mv a3, a6
-; RV32I-NEXT: .LBB14_4:
-; RV32I-NEXT: bltu a4, a5, .LBB14_6
-; RV32I-NEXT: # %bb.5:
+; RV32I-NEXT: .LBB14_5:
+; RV32I-NEXT: bltu a4, a5, .LBB14_7
+; RV32I-NEXT: # %bb.6:
; RV32I-NEXT: li a4, 0
-; RV32I-NEXT: j .LBB14_7
-; RV32I-NEXT: .LBB14_6:
-; RV32I-NEXT: srl a4, a1, a2
+; RV32I-NEXT: j .LBB14_8
; RV32I-NEXT: .LBB14_7:
+; RV32I-NEXT: srl a4, a1, a2
+; RV32I-NEXT: .LBB14_8:
; RV32I-NEXT: neg a7, a2
; RV32I-NEXT: andi a6, a7, 63
-; RV32I-NEXT: bltu a6, a5, .LBB14_9
-; RV32I-NEXT: # %bb.8:
+; RV32I-NEXT: bltu a6, a5, .LBB14_10
+; RV32I-NEXT: # %bb.9:
; RV32I-NEXT: li a2, 0
; RV32I-NEXT: sll a0, a0, a6
-; RV32I-NEXT: bnez a6, .LBB14_10
-; RV32I-NEXT: j .LBB14_11
-; RV32I-NEXT: .LBB14_9:
+; RV32I-NEXT: bnez a6, .LBB14_11
+; RV32I-NEXT: j .LBB14_12
+; RV32I-NEXT: .LBB14_10:
; RV32I-NEXT: sll a2, a0, a7
; RV32I-NEXT: neg a5, a6
; RV32I-NEXT: srl a0, a0, a5
; RV32I-NEXT: sll a5, a1, a7
; RV32I-NEXT: or a0, a0, a5
-; RV32I-NEXT: beqz a6, .LBB14_11
-; RV32I-NEXT: .LBB14_10:
-; RV32I-NEXT: mv a1, a0
+; RV32I-NEXT: beqz a6, .LBB14_12
; RV32I-NEXT: .LBB14_11:
+; RV32I-NEXT: mv a1, a0
+; RV32I-NEXT: .LBB14_12:
; RV32I-NEXT: or a0, a3, a2
; RV32I-NEXT: or a1, a4, a1
; RV32I-NEXT: ret
@@ -1463,44 +1456,43 @@ define i64 @rotr_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
; RV32ZBB-NEXT: bltu a4, a5, .LBB14_2
; RV32ZBB-NEXT: # %bb.1:
; RV32ZBB-NEXT: srl a6, a1, a4
-; RV32ZBB-NEXT: mv a3, a0
-; RV32ZBB-NEXT: bnez a4, .LBB14_3
-; RV32ZBB-NEXT: j .LBB14_4
+; RV32ZBB-NEXT: j .LBB14_3
; RV32ZBB-NEXT: .LBB14_2:
; RV32ZBB-NEXT: srl a3, a0, a2
; RV32ZBB-NEXT: neg a6, a4
; RV32ZBB-NEXT: sll a6, a1, a6
; RV32ZBB-NEXT: or a6, a3, a6
-; RV32ZBB-NEXT: mv a3, a0
-; RV32ZBB-NEXT: beqz a4, .LBB14_4
; RV32ZBB-NEXT: .LBB14_3:
+; RV32ZBB-NEXT: mv a3, a0
+; RV32ZBB-NEXT: beqz a4, .LBB14_5
+; RV32ZBB-NEXT: # %bb.4:
; RV32ZBB-NEXT: mv a3, a6
-; RV32ZBB-NEXT: .LBB14_4:
-; RV32ZBB-NEXT: bltu a4, a5, .LBB14_6
-; RV32ZBB-NEXT: # %bb.5:
+; RV32ZBB-NEXT: .LBB14_5:
+; RV32ZBB-NEXT: bltu a4, a5, .LBB14_7
+; RV32ZBB-NEXT: # %bb.6:
; RV32ZBB-NEXT: li a4, 0
-; RV32ZBB-NEXT: j .LBB14_7
-; RV32ZBB-NEXT: .LBB14_6:
-; RV32ZBB-NEXT: srl a4, a1, a2
+; RV32ZBB-NEXT: j .LBB14_8
; RV32ZBB-NEXT: .LBB14_7:
+; RV32ZBB-NEXT: srl a4, a1, a2
+; RV32ZBB-NEXT: .LBB14_8:
; RV32ZBB-NEXT: neg a7, a2
; RV32ZBB-NEXT: andi a6, a7, 63
-; RV32ZBB-NEXT: bltu a6, a5, .LBB14_9
-; RV32ZBB-NEXT: # %bb.8:
+; RV32ZBB-NEXT: bltu a6, a5, .LBB14_10
+; RV32ZBB-NEXT: # %bb.9:
; RV32ZBB-NEXT: li a2, 0
; RV32ZBB-NEXT: sll a0, a0, a6
-; RV32ZBB-NEXT: bnez a6, .LBB14_10
-; RV32ZBB-NEXT: j .LBB14_11
-; RV32ZBB-NEXT: .LBB14_9:
+; RV32ZBB-NEXT: bnez a6, .LBB14_11
+; RV32ZBB-NEXT: j .LBB14_12
+; RV32ZBB-NEXT: .LBB14_10:
; RV32ZBB-NEXT: sll a2, a0, a7
; RV32ZBB-NEXT: neg a5, a6
; RV32ZBB-NEXT: srl a0, a0, a5
; RV32ZBB-NEXT: sll a5, a1, a7
; RV32ZBB-NEXT: or a0, a0, a5
-; RV32ZBB-NEXT: beqz a6, .LBB14_11
-; RV32ZBB-NEXT: .LBB14_10:
-; RV32ZBB-NEXT: mv a1, a0
+; RV32ZBB-NEXT: beqz a6, .LBB14_12
; RV32ZBB-NEXT: .LBB14_11:
+; RV32ZBB-NEXT: mv a1, a0
+; RV32ZBB-NEXT: .LBB14_12:
; RV32ZBB-NEXT: or a0, a3, a2
; RV32ZBB-NEXT: or a1, a4, a1
; RV32ZBB-NEXT: ret
@@ -1520,44 +1512,43 @@ define i64 @rotr_64_mask_and_127_and_63(i64 %x, i64 %y) nounwind {
; RV32XTHEADBB-NEXT: bltu a4, a5, .LBB14_2
; RV32XTHEADBB-NEXT: # %bb.1:
; RV32XTHEADBB-NEXT: srl a6, a1, a4
-; RV32XTHEADBB-NEXT: mv a3, a0
-; RV32XTHEADBB-NEXT: bnez a4, .LBB14_3
-; RV32XTHEADBB-NEXT: j .LBB14_4
+; RV32XTHEADBB-NEXT: j .LBB14_3
; RV32XTHEADBB-NEXT: .LBB14_2:
; RV32XTHEADBB-NEXT: srl a3, a0, a2
; RV32XTHEADBB-NEXT: neg a6, a4
; RV32XTHEADBB-NEXT: sll a6, a1, a6
; RV32XTHEADBB-NEXT: or a6, a3, a6
-; RV32XTHEADBB-NEXT: mv a3, a0
-; RV32XTHEADBB-NEXT: beqz a4, .LBB14_4
; RV32XTHEADBB-NEXT: .LBB14_3:
+; RV32XTHEADBB-NEXT: mv a3, a0
+; RV32XTHEADBB-NEXT: beqz a4, .LBB14_5
+; RV32XTHEADBB-NEXT: # %bb.4:
; RV32XTHEADBB-NEXT: mv a3, a6
-; RV32XTHEADBB-NEXT: .LBB14_4:
-; RV32XTHEADBB-NEXT: bltu a4, a5, .LBB14_6
-; RV32XTHEADBB-NEXT: # %bb.5:
+; RV32XTHEADBB-NEXT: .LBB14_5:
+; RV32XTHEADBB-NEXT: bltu a4, a5, .LBB14_7
+; RV32XTHEADBB-NEXT: # %bb.6:
; RV32XTHEADBB-NEXT: li a4, 0
-; RV32XTHEADBB-NEXT: j .LBB14_7
-; RV32XTHEADBB-NEXT: .LBB14_6:
-; RV32XTHEADBB-NEXT: srl a4, a1, a2
+; RV32XTHEADBB-NEXT: j .LBB14_8
; RV32XTHEADBB-NEXT: .LBB14_7:
+; RV32XTHEADBB-NEXT: srl a4, a1, a2
+; RV32XTHEADBB-NEXT: .LBB14_8:
; RV32XTHEADBB-NEXT: neg a7, a2
; RV32XTHEADBB-NEXT: andi a6, a7, 63
-; RV32XTHEADBB-NEXT: bltu a6, a5, .LBB14_9
-; RV32XTHEADBB-NEXT: # %bb.8:
+; RV32XTHEADBB-NEXT: bltu a6, a5, .LBB14_10
+; RV32XTHEADBB-NEXT: # %bb.9:
; RV32XTHEADBB-NEXT: li a2, 0
; RV32XTHEADBB-NEXT: sll a0, a0, a6
-; RV32XTHEADBB-NEXT: bnez a6, .LBB14_10
-; RV32XTHEADBB-NEXT: j .LBB14_11
-; RV32XTHEADBB-NEXT: .LBB14_9:
+; RV32XTHEADBB-NEXT: bnez a6, .LBB14_11
+; RV32XTHEADBB-NEXT: j .LBB14_12
+; RV32XTHEADBB-NEXT: .LBB14_10:
; RV32XTHEADBB-NEXT: sll a2, a0, a7
; RV32XTHEADBB-NEXT: neg a5, a6
; RV32XTHEADBB-NEXT: srl a0, a0, a5
; RV32XTHEADBB-NEXT: sll a5, a1, a7
; RV32XTHEADBB-NEXT: or a0, a0, a5
-; RV32XTHEADBB-NEXT: beqz a6, .LBB14_11
-; RV32XTHEADBB-NEXT: .LBB14_10:
-; RV32XTHEADBB-NEXT: mv a1, a0
+; RV32XTHEADBB-NEXT: beqz a6, .LBB14_12
; RV32XTHEADBB-NEXT: .LBB14_11:
+; RV32XTHEADBB-NEXT: mv a1, a0
+; RV32XTHEADBB-NEXT: .LBB14_12:
; RV32XTHEADBB-NEXT: or a0, a3, a2
; RV32XTHEADBB-NEXT: or a1, a4, a1
; RV32XTHEADBB-NEXT: ret
@@ -2061,60 +2052,59 @@ define signext i64 @rotr_64_mask_shared(i64 signext %a, i64 signext %b, i64 sign
; RV32I-NEXT: bltu a5, t0, .LBB19_2
; RV32I-NEXT: # %bb.1:
; RV32I-NEXT: srl t1, a1, a5
-; RV32I-NEXT: mv a7, a0
-; RV32I-NEXT: bnez a5, .LBB19_3
-; RV32I-NEXT: j .LBB19_4
+; RV32I-NEXT: j .LBB19_3
; RV32I-NEXT: .LBB19_2:
; RV32I-NEXT: srl a7, a0, a4
; RV32I-NEXT: sll t1, ...
[truncated]
@llvm/pr-subscribers-llvm-globalisel
The results mentioned on the PR are available here: https://lnt.lukelau.me/db_default/v4/nts/380?compare_to=379
This looks great! What about the compile-time impact?
@@ -570,6 +570,7 @@ void RISCVPassConfig::addPreEmitPass() {
  addPass(createMachineCopyPropagationPass(true));
  if (TM->getOptLevel() >= CodeGenOptLevel::Default)
    addPass(createRISCVLateBranchOptPass());
+ addPass(&BranchFolderPassID);
It should have run already? I'd expect a "late branch opt" pass to avoid introducing new constructs that need cleanup
The late branch optimisation pass is actually making it easier for branch folding to clean up things that copy propagation and other prior passes have introduced.
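For illustration, a hypothetical before/after (not taken from the patch): once earlier passes have rewritten control flow, a block can end in a jump to its own layout successor, which branch folding deletes:

# Before branch folding (hypothetical input):
        bnez    a0, .LBB0_2
        j       .LBB0_1              # jump to the next block in layout
.LBB0_1:
        li      a0, 1
        ret
.LBB0_2:
        li      a0, 2
        ret

# After branch folding: the redundant jump is gone and .LBB0_1 is a
# plain fallthrough:
        bnez    a0, .LBB0_2
        li      a0, 1
        ret
.LBB0_2:
        li      a0, 2
        ret

The test diffs above show more involved versions of this, where shared instructions are also hoisted into merged blocks.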
llvm/test/CodeGen/RISCV/copyprop.ll (Outdated)
; PROP-NEXT: li a0, 12
; PROP-NEXT: sw a0, 0(a4)
; PROP-NEXT: ret
; PROP-NEXT: j .LBB0_3
Is this undoing tail duplication?
How does this end up happening? I thought this pass ran after tail duplication
One of the things the branch folding pass does is merge common tails. Tail duplication explicitly creates common tails to reduce the number of branch instructions that will be executed. If you run branch folding after tail duplication, it will undo explicit tail duplication.
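A hypothetical sketch of that interaction: tail duplication copies a common tail into each predecessor so both paths fall through, and tail merging hoists it back out, shrinking code size at the cost of an extra executed jump:

# After tail duplication (the sw/ret tail exists twice):
.LBB1_1:
        li      a0, 1
        sw      a0, 0(a1)
        ret
.LBB1_2:
        li      a0, 2
        sw      a0, 0(a1)
        ret

# After tail merging in branch folding (the duplication is undone):
.LBB1_1:
        li      a0, 1
        j       .LBB1_3
.LBB1_2:
        li      a0, 2
.LBB1_3:
        sw      a0, 0(a1)
        ret

That trade-off is consistent with the numbers below: smaller code size but a higher dynamic instruction count.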
At least one of these cases is undoing tail duplication. Do you have dynamic instruction count numbers?
Here are the numbers:
DirA is the main branch, commit 65813e0. Blender is not there because it timed out on the main branch; I'm re-running it now. The numbers increased pretty much everywhere. Maybe I should get only the code that removes the dead blocks running? I see you can disable tail merging on the branch folding pass.
@@ -9,21 +9,11 @@ define i16 @f() {
; CHECK: # %bb.0: # %BB
; CHECK-NEXT: addi sp, sp, -16
; CHECK-NEXT: .cfi_def_cfa_offset 16
; CHECK-NEXT: j .LBB0_1
; CHECK-NEXT: .LBB0_1: # %BB1
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
; CHECK-NEXT: li a0, 0
; CHECK-NEXT: sd a0, 8(sp) # 8-byte Folded Spill
; CHECK-NEXT: j .LBB0_1
@preames I saw you added this test case to check for a live interval crash when it runs in unreachable blocks, but if this patch lands, the unreachable block is gone. Should I try to rework the test? Maybe disable branch folding via the command line?
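One possible rework, as a sketch: stop the pipeline just before branch folding so the unreachable block still exists when the CHECK lines run. This assumes the legacy pass is addressable by its DEBUG_TYPE, branch-folder; note that the output would then be MIR rather than assembly, so the CHECK lines would need regenerating:

; RUN: llc -mtriple=riscv64 -stop-before=branch-folder -o - %s | FileCheck %s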
Updated numbers with blender:
…e tail merge Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
Folks, I've updated the PR with an option to disable tail merging when calling the branch folding pass; this required adding a tail-merge flag to the legacy branch folding pass (see the sketch below). I'll run SPEC now and report the results once it's done.
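Roughly what that wiring looks like, as a minimal sketch (the factory name and default are illustrative, not necessarily the exact code in this PR):

// BranchFolding.cpp (sketch): the legacy pass carries the flag.
class BranchFolderLegacy : public MachineFunctionPass {
  bool EnableTailMerge;

public:
  static char ID;
  explicit BranchFolderLegacy(bool EnableTailMerge = true)
      : MachineFunctionPass(ID), EnableTailMerge(EnableTailMerge) {}
  bool runOnMachineFunction(MachineFunction &MF) override;
};

// Factory so a target can request folding without tail merging.
MachineFunctionPass *llvm::createBranchFolderPass(bool EnableTailMerge) {
  return new BranchFolderLegacy(EnableTailMerge);
}

// RISCVTargetMachine.cpp (sketch): fold branches, keep duplicated tails.
addPass(createBranchFolderPass(/*EnableTailMerge=*/false));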
Sorry I missed your message. I'll collect the data and post it soon.
 bool EnableTailMerge = !MF.getTarget().requiresStructuredCFG() &&
-                        PassConfig->getEnableTailMerge();
+                        PassConfig->getEnableTailMerge() &&
+                        this->EnableTailMerge;
This global state should be moved to only change the pass parameter
@@ -90,10 +90,13 @@ namespace {

/// BranchFolderPass - Wrap branch folder in a machine function pass.
class BranchFolderLegacy : public MachineFunctionPass {
  bool EnableTailMerge;
Missing new pass manager handling (also should handle print/parse of the pass parameter). Also this should be done as a separate step
Isn't the new pass manager part covered by #128858?
That added the initial port to the new PM. This change modifies the pass arguments in the old PM without the matching new PM change.
I see. I'll work on it.
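For reference, the usual new-PM shape is a pass parameter that PassBuilder can parse and round-trip when printing the pipeline. A sketch, assuming the pass is registered as branch-folder and reusing the existing parseSinglePassOption helper; names here are illustrative:

// Sketch only; assumes the usual LLVM includes (PassManager.h, etc.).
class BranchFolderPass : public PassInfoMixin<BranchFolderPass> {
  bool EnableTailMerge;

public:
  BranchFolderPass(bool EnableTailMerge = true)
      : EnableTailMerge(EnableTailMerge) {}

  PreservedAnalyses run(MachineFunction &MF,
                        MachineFunctionAnalysisManager &MFAM);

  // Lets -print-pipeline-passes reproduce the parameter.
  void printPipeline(raw_ostream &OS,
                     function_ref<StringRef(StringRef)> MapClassName2PassName) {
    OS << "branch-folder";
    if (!EnableTailMerge)
      OS << "<no-enable-tail-merge>";
  }
};

// PassBuilder side (sketch): accepts branch-folder<enable-tail-merge>
// and branch-folder<no-enable-tail-merge>.
static Expected<bool> parseBranchFolderOptions(StringRef Params) {
  return parseSinglePassOption(Params, "enable-tail-merge", "BranchFolderPass");
}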
It shows an improvement on mcf and omnetpp, but I believe it's the usual variance of these benchmarks, not the result of the patch. There is a considerably smaller code size change, but it's still there. There is a slight reduction in the dynamic instruction count. Blender is not included since it keeps timing out on qemu for me, on both main and the PR branch.