-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[RISCV] Optimize conditional branches that can be statically evaluated #131684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-risc-v Author: Michael Maitland (michaelmaitland) ChangesThere are two changes here. The first that we teach analyzeBranch how to The second change is that we pass AllowModify=false to analyzeBranch in the This patch is an alternative or can be used in concert with #117060. This allows Full diff: https://github.com/llvm/llvm-project/pull/131684.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/BranchFolding.cpp b/llvm/lib/CodeGen/BranchFolding.cpp
index 5218e39b88222..5dce5ea897915 100644
--- a/llvm/lib/CodeGen/BranchFolding.cpp
+++ b/llvm/lib/CodeGen/BranchFolding.cpp
@@ -1107,7 +1107,7 @@ bool BranchFolder::TailMergeBlocks(MachineFunction &MF) {
MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
SmallVector<MachineOperand, 4> Cond;
- if (!TII->analyzeBranch(*PBB, TBB, FBB, Cond, true)) {
+ if (!TII->analyzeBranch(*PBB, TBB, FBB, Cond)) {
// Failing case: IBB is the target of a cbr, and we cannot reverse the
// branch.
SmallVector<MachineOperand, 4> NewCond(Cond);
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 2fdf6bd36e88f..c68e7db345fc5 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -1005,6 +1005,96 @@ RISCVCC::CondCode RISCVCC::getOppositeBranchCondition(RISCVCC::CondCode CC) {
}
}
+/// Return true if the branch represented by the conditional branch with
+/// components TBB, FBB, and CurCond can folded into an unconditional branch
+/// that branches to FoldedBB.
+///
+/// For example:
+/// BRCOND EQ 0, 0, BB1
+/// BR BB2
+///
+/// can be simplified to BR BB1 since 0 == 0 statically. On the other hand,
+///
+///
+/// BRCOND EQ 0, 1, BB1
+/// BR BB2
+///
+/// can be simplified to BR BB2 because 0 != 1 statically.
+static bool canFoldBrOfCondBr(MachineBasicBlock *TBB, MachineBasicBlock *FBB,
+ SmallVectorImpl<MachineOperand> &CurCond,
+ MachineBasicBlock *&FoldedBB) {
+ if (!TBB || !FBB || CurCond.size() != 3)
+ return false;
+
+ RISCVCC::CondCode CC = static_cast<RISCVCC::CondCode>(CurCond[0].getImm());
+ auto LHS = CurCond[1];
+ auto RHS = CurCond[2];
+
+ // Return true if MO definitely contains the value one.
+ auto isOne = [](MachineOperand &MO) -> bool {
+ if (MO.isImm() && MO.getImm() == 1)
+ return true;
+
+ if (!MO.isReg() || !MO.getReg().isVirtual())
+ return false;
+
+ MachineRegisterInfo &MRI =
+ MO.getParent()->getParent()->getParent()->getRegInfo();
+ MachineInstr *DefMI = MRI.getUniqueVRegDef(MO.getReg());
+ if (!DefMI)
+ return false;
+
+ // For now, just check the canonical one value.
+ if (DefMI->getOpcode() == RISCV::ADDI &&
+ DefMI->getOperand(1).getReg() == RISCV::X0 &&
+ DefMI->getOperand(2).getImm() == 1)
+ return true;
+
+ return false;
+ };
+
+ // Return true if MO definitely contains the value zero.
+ auto isZero = [](MachineOperand &MO) -> bool {
+ if (MO.isImm() && MO.getImm() == 0)
+ return true;
+ if (MO.isReg() && MO.getReg() == RISCV::X0)
+ return true;
+ return false;
+ };
+
+ switch (CC) {
+ default:
+ // TODO: Implement for more CCs
+ return false;
+ case RISCVCC::COND_EQ: {
+ // We can statically evaluate that we take the first branch
+ if ((isZero(LHS) && isZero(RHS)) || (isOne(LHS) && isOne(RHS))) {
+ FoldedBB = TBB;
+ return true;
+ }
+ // We can statically evaluate that we take the second branch
+ if ((isZero(LHS) && isOne(RHS)) || (isOne(LHS) && isZero(RHS))) {
+ FoldedBB = FBB;
+ return true;
+ }
+ return false;
+ }
+ case RISCVCC::COND_NE: {
+ // We can statically evaluate that we take the first branch
+ if ((isOne(LHS) && isZero(RHS)) || (isZero(LHS) && isOne(RHS))) {
+ FoldedBB = TBB;
+ return true;
+ }
+ // We can statically evaluate that we take the second branch
+ if ((isZero(LHS) && isZero(RHS)) || (isOne(LHS) && isOne(RHS))) {
+ FoldedBB = FBB;
+ return true;
+ }
+ return false;
+ }
+ }
+}
+
bool RISCVInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,
@@ -1068,8 +1158,24 @@ bool RISCVInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
// Handle a conditional branch followed by an unconditional branch.
if (NumTerminators == 2 && std::prev(I)->getDesc().isConditionalBranch() &&
I->getDesc().isUnconditionalBranch()) {
- parseCondBranch(*std::prev(I), TBB, Cond);
+
+ // Try to fold the branch of the conditional branch into an unconditional
+ // branch.
+ MachineInstr &CondBr = *std::prev(I);
+ parseCondBranch(CondBr, TBB, Cond);
FBB = getBranchDestBlock(*I);
+ MachineBasicBlock *Folded = nullptr;
+ if (AllowModify && canFoldBrOfCondBr(TBB, FBB, Cond, Folded)) {
+ DebugLoc DL = MBB.findBranchDebugLoc();
+ removeBranch(MBB);
+ insertBranch(MBB, Folded, nullptr, {}, DL);
+ while (!MBB.succ_empty())
+ MBB.removeSuccessor(MBB.succ_end() - 1);
+ MBB.addSuccessor(Folded);
+ TBB = Folded;
+ FBB = nullptr;
+ Cond.clear();
+ }
return false;
}
diff --git a/llvm/test/CodeGen/RISCV/branch_zero.ll b/llvm/test/CodeGen/RISCV/branch_zero.ll
index fd0979977ba3b..95cc25ddbddd9 100644
--- a/llvm/test/CodeGen/RISCV/branch_zero.ll
+++ b/llvm/test/CodeGen/RISCV/branch_zero.ll
@@ -83,3 +83,20 @@ exit1:
if.then:
br label %for.body
}
+
+define ptr @baz() {
+; CHECK-LABEL: baz:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: # %bb.1: # %if.end12
+; CHECK-NEXT: li a0, 0
+; CHECK-NEXT: ret
+entry:
+ %or.cond = or i1 false, false
+ br i1 %or.cond, label %if.end12, label %if.then10
+
+if.then10: ; preds = %entry
+ unreachable
+
+if.end12: ; preds = %entry
+ ret ptr null
+}
|
e7670a7
to
bd669d3
Compare
You can test this locally with the following command:git-clang-format --diff 15c96d6874e8e37e583cf2994b290b9a6869cd30 cfb65551f148e39e33311f77d6fc364cfaca3473 --extensions cpp -- llvm/lib/CodeGen/BranchFolding.cpp llvm/lib/CodeGen/PeepholeOptimizer.cpp llvm/lib/Target/RISCV/RISCVInstrInfo.cpp View the diff from clang-format here.diff --git a/llvm/lib/CodeGen/PeepholeOptimizer.cpp b/llvm/lib/CodeGen/PeepholeOptimizer.cpp
index 07a57ff822..8250ea0334 100644
--- a/llvm/lib/CodeGen/PeepholeOptimizer.cpp
+++ b/llvm/lib/CodeGen/PeepholeOptimizer.cpp
@@ -447,6 +447,7 @@ public:
using RecurrenceCycle = SmallVector<RecurrenceInstr, 4>;
bool needToInvalidateMLI() const { return NeedToInvalidateMLI; }
+
private:
bool optimizeCmpInstr(MachineInstr &MI);
bool optimizeExtInstr(MachineInstr &MI, MachineBasicBlock &MBB,
|
The documentation for |
@@ -83,3 +71,20 @@ exit1: | |||
if.then: | |||
br label %for.body | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this test should be in this file. The optimization this test was written for was for the bltz/bgez. The bnez zero
is an llvm reduce artifact that was missed in the review and shouldn't have been commited as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced. Updated this patch with a test case from spec2017 that suffered from it. I reduced it by piping opt -O3 into llc -O3 to avoid the IR being unoptimized.
Thats not how I read it. It just gives deletion after an unconditional branch as an example, but does not limit it to just that case. |
@@ -357,11 +357,6 @@ define i64 @ctpop_i64(i64 %a) nounwind { | |||
define i1 @ctpop_i64_ugt_two(i64 %a) nounwind { | |||
; RV32I-LABEL: ctpop_i64_ugt_two: | |||
; RV32I: # %bb.0: | |||
; RV32I-NEXT: beqz zero, .LBB6_2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a missed post legalizer constant fold in GISel.
; RV64IZCMP-NEXT: .cfi_restore ra | ||
; RV64IZCMP-NEXT: .cfi_def_cfa_offset 0 | ||
; RV64IZCMP-NEXT: tail f2 | ||
|
||
entry: | ||
br i1 poison, label %if.T, label %if.F |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks an llvm reduce artifact. A test with branch on poison is fragile and risks being broken in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
I'm not seeing any test changes that aren't bad tests or missing folds in SelectionDAG/GlobalISel. Are any of the tests representative of the changes you're seeing in spec2006? |
I also think AllowModify should be purged. Last time I looked at it, I wasn't sure it was ever set to true |
It is set to true in some cases, and cannot be true in others. |
f703ad7
to
7985cef
Compare
ee9a134
to
c403110
Compare
…ructions This is an alternative to llvm#117060, and is stacked on llvm#131684. Marking @mikhailramalho as Co-Author here because I got the idea of a late peephole pass and the test case from llvm#117060. I use a late pass because we introduce the optimizable branches so late in the pipeline. Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>
…ructions This is an alternative to llvm#117060, and is stacked on llvm#131684. Marking @mikhailramalho as Co-Author here because I got the idea of a late peephole pass and the test case from llvm#117060. I use a late pass because we introduce the optimizable branches so late in the pipeline. Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>
…ructions This is an alternative to llvm#117060, and is stacked on llvm#131684. Marking @mikhailramalho as Co-Author here because I got the idea of a late peephole pass and the test case from llvm#117060. I use a late pass because we introduce the optimizable branches so late in the pipeline. Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! It's great to see this finally fixed.
Just one question: have you consider implementing this in RISCVInstrInfo::optimizeCondBranch
?
It's only used here:
if (MI->isConditionalBranch() && optimizeCondBranch(*MI)) { |
but I wonder if we could get more cases covered by changing optimizeCondBranch
(instead of the new trySimplifyCondBr
) + your changes to RISCVInstrInfo::analyzeBranch
.
I'll run SPEC with your patch tonight, we should have the results tomorrow morning
Let me see if I can use optimizeCondBranch and post an update here. You should run spec with #131728 instead of this patch because we will clean up more cases. |
e4544b5
to
cfb6555
Compare
I can, but I'd like to get support that this is the right direction before doing it. The drawback about using |
Two related, but slightly off topic points.
|
@preames theres a really silly scenario where sometimes DT is not required for this pass but MLI is. We cannot update MLI without DT. Otherwise I would have. |
I found myself asking the same question Craig did here. I have a feeling here that the scope of this patch has increased enough that you should go take another look at this question. I don't have any more than a feeling here, but to my understanding, leaving an unconditional branch followed by a conditional one is supposed to be legal, and cleaned up later. Just a guess, did you maybe leave the successors in an invalid state? If you're not careful, it would be easy to delete a successor edge with this code structure. |
Another possibility on factoring. Have you explored what happens if rather than actually performing the branch rewrite, you just adjust the result from analyzeBranch to reflect the fact this branch is unconditional? (That is, after parseCondBranch, check if the condition is constant, and setup out-params as-if it were?) |
I've since fixed all the crashes. It was happening because MachineLoopInfo was marked as preserved but we were changing CFG |
If I understand things. Adding this to optimizeCondBranch and relying on the call from PeepholeOptimizer doesn't get any optimizations on real world code because the problem we're trying to solve is created long after after the PeepholeOptimizer. It effects some tests in lit, but that's primarily things that should have been folded in IR like a branch on We still need a late optimization pass to run some kind of branch cleanup to get the real world cases created by tail duplication, machine block placement, machine copy propagation. So I guess my question is, what do we gain by spreading the logic around instead of doing everything in a late pass like #117060? |
My suggestion is for us to give up on this patch and @mikhailramalho to drive #117060 forward. @mikhailramalho, please feel free to steal any code you like from this patch or an earlier fixup that this patch introduced. |
07d4d5d
to
cfb6555
Compare
optimizeCondBranch isn't allowed to modify the CFG, but it can rewrite the branch condition freely. However, If we could fold a conditional branch to an unconditional one (aside from that restriction), we can also rewrite it into some canonical conditional branch instead. Looking at the diffs, the only cases this catches in tree tests are cases where we could have constant folded during lowering from IR, but didn't. This is inspired by trying to salvage code from llvm#131684 which might be useful. Given the test impact, it's of questionable merits. The main advantage over only the late cleanup pass is that it kills off the LIs for the constants early - which can help e.g. register allocation.
This patch is an alternative to PRs llvm#117060, llvm#131684, llvm#131728. The patch adds a late optimization pass that replaces conditional branches that can be statically evaluated with an unconditinal branch. Adding michael as a co-author as most of code that evaluates the condition comes from llvm#131684.
…#132988) optimizeCondBranch isn't allowed to modify the CFG, but it can rewrite the branch condition freely. However, If we could fold a conditional branch to an unconditional one (aside from that restriction), we can also rewrite it into some canonical conditional branch instead. Looking at the diffs, the only cases this catches in tree tests are cases where we could have constant folded during lowering from IR, but didn't. This is inspired by trying to salvage code from #131684 which might be useful. Given the test impact, it's of questionable merits. The main advantage over only the late cleanup pass is that it kills off the LIs for the constants early - which can help e.g. register allocation.
…eCondBranch (#132988) optimizeCondBranch isn't allowed to modify the CFG, but it can rewrite the branch condition freely. However, If we could fold a conditional branch to an unconditional one (aside from that restriction), we can also rewrite it into some canonical conditional branch instead. Looking at the diffs, the only cases this catches in tree tests are cases where we could have constant folded during lowering from IR, but didn't. This is inspired by trying to salvage code from llvm/llvm-project#131684 which might be useful. Given the test impact, it's of questionable merits. The main advantage over only the late cleanup pass is that it kills off the LIs for the constants early - which can help e.g. register allocation.
This patch is an alternative to PRs #117060, #131684, #131728. The patch adds a late optimization pass that replaces conditional branches that can be statically evaluated with an unconditinal branch. Adding Michael as a co-author as most of the code that evaluates the condition comes from #131684. Co-authored-by: Michael Maitland michaeltmaitland@gmail.com
We teach
optimizeCondBranch
how to evaluate statically known conditionsand replace with an unconditional branch.
We teach
analyzeBranch
to calloptimizeCondBranch
whenAllowModify
is set to true.
We pass
AllowModify=false
to analyzeBranch in the tail mergingcode because this code is doing some clever tricks to the branch code
that it will restore later. We need to be careful not to take it out of the
form it expects the branches to be in.
Now that PeepholeOptimizer may modify the CFG, we need to account
for that. We do as much work in updating DT and MLI as possible to
reduce the number of times that we need to invalidate these passes.
We can consider a future patch of calling
optimizeCondBranch
in moreplaces, such as a later pass to catch more cases.