-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Add late optimization pass for riscv #117060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
You can test this locally with the following command:git-clang-format --diff b49c4af186a6de8f201ed6a4c326ebf822d4fd84 f75f6a174387487d99d1c98a44bdbd56fec6fc6b --extensions cpp,h -- llvm/lib/Target/RISCV/RISCVLatePeephole.cpp llvm/lib/Target/RISCV/RISCV.h llvm/lib/Target/RISCV/RISCVTargetMachine.cpp View the diff from clang-format here.diff --git a/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp b/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp
index 11add97aa8..d30cbdf68b 100644
--- a/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp
+++ b/llvm/lib/Target/RISCV/RISCVLatePeephole.cpp
@@ -59,7 +59,8 @@ char RISCVLatePeepholeOpt::ID = 0;
INITIALIZE_PASS(RISCVLatePeepholeOpt, "riscv-late-peephole",
RISCV_LATE_PEEPHOLE_NAME, false, false)
-void RISCVLatePeepholeOpt::removeBlock(MachineBasicBlock *B, MachineBasicBlock *NewB) {
+void RISCVLatePeepholeOpt::removeBlock(MachineBasicBlock *B,
+ MachineBasicBlock *NewB) {
LLVM_DEBUG(dbgs() << "Removing block '#'" << B->getNumber() << "\n");
// Transfer the immediate dominator information from B to its descendants.
|
|
||
void getAnalysisUsage(AnalysisUsage &AU) const override { | ||
AU.addRequired<MachineDominatorTreeWrapperPass>(); | ||
AU.addPreserved<MachineDominatorTreeWrapperPass>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we don't preserve the dominator tree? Do any passes use it after this?
A couple of random thoughts on possible fixes. There's aren't exclusive, it's possible we can/should do multiple or none of these. If we have zicond, the whole problem goes away. There may also be an arithmetic lowering available for this particular select - I haven't closely looked. The problematic branch is created by MachineBlockPlacement when duplicating the tail. It seems odd to me that it appears to make no effort to simplify after duplicating. This would seem to unblock additional transformations. I can't find any target which implements this "trivially evaluated cond branch -> unconditional branch" in analyzeBranch, but I also can't find any reason why you can't do this if AllowModify=true. |
Are you suggesting analyzeBranch should look through copies to figure this out? |
No, I was suggesting that after copy propagation that analyzeBranch could perform at least the conversion to the unconditional branch. Though honestly, recognizing the ADDI XN, x0, 0 idiom might not be crazy either. |
|
||
Register DstReg = T1I->getOperand(0).getReg(); | ||
Register SrcReg = T1I->getOperand(1).getReg(); | ||
if (DstReg != SrcReg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if we keep this as a separate late peephole, I'd suggest splitting this into two transforms:
- Transform one does the bne zero, zero -> jump conversion.
- Transform two does the jump threading of an edge through a single basis block.
Doing it that way should make the code easier to read, and test. It'll also be more powerful.
Note that we probably won't keep the separate pass, so I wouldn't necessarily spend time implementing this just yet.
Yesterday I noticed Currently, it works only for virtual regs, and it's designed to simplify some cases of blt/bge, but it should be straightforward to change it |
…ructions This is an alternative to llvm#117060, and is stacked on llvm#131684. Marking @mikhailramalho as Co-Author here because I got the idea of a late peephole pass and the test case from llvm#117060. I use a late pass because we introduce the optimizable branches so late in the pipeline. Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>
…ructions This is an alternative to llvm#117060, and is stacked on llvm#131684. Marking @mikhailramalho as Co-Author here because I got the idea of a late peephole pass and the test case from llvm#117060. I use a late pass because we introduce the optimizable branches so late in the pipeline. Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>
…ructions This is an alternative to llvm#117060, and is stacked on llvm#131684. Marking @mikhailramalho as Co-Author here because I got the idea of a late peephole pass and the test case from llvm#117060. I use a late pass because we introduce the optimizable branches so late in the pipeline. Co-authored-by: Mikhail R. Gadelha <mikhail@igalia.com>
@@ -552,6 +553,7 @@ void RISCVPassConfig::addPreEmitPass() { | |||
EnableRISCVCopyPropagation) | |||
addPass(createMachineCopyPropagationPass(true)); | |||
addPass(&BranchRelaxationPassID); | |||
addPass(createRISCVLatePeepholeOptPass()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should be messing with branches after the branch relaxation pass
This patch is an alternative to PRs llvm#117060, llvm#131684, llvm#131728. The patch adds a late optimization pass that replaces conditional branches that can be statically evaluated with an unconditinal branch. Adding michael as a co-author as most of code that evaluates the condition comes from llvm#131684.
This patch is an alternative to PRs #117060, #131684, #131728. The patch adds a late optimization pass that replaces conditional branches that can be statically evaluated with an unconditinal branch. Adding Michael as a co-author as most of the code that evaluates the condition comes from #131684. Co-authored-by: Michael Maitland michaeltmaitland@gmail.com
This patch adds a late optimization pass (riscv only) to remove blocks with a single instruction, a
beqz
orbnez
that compares the zero register.The code here is based on HexagonEarlyIfConversion.
A few passes work together to generate the unoptimized code: Virtual Register Rewriter, Post-RA pseudo instruction, Branch Probability Basic Block Placement, and Machine Copy Propagation Pass produce:
The BNE can jump straight to bb.4, since bb.1 only has one instruction and always jumps to bb.4.
A couple of passes can remove this branch if they are present early (e.g., CodeGen Prepare and early tail duplication can remove a block with an unconditional branch), but given that
BEQ $x0, $x0, %bb.4
only appears after the Machine Copy Propagation Pass, we end up generating unoptimized code.This patch fixes it by adding the new optimization pass right after the Machine Copy Propagation Pass.
TODO: