Skip to content

[FlattenCFG] Fixup Phi nodes during CFG flattening #143766

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

HighW4y2H3ll
Copy link
Member

@HighW4y2H3ll HighW4y2H3ll commented Jun 11, 2025

FlattenCFG Pass is skipped when Basic Block contains a Phi node. Fix up the Phi node with a SelectInst so that FlattenCFG can be ran again.

@llvmbot
Copy link
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-llvm-transforms

Author: None (HighW4y2H3ll)

Changes

SimplifyCFG currently only merges basic block into its predecessor with PredecessorWithTwoSuccessors turned off, even though it's plausible to merge when the predecessor basic block has two successors and the instructions in the current basic block have no side effects. This patch re-enables this CFG transformation in the SimplifyCFG pass so that we can make more aggressive optimizations.


Full diff: https://github.com/llvm/llvm-project/pull/143766.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp (+2)
  • (modified) llvm/lib/Transforms/Utils/BasicBlockUtils.cpp (+33)
  • (modified) llvm/lib/Transforms/Utils/Local.cpp (+3-1)
  • (modified) llvm/lib/Transforms/Utils/SimplifyCFG.cpp (+1-1)
  • (added) llvm/test/Transforms/SimplifyCFG/two-succ-merge-block.ll (+58)
diff --git a/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp b/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
index 4e437e9abeb43..864a3c35ed79e 100644
--- a/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
+++ b/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
@@ -264,6 +264,8 @@ static bool iterativelySimplifyCFG(Function &F, const TargetTransformInfo &TTI,
       if (simplifyCFG(&BB, TTI, DTU, Options, LoopHeaders)) {
         LocalChange = true;
         ++NumSimpl;
+        // Function iterator might not be valid anymore
+        break;
       }
     }
     Changed |= LocalChange;
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 6608515e1cbbc..f9fda9625ea36 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -207,6 +207,10 @@ bool llvm::MergeBlockIntoPredecessor(BasicBlock *BB, DomTreeUpdater *DTU,
   if (PredecessorWithTwoSuccessors) {
     if (!(PredBB_BI = dyn_cast<BranchInst>(PTI)))
       return false;
+    // Can't merge if any of the instructions has side effects
+    for (Instruction &candid : *BB)
+      if (candid.mayHaveSideEffects())
+        return false;
     BranchInst *BB_JmpI = dyn_cast<BranchInst>(BB->getTerminator());
     if (!BB_JmpI || !BB_JmpI->isUnconditional())
       return false;
@@ -283,6 +287,35 @@ bool llvm::MergeBlockIntoPredecessor(BasicBlock *BB, DomTreeUpdater *DTU,
   if (MSSAU)
     MSSAU->moveAllAfterMergeBlocks(BB, PredBB, Start);
 
+  // PHI nodes that references the Pred shall be replaced with a SelectInst
+  for (BasicBlock *Succ : successors(PTI)) {
+    for (PHINode &PhiSucc : Succ->phis()) {
+      if (llvm::is_contained(PhiSucc.blocks(), PredBB) && llvm::is_contained(PhiSucc.blocks(), BB)) {
+        if (PredBB_BI->isConditional()) {
+          Value *Cond = PredBB_BI->getCondition();
+          Value *OrigPhi0 = PhiSucc.removeIncomingValue(PredBB, false);
+          Value *OrigPhi1 = PhiSucc.removeIncomingValue(BB, false);
+
+          SelectInst *newPhi = nullptr;
+          // Swap branch given the conditions
+          if (PredBB_BI->getSuccessor(0) == Succ) {
+            newPhi = SelectInst::Create(Cond, OrigPhi0, OrigPhi1, "", PTI->getIterator());
+          } else {
+            newPhi = SelectInst::Create(Cond, OrigPhi1, OrigPhi0, "", PTI->getIterator());
+          }
+
+          // Erase PhiNode if its empty
+          if (PhiSucc.getNumOperands() == 0) {
+            PhiSucc.replaceAllUsesWith(newPhi);
+            PhiSucc.eraseFromParent();
+            break;
+          }
+          PhiSucc.addIncoming(newPhi, PredBB);
+        }
+      }
+    }
+  }
+
   // Make all PHI nodes that referred to BB now refer to Pred as their
   // source...
   BB->replaceAllUsesWith(PredBB);
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index be71cb69ad8cc..c382c1e385134 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -153,7 +153,9 @@ bool llvm::ConstantFoldTerminator(BasicBlock *BB, bool DeleteDeadConditions,
 
       // Let the basic block know that we are letting go of one copy of it.
       assert(BI->getParent() && "Terminator not inserted in block!");
-      Dest1->removePredecessor(BI->getParent());
+      // Keep the predecesor if Dest has multiple incoming edges
+      if (Dest1->getUniquePredecessor())
+        Dest1->removePredecessor(BI->getParent());
 
       // Replace the conditional branch with an unconditional one.
       BranchInst *NewBI = Builder.CreateBr(Dest1);
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index e221022bb8361..27b0692ca7a4f 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -8337,7 +8337,7 @@ bool SimplifyCFGOpt::simplifyOnce(BasicBlock *BB) {
   // Merge basic blocks into their predecessor if there is only one distinct
   // pred, and if there is only one distinct successor of the predecessor, and
   // if there are no PHI nodes.
-  if (MergeBlockIntoPredecessor(BB, DTU))
+  if (MergeBlockIntoPredecessor(BB, DTU, nullptr, nullptr, nullptr, true))
     return true;
 
   if (SinkCommon && Options.SinkCommonInsts)
diff --git a/llvm/test/Transforms/SimplifyCFG/two-succ-merge-block.ll b/llvm/test/Transforms/SimplifyCFG/two-succ-merge-block.ll
new file mode 100644
index 0000000000000..eb8ac774b6e5b
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/two-succ-merge-block.ll
@@ -0,0 +1,58 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --tool ./opt --version 5
+; RUN: opt < %s -passes=simplifycfg -S | FileCheck %s
+
+define i1 @_Z7compareRK1SS1_(ptr %a, ptr %b) {
+; CHECK-LABEL: @_Z7compareRK1SS1_(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT:   %0 = load i32, ptr %a, align 4, !tbaa !3
+; CHECK-NEXT:   %1 = load i32, ptr %b, align 4, !tbaa !3
+; CHECK-NEXT:   %cmp.i = icmp slt i32 %0, %1
+; CHECK-NEXT:   %cmp.i19 = icmp eq i32 %0, %1
+; CHECK-NEXT:   %y = getelementptr inbounds nuw i8, ptr %a, i64 4
+; CHECK-NEXT:   %2 = load i32, ptr %y, align 4, !tbaa !8
+; CHECK-NEXT:   %y14 = getelementptr inbounds nuw i8, ptr %b, i64 4
+; CHECK-NEXT:   %3 = load i32, ptr %y14, align 4, !tbaa !8
+; CHECK-NEXT:   %cmp = icmp slt i32 %2, %3
+; CHECK-NEXT:   %4 = select i1 %cmp.i19, i1 %cmp, i1 false
+; CHECK-NEXT:   %5 = select i1 %cmp.i, i1 true, i1 %4
+; CHECK-NEXT:   br label %lor.end
+; CHECK-EMPTY:
+; CHECK-NEXT: lor.end:                                          ; preds = %entry
+; CHECK-NEXT:   ret i1 %5
+entry:
+  %0 = load i32, ptr %a, align 4, !tbaa !3
+  %1 = load i32, ptr %b, align 4, !tbaa !3
+  %cmp.i = icmp slt i32 %0, %1
+  br i1 %cmp.i, label %lor.end, label %lor.rhs
+
+lor.rhs:                                          ; preds = %entry
+  %cmp.i19 = icmp eq i32 %0, %1
+  br i1 %cmp.i19, label %land.rhs, label %lor.end
+
+land.rhs:                                         ; preds = %lor.rhs
+  %y = getelementptr inbounds nuw i8, ptr %a, i64 4
+  %2 = load i32, ptr %y, align 4, !tbaa !8
+  %y14 = getelementptr inbounds nuw i8, ptr %b, i64 4
+  %3 = load i32, ptr %y14, align 4, !tbaa !8
+  %cmp = icmp slt i32 %2, %3
+  br label %lor.end
+
+lor.end:                                          ; preds = %lor.rhs, %land.rhs, %entry
+  %4 = phi i1 [ true, %entry ], [ false, %lor.rhs ], [ %cmp, %land.rhs ]
+  ret i1 %4
+}
+
+!llvm.module.flags = !{!0, !1}
+!llvm.ident = !{!2}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{i32 7, !"uwtable", i32 2}
+!2 = !{!"clang"}
+!3 = !{!4, !5, i64 0}
+!4 = !{!"_ZTS1S", !5, i64 0, !5, i64 4}
+!5 = !{!"int", !6, i64 0}
+!6 = !{!"omnipotent char", !7, i64 0}
+!7 = !{!"Simple C++ TBAA"}
+!8 = !{!4, !5, i64 4}
+!9 = !{!5, !5, i64 0}

Copy link

github-actions bot commented Jun 11, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@HighW4y2H3ll
Copy link
Member Author

Still have a lot of unit testing failures that needs to be double checked... but would like to hear some feedbacks on this if possible... thx :)

@HighW4y2H3ll HighW4y2H3ll force-pushed the fix_cfg_flatten branch 4 times, most recently from e315f89 to d912e14 Compare June 12, 2025 06:58
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you modified just MergeBlockIntoPredecessor without actually using the PredecessorWithTwoSuccessors mode anywhere in SimplifyCFG, so I don't see how your test could possibly work.

If I understand what you're trying to do, this is effectively speculating blocks into predecessors. Checking for side effects is incorrect there, you need to check speculability. For example, in your test case speculation is not legal because the loads are non-speculatable. On top of that, speculating instructions requires cost modelling, you can't just dump a block of arbitrary size into the predecessor, even though it may not be executed at all.

SimplifyCFG already has various transforms that do this correctly and with cost modelling.

@HighW4y2H3ll
Copy link
Member Author

Thanks for checking!! Sorry that I missed a commit earlier which actually uses the PredecessorWithTwoSuccessors. The speculativelyExecuteBB transformation seems to be more restrictive to the instructions: e.g. only allowing conditional load/store, only speculate merge basic block with one additional instruction etc. I'm checking the FlattenCFG pass right now. Maybe that's actually the right place for such transformation...

@HighW4y2H3ll HighW4y2H3ll changed the title [SimplifyCFG] Enable MergeBlockIntoPredecessor with two successors [FlattenCFG] Fixup Phi nodes during CFG flattening Jun 13, 2025
@HighW4y2H3ll
Copy link
Member Author

@nikic I fixed up the FlattenCFG pass which shall be able to handle this CFG now! Could you take another look when its convenient? Thank you! :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like your transform lost the contents of land.rhs entirely?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh not at all, I just skipped the checking of the land.rhs since its entirely unchanged. let me add it there if that works better.

@HighW4y2H3ll
Copy link
Member Author

@nikic Could you take a look at this PR again? I've fixed all the unittests and it shall be ready for review! :)
At the same time, I put together another PR: #144434 which uses proper cost model to emit SelectInst and merge basic blocks that have common successor. Could you also take a look at that too? Thx! :)

@HighW4y2H3ll HighW4y2H3ll requested review from MatzeB and arsenm June 17, 2025 22:37
@HighW4y2H3ll
Copy link
Member Author

Also cc @arsenm since it's touching quite a bit of AMDGPU unit tests... thx! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants