@XChy
Member

@XChy XChy commented Aug 21, 2025

Fixes #153295.
For the test case below:

define i32 @caller() {
entry:
  %call1 = call i32 @callee(i32 1)
  %call2 = call i32 @callee(i32 0)
  %cond = icmp eq i32 %call2, 0
  br i1 %cond, label %common.ret, label %if.then

common.ret:                                       ; preds = %entry
  ret i32 0

if.then:                                         ; preds = %entry
  %unreachable_call = call i32 @callee(i32 2)
  ret i32 %unreachable_call
}

define internal i32 @callee(i32 %ac) {
entry:
  br label %ai

ai:                                               ; preds = %ai, %entry
  %add = or i32 0, 0
  %cond = icmp eq i32 %ac, 1
  br i1 %cond, label %aj, label %ai

aj:                                               ; preds = %ai
  ret i32 0
}

Before specialization, the SCCP solver determines that
unreachable_call is unexecutable, because the return value of callee
can only be zero.
After specializing the call sites call1 and call2, FnSpecializer
marks callee as a dead function, since all executable call sites
have been specialized. However, unexecutable call sites can become
executable again once the specialized calls are solved.
In this test case, call2 is considered Overdefined after
specialization, which makes cond Overdefined as well. Thus,
unreachable_call becomes executable.
This patch skips SCCP on the blocks of dead functions and poisons the
call sites of dead functions.

@llvmbot
Member

llvmbot commented Aug 21, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-function-specialization

Author: XChy (XChy)

Changes

Fixes #153295.
For the test case below:

define i32 @caller() {
entry:
  %call1 = call i32 @callee(i32 1)
  %call2 = call i32 @callee(i32 0)
  %cond = icmp eq i32 %call2, 0
  br i1 %cond, label %common.ret, label %if.then

common.ret:                                       ; preds = %entry
  ret i32 0

if.then:                                         ; preds = %entry
  %unreachable_call = call i32 @callee(i32 2)
  ret i32 %unreachable_call
}

define internal i32 @callee(i32 %ac) {
entry:
  br label %ai

ai:                                               ; preds = %ai, %entry
  %add = or i32 0, 0
  %cond = icmp eq i32 %ac, 1
  br i1 %cond, label %aj, label %ai

aj:                                               ; preds = %ai
  ret i32 0
}

Before specialization, the SCCP solver determines that unreachable_call is unexecutable, because the return value of callee can only be zero.
After specializing the call sites call1 and call2, FnSpecializer marks callee as a dead function, since all executable call sites have been specialized. However, unexecutable call sites can become executable again once the specialized calls are solved.
In this test case, call2 is considered Overdefined after specialization, which makes cond Overdefined as well. Thus, unreachable_call becomes executable.
This patch prevents marking a function as dead and fully specialized if any unexecutable call site exists.


Full diff: https://github.com/llvm/llvm-project/pull/154668.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/FunctionSpecialization.cpp (+11-7)
  • (added) llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll (+42)
diff --git a/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp b/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
index c876a47ef2129..c799bb54a34b6 100644
--- a/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
@@ -1167,15 +1167,17 @@ Constant *FunctionSpecializer::getCandidateConstant(Value *V) {
 
 void FunctionSpecializer::updateCallSites(Function *F, const Spec *Begin,
                                           const Spec *End) {
-  // Collect the call sites that need updating.
+  // Collect the call sites that need updating and count ALL the call sites.
   SmallVector<CallBase *> ToUpdate;
-  for (User *U : F->users())
-    if (auto *CS = dyn_cast<CallBase>(U);
-        CS && CS->getCalledFunction() == F &&
-        Solver.isBlockExecutable(CS->getParent()))
-      ToUpdate.push_back(CS);
+  unsigned NCallsLeft = 0;
+  for (User *U : F->users()) {
+    if (auto *CS = dyn_cast<CallBase>(U); CS && CS->getCalledFunction() == F) {
+      NCallsLeft++;
+      if (Solver.isBlockExecutable(CS->getParent()))
+        ToUpdate.push_back(CS);
+    }
+  }
 
-  unsigned NCallsLeft = ToUpdate.size();
   for (CallBase *CS : ToUpdate) {
     bool ShouldDecrementCount = CS->getFunction() == F;
 
@@ -1207,6 +1209,8 @@ void FunctionSpecializer::updateCallSites(Function *F, const Spec *Begin,
 
   // If the function has been completely specialized, the original function
   // is no longer needed. Mark it unreachable.
+  // NOTE: We cannot mark it unreachable if any unexecutable call site exists,
+  // as the unexecutable call site may become executable due to specialization.
   if (NCallsLeft == 0 && Solver.isArgumentTrackedFunction(F)) {
     Solver.markFunctionUnreachable(F);
     FullySpecialized.insert(F);
diff --git a/llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll b/llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll
new file mode 100644
index 0000000000000..92685c5c72f2e
--- /dev/null
+++ b/llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll
@@ -0,0 +1,42 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=ipsccp  --funcspec-min-function-size=1 -S < %s | FileCheck %s
+
+define i32 @caller() {
+; CHECK-LABEL: define i32 @caller() {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[CALL1:%.*]] = call i32 @callee.specialized.1(i32 1)
+; CHECK-NEXT:    [[CALL2:%.*]] = call i32 @callee.specialized.2(i32 0)
+; CHECK-NEXT:    [[COND:%.*]] = icmp eq i32 undef, 0
+; CHECK-NEXT:    br i1 [[COND]], label %[[COMMON_RET:.*]], label %[[IF_THEN:.*]]
+; CHECK:       [[COMMON_RET]]:
+; CHECK-NEXT:    ret i32 0
+; CHECK:       [[IF_THEN]]:
+; CHECK-NEXT:    [[UNREACHABLE_CALL:%.*]] = call i32 @callee.specialized.3(i32 2)
+; CHECK-NEXT:    ret i32 undef
+;
+entry:
+  %call1 = call i32 @callee(i32 1)
+  %call2 = call i32 @callee(i32 0)
+  %cond = icmp eq i32 %call2, 0
+  br i1 %cond, label %common.ret, label %if.then
+
+common.ret:                                       ; preds = %entry
+  ret i32 0
+
+if.then:                                         ; preds = %entry
+  %unreachable_call = call i32 @callee(i32 2)
+  ret i32 %unreachable_call
+}
+
+define internal i32 @callee(i32 %ac) {
+entry:
+  br label %ai
+
+ai:                                               ; preds = %ai, %entry
+  %add = or i32 0, 0
+  %cond = icmp eq i32 %ac, 1
+  br i1 %cond, label %aj, label %ai
+
+aj:                                               ; preds = %ai
+  ret i32 0
+}
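
To illustrate the counting change in the diff above, here is a minimal standalone sketch. It does not use LLVM types: `ToyCallSite`, `callsLeftExecutableOnly` and `callsLeftAll` are hypothetical names invented for this illustration, and the model assumes a call site is either rewritten to a specialization or left pointing at the original function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a call site of the original function.
// This is not an LLVM type.
struct ToyCallSite {
  bool InExecutableBlock; // what the solver currently believes
  bool Specialized;       // true if the call now targets a specialization
};

// Behaviour before this revision: only call sites in executable blocks are
// counted, so a call in an unexecutable block never keeps the function alive.
std::size_t callsLeftExecutableOnly(const std::vector<ToyCallSite> &Sites) {
  std::size_t Left = 0;
  for (const ToyCallSite &CS : Sites)
    if (CS.InExecutableBlock && !CS.Specialized)
      ++Left;
  return Left;
}

// Behaviour in this revision: every call site is counted, so any
// unspecialized call, executable or not, keeps the count above zero.
std::size_t callsLeftAll(const std::vector<ToyCallSite> &Sites) {
  std::size_t Left = 0;
  for (const ToyCallSite &CS : Sites)
    if (!CS.Specialized)
      ++Left;
  return Left;
}
```

With two specialized executable calls and one unspecialized call in an unexecutable block (the shape of the test case), the first count reaches zero and the function would be marked dead, while the second stays at one.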

@XChy XChy changed the title [FunctionnSpecializer] Do not mark function dead if any unexecutable call site exists [FunctionSpecializer] Do not mark function dead if any unexecutable call site exists Aug 21, 2025
@github-actions

github-actions bot commented Aug 21, 2025

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h llvm/lib/Transforms/IPO/FunctionSpecialization.cpp llvm/lib/Transforms/IPO/SCCP.cpp

The following files introduce new uses of undef:

  • llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

@nikic nikic requested a review from labrinea August 21, 2025 14:16
Contributor

@antoniofrighetto antoniofrighetto left a comment

LLVM IR module before BBs pruning in SCCP:

define i32 @caller() {
entry:
  %call1 = call i32 @callee.specialized.1(i32 1)
  %call2 = call i32 @callee.specialized.2(i32 0)
  %cond = icmp eq i32 undef, 0
  br i1 %cond, label %common.ret, label %if.then

common.ret:                                       ; preds = %entry
  ret i32 0

if.then:                                          ; preds = %entry
  %unreachable_call = call i32 @callee(i32 2)
  ret i32 0
}

define internal i32 @callee(i32 %ac) {
entry:
  br label %ai

ai:                                               ; preds = %entry
  unreachable

aj:                                               ; No predecessors!
  unreachable
}

define internal i32 @callee.specialized.1(i32 %ac) {
entry:
  br label %ai

ai:                                               ; preds = %ai, %entry
  %add = or i32 0, 0
  %cond = icmp eq i32 %ac, 1
  br i1 %cond, label %aj, label %ai

aj:                                               ; preds = %ai
  ret i32 0
}

define internal i32 @callee.specialized.2(i32 %ac) {
entry:
  br label %ai

ai:                                               ; preds = %ai, %entry
  %add = or i32 0, 0
  %cond = icmp eq i32 %ac, 1
  br i1 %cond, label %aj, label %ai

aj:                                               ; preds = %ai
  ret i32 0
}

As you correctly noted, FunctionSpecialization happens to look only at executable call-sites. Since the last call-site is not marked as executable, it is ignored by specialization. IIUC the fix correctly, you're suggesting to introduce specialization to those call-sites that live in basic blocks which were determined by the solver to be never executable.

The current code in updateCallSites seems correct to me: callee is eventually marked as unreachable because it has been fully specialized along all paths proven executable:

// If the function has been completely specialized, the original function
// is no longer needed. Mark it unreachable.
if (NCallsLeft == 0 && Solver.isArgumentTrackedFunction(F)) {
  Solver.markFunctionUnreachable(F);
  FullySpecialized.insert(F);
}

Isn't the discrepancy here stemming from the fact that we are attempting to remove basic blocks of a fully specialized function, where it should be up to FunctionSpecializer's destructor to remove such functions (in which case, unreachable call-sites should be removed as well)?

@XChy
Member Author

XChy commented Aug 23, 2025

Thanks for your detailed review.

IIUC the fix correctly, you're suggesting to introduce specialization to those call-sites that live in basic blocks which were determined by the solver to be never executable.

Not quite: my original intent was to avoid removing functions that are proven unexecutable but may become executable again later. Introducing new specializations is not the goal.

Isn't the discrepancy here stemming from the fact that we are attempting to remove basic blocks of a fully specialized function, where it should be up to FunctionSpecializer's destructor to remove such functions (in which case, unreachable call-sites should be removed as well)?

Sounds reasonable to me. It's the undefined behaviour that causes inconsistent executability. How about replacing the unreachable call-sites with poison and removing them? The return value of the once-proven-inexecutable call is invalid in fact.

@antoniofrighetto
Contributor

Isn't the discrepancy here stemming from the fact that we are attempting to remove basic blocks of a fully specialized function, where it should be up to FunctionSpecializer's destructor to remove such functions (in which case, unreachable call-sites should be removed as well)?

Sounds reasonable to me. It's the undefined behaviour that causes inconsistent executability. How about replacing the unreachable call-sites with poison and removing them? The return value of the once-proven-inexecutable call is invalid in fact.

Right. That would make sense to me.

@XChy XChy force-pushed the fix-fnspecializer-unreachable branch from 33995e8 to bc9c37a Compare August 24, 2025 11:07
@github-actions

github-actions bot commented Aug 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor

@antoniofrighetto antoniofrighetto left a comment

LGTM, thanks! Could you please update the PR title as well?

@XChy XChy changed the title [FunctionSpecializer] Do not mark function dead if any unexecutable call site exists [FunctionSpecializer] Keep the blocks in dead functions and remove the callsites of dead function properly. Aug 25, 2025
@labrinea
Collaborator

Hi, please leave some time for me to look at it before you merge. I'll be able to review tomorrow.

Collaborator

@labrinea labrinea left a comment

However, the unexecutable call sites can become executable again after solving specialized calls. In this testcase, call2 is considered Overdefined after specialization, making cond also Overdefined. Thus, unreachable_call becomes executable.

What is the LatticeValue of call2 after specialization? Is it undef? I am puzzled why the solver considers unreachable_call non executable in the first place. Surely call2 falls into an infinite loop not returning zero.

@antoniofrighetto
Copy link
Contributor

antoniofrighetto commented Aug 26, 2025

What is the LatticeValue of call2 after specialization? Is it undef? I am puzzled why the solver considers unreachable_call non executable in the first place.

Prior to specialization, the lattice value for call2 is constantrange<0, 1>, i.e., the solver determined that it always returns 0 (in fact, for %cond = icmp eq i32 %call2, 0 we get constantrange<-1, 0>, meaning the condition is always true), so %unreachable_call is never executed. I assume non-returning paths do not add further information to the call-site return summary, so a path where we return 0 suffices.

After specialization, the lattice value of call2 is unknown (not immediately clear to me why, i32 undef when querying the known constant), thus %unreachable_call gets executable again.
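
The flip in executability described above can be sketched with a toy model. This is not the SCCP solver: `ToyValue`, `ToyFeasibility` and `feasibleSuccessors` are hypothetical names, and the sketch assumes that any value that is no longer a known constant is treated conservatively, with both branch successors feasible. The thread gives two accounts of the post-specialization state (Overdefined in the description, unknown in this comment), so the model does not distinguish between them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified lattice value. LLVM's real lattice has
// more states (ranges, undef, and so on).
struct ToyValue {
  bool IsKnownConstant; // true if the solver proved a single constant
  int32_t Constant;     // meaningful only when IsKnownConstant is true
};

// Which successors of `br (icmp eq X, RHS), TrueBB, FalseBB` may execute.
struct ToyFeasibility {
  bool TrueEdge;
  bool FalseEdge;
};

ToyFeasibility feasibleSuccessors(ToyValue X, int32_t RHS) {
  if (X.IsKnownConstant) {
    // The comparison folds, so exactly one successor is feasible.
    bool Eq = X.Constant == RHS;
    return {Eq, !Eq};
  }
  // Assumption of this sketch: without a known constant, keep both edges.
  return {true, true};
}
```

With `call2` known to be 0, only the edge to `common.ret` is feasible and `if.then` stays unexecutable. Once `call2` is no longer a known constant, the edge to `if.then` becomes feasible and `%unreachable_call` is reachable again.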

@XChy
Copy link
Member Author

XChy commented Aug 26, 2025

(not immediately clear to me why, i32 undef when querying the known constant), thus %unreachable_call gets executable again.

Is it because the specialized callee will not return?

Collaborator

@labrinea labrinea left a comment

Thanks for adding a test for addressTaken. One remark about the first test; otherwise the patch looks OK, I think. I am wondering if we need another test to demonstrate the poisoning of users, and why it does not show in the first example.

@labrinea
Collaborator

Also I'd rename the patch to [FuncSpec] Skip SCCP on blocks of dead functions and poison their callsites

@XChy XChy changed the title [FunctionSpecializer] Keep the blocks in dead functions and remove the callsites of dead function properly. [FuncSpec] Skip SCCP on blocks of dead functions and poison their callsites Aug 27, 2025
Collaborator

@labrinea labrinea left a comment

I think I am satisfied with the patch in its current form. If I am missing something I hope that a bootstrap build will catch it for us.

@XChy XChy merged commit 6bd8448 into llvm:main Aug 27, 2025
8 of 9 checks passed
XChy added a commit that referenced this pull request Aug 28, 2025
…55753)

Fixes #155738.
The original assumption "we already replaced its users with a constant"
for the global variable becomes incorrect after #154668. The users in
the dead function are not simplified, in fact.
This patch poisons all the unsimplified constant global variable users.
@mikaelholmen
Collaborator

Hi @XChy
With this patch, the testcase

llvm/test/Transforms/FunctionSpecialization/literal-const.ll

fails if opt is built with EXPENSIVE_CHECKS.
It fails like

LLVM ERROR: Module changed by IPSCCPPass without invalidating analyses
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /repo/llvm/llvm-main-expensive-checks/llvm/build-all-expensive/bin/opt -S --passes=ipsccp<func-spec> -force-specialization
1.	Running pass "ipsccp" on module "<stdin>"
 #0 0x00005651879e1926 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/repo/llvm/llvm-main-expensive-checks/llvm/build-all-expensive/bin/opt+0x4df5926)
 #1 0x00005651879deeb5 llvm::sys::RunSignalHandlers() (/repo/llvm/llvm-main-expensive-checks/llvm/build-all-expensive/bin/opt+0x4df2eb5)
 #2 0x00005651879e2af9 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f7e4a43c990 __restore_rt (/lib64/libpthread.so.0+0x12990)
 #4 0x00007f7e494e452f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007f7e494b7e65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x00005651879a7014 llvm::report_fatal_error(llvm::Twine const&, bool) (/repo/llvm/llvm-main-expensive-checks/llvm/build-all-expensive/bin/opt+0x4dbb014)
 #7 0x0000565189102fc8 void llvm::detail::UniqueFunctionBase<void, llvm::StringRef, llvm::Any, llvm::PreservedAnalyses const&>::CallImpl<llvm::PreservedCFGCheckerInstrumentation::registerCallbacks(llvm::PassInstrumentationCallbacks&, llvm::AnalysisManager<llvm::Module>&)::$_2>(void*, llvm::StringRef, llvm::Any&, llvm::PreservedAnalyses const&) StandardInstrumentations.cpp:0:0
 #8 0x0000565187c32a6e llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/repo/llvm/llvm-main-expensive-checks/llvm/build-all-expensive/bin/opt+0x5046a6e)
 #9 0x00005651890cfd07 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool, bool) (/repo/llvm/llvm-main-expensive-checks/llvm/build-all-expensive/bin/opt+0x64e3d07)
#10 0x000056518797b95f optMain (/repo/llvm/llvm-main-expensive-checks/llvm/build-all-expensive/bin/opt+0x4d8f95f)
#11 0x00007f7e494d07e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#12 0x0000565187978fae _start (/repo/llvm/llvm-main-expensive-checks/llvm/build-all-expensive/bin/opt+0x4d8cfae)
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /repo/llvm/llvm-main-expensive-checks/llvm/build-all-expensive/bin/FileCheck /repo/llvm/llvm-main-expensive-checks/llvm/test/Transforms/FunctionSpecialization/literal-const.ll -check-prefix CHECK-LIT

@XChy
Member Author

XChy commented Aug 28, 2025

It seems that the (post-)dominator tree analysis is not updated. I am still trying to compile LLVM with EXPENSIVE_CHECKS to reproduce. Hopefully, I will post the fix today.
@mikaelholmen, feel free to revert if it's urgent.

@mikaelholmen
Collaborator

mikaelholmen commented Aug 28, 2025

It seems that the (post-)dominator tree analysis is not updated. I am still trying to compile LLVM with EXPENSIVE_CHECKS to reproduce. Hopefully, I will post the fix today. @mikaelholmen, feel free to revert if it's urgent.

No need to revert for me, I just noticed this in a private buildbot that we have and thought I'd mention it since I don't see any comments about it from public bots here.

@antoniofrighetto
Contributor

It seems that the (post-)dominator tree analysis is not updated. I am still trying to compile LLVM with EXPENSIVE_CHECKS to reproduce. Hopefully, I will post the fix today. @mikaelholmen, feel free to revert if it's urgent.

I suspect we might now need to also set MadeChanges when DeadFunctions is not empty.

@XChy
Member Author

XChy commented Aug 28, 2025

I suspect we might now need to also set MadeChanges when DeadFunctions is not empty.

Yes, I found the reason. Preparing a patch now.

XChy added a commit that referenced this pull request Aug 28, 2025
…5833)

As reported in
#154668 (comment),
we missed invalidating analysis as we don't set the MadeChanges to true
after removing dead functions.

This patch makes it explicit to remove the dead functions marked by
FuncSpec in SCCP and set MadeChanges correctly.
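
The general pattern behind this follow-up can be sketched as a toy model, assuming only that a pass must report a change whenever it mutates the module. None of the names below (`ToyModule`, `ToyPreserved`, `removeDeadFunctions`, `runToyPass`) are LLVM APIs; they are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical module: just a list of function names.
struct ToyModule {
  std::vector<std::string> Functions;
};

// Hypothetical result type: were cached analyses left valid?
enum class ToyPreserved { All, None };

// Erase every function named in Dead; return true if anything was erased.
bool removeDeadFunctions(ToyModule &M, const std::vector<std::string> &Dead) {
  bool MadeChanges = false;
  for (const std::string &Name : Dead) {
    auto It = std::find(M.Functions.begin(), M.Functions.end(), Name);
    if (It != M.Functions.end()) {
      M.Functions.erase(It);
      MadeChanges = true; // removing a function is a module change
    }
  }
  return MadeChanges;
}

// The pass claims all analyses are preserved only if nothing changed.
ToyPreserved runToyPass(ToyModule &M, const std::vector<std::string> &Dead) {
  bool MadeChanges = removeDeadFunctions(M, Dead);
  return MadeChanges ? ToyPreserved::None : ToyPreserved::All;
}
```

In this model, forgetting to set `MadeChanges` after an erase would make the pass claim that everything is preserved even though the module changed, which is the kind of mismatch the EXPENSIVE_CHECKS failure above reports.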
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 28, 2025
…icitly (#155833)

As reported in
llvm/llvm-project#154668 (comment),
we missed invalidating analysis as we don't set the MadeChanges to true
after removing dead functions.

This patch makes it explicit to remove the dead functions marked by
FuncSpec in SCCP and set MadeChanges correctly.
Development

Successfully merging this pull request may close these issues.

clang crashes at -O{1,2,3} on x86_64-linux-gnu: Assertion `pred_empty(DelBB) && "DelBB has one or more predecessors."' failed

5 participants