-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[FuncSpec] Skip SCCP on blocks of dead functions and poison their callsites #154668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-function-specialization Author: XChy (XChy) ChangesFixes #153295. define i32 @<!-- -->caller() {
entry:
%call1 = call i32 @<!-- -->callee(i32 1)
%call2 = call i32 @<!-- -->callee(i32 0)
%cond = icmp eq i32 %call2, 0
br i1 %cond, label %common.ret, label %if.then
common.ret: ; preds = %entry
ret i32 0
if.then: ; preds = %entry
%unreachable_call = call i32 @<!-- -->callee(i32 2)
ret i32 %unreachable_call
}
define internal i32 @<!-- -->callee(i32 %ac) {
entry:
br label %ai
ai: ; preds = %ai, %entry
%add = or i32 0, 0
%cond = icmp eq i32 %ac, 1
br i1 %cond, label %aj, label %ai
aj: ; preds = %ai
ret i32 0
}Before specialization, the SCCP solver determines that Full diff: https://github.com/llvm/llvm-project/pull/154668.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp b/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
index c876a47ef2129..c799bb54a34b6 100644
--- a/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
@@ -1167,15 +1167,17 @@ Constant *FunctionSpecializer::getCandidateConstant(Value *V) {
void FunctionSpecializer::updateCallSites(Function *F, const Spec *Begin,
const Spec *End) {
- // Collect the call sites that need updating.
+ // Collect the call sites that need updating and count ALL the call sites.
SmallVector<CallBase *> ToUpdate;
- for (User *U : F->users())
- if (auto *CS = dyn_cast<CallBase>(U);
- CS && CS->getCalledFunction() == F &&
- Solver.isBlockExecutable(CS->getParent()))
- ToUpdate.push_back(CS);
+ unsigned NCallsLeft = 0;
+ for (User *U : F->users()) {
+ if (auto *CS = dyn_cast<CallBase>(U); CS && CS->getCalledFunction() == F) {
+ NCallsLeft++;
+ if (Solver.isBlockExecutable(CS->getParent()))
+ ToUpdate.push_back(CS);
+ }
+ }
- unsigned NCallsLeft = ToUpdate.size();
for (CallBase *CS : ToUpdate) {
bool ShouldDecrementCount = CS->getFunction() == F;
@@ -1207,6 +1209,8 @@ void FunctionSpecializer::updateCallSites(Function *F, const Spec *Begin,
// If the function has been completely specialized, the original function
// is no longer needed. Mark it unreachable.
+ // NOTE: We cannot mark it unreachable if any unexecutable call site exists,
+ // as the unexecutable call site may become executable due to specialization.
if (NCallsLeft == 0 && Solver.isArgumentTrackedFunction(F)) {
Solver.markFunctionUnreachable(F);
FullySpecialized.insert(F);
diff --git a/llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll b/llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll
new file mode 100644
index 0000000000000..92685c5c72f2e
--- /dev/null
+++ b/llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll
@@ -0,0 +1,42 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=ipsccp --funcspec-min-function-size=1 -S < %s | FileCheck %s
+
+define i32 @caller() {
+; CHECK-LABEL: define i32 @caller() {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[CALL1:%.*]] = call i32 @callee.specialized.1(i32 1)
+; CHECK-NEXT: [[CALL2:%.*]] = call i32 @callee.specialized.2(i32 0)
+; CHECK-NEXT: [[COND:%.*]] = icmp eq i32 undef, 0
+; CHECK-NEXT: br i1 [[COND]], label %[[COMMON_RET:.*]], label %[[IF_THEN:.*]]
+; CHECK: [[COMMON_RET]]:
+; CHECK-NEXT: ret i32 0
+; CHECK: [[IF_THEN]]:
+; CHECK-NEXT: [[UNREACHABLE_CALL:%.*]] = call i32 @callee.specialized.3(i32 2)
+; CHECK-NEXT: ret i32 undef
+;
+entry:
+ %call1 = call i32 @callee(i32 1)
+ %call2 = call i32 @callee(i32 0)
+ %cond = icmp eq i32 %call2, 0
+ br i1 %cond, label %common.ret, label %if.then
+
+common.ret: ; preds = %entry
+ ret i32 0
+
+if.then: ; preds = %entry
+ %unreachable_call = call i32 @callee(i32 2)
+ ret i32 %unreachable_call
+}
+
+define internal i32 @callee(i32 %ac) {
+entry:
+ br label %ai
+
+ai: ; preds = %ai, %entry
+ %add = or i32 0, 0
+ %cond = icmp eq i32 %ac, 1
+ br i1 %cond, label %aj, label %ai
+
+aj: ; preds = %ai
+ ret i32 0
+}
|
You can test this locally with the following command:git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h llvm/lib/Transforms/IPO/FunctionSpecialization.cpp llvm/lib/Transforms/IPO/SCCP.cppThe following files introduce new uses of undef:
Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields In tests, avoid using For example, this is considered a bad practice: define void @fn() {
...
br i1 undef, ...
}Please use the following instead: define void @fn(i1 %cond) {
...
br i1 %cond, ...
}Please refer to the Undefined Behavior Manual for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM IR module before BBs pruning in SCCP:
define i32 @caller() {
entry:
%call1 = call i32 @callee.specialized.1(i32 1)
%call2 = call i32 @callee.specialized.2(i32 0)
%cond = icmp eq i32 undef, 0
br i1 %cond, label %common.ret, label %if.then
common.ret: ; preds = %entry
ret i32 0
if.then: ; preds = %entry
%unreachable_call = call i32 @callee(i32 2)
ret i32 0
}
define internal i32 @callee(i32 %ac) {
entry:
br label %ai
ai: ; preds = %entry
unreachable
aj: ; No predecessors!
unreachable
}
define internal i32 @callee.specialized.1(i32 %ac) {
entry:
br label %ai
ai: ; preds = %ai, %entry
%add = or i32 0, 0
%cond = icmp eq i32 %ac, 1
br i1 %cond, label %aj, label %ai
aj: ; preds = %ai
ret i32 0
}
define internal i32 @callee.specialized.2(i32 %ac) {
entry:
br label %ai
ai: ; preds = %ai, %entry
%add = or i32 0, 0
%cond = icmp eq i32 %ac, 1
br i1 %cond, label %aj, label %ai
aj: ; preds = %ai
ret i32 0
}As you correctly noted, FunctionSpecialization happens to look only at executable call-sites. Since the last call-site is not marked as executable, it is ignored by specialization. IIUC the fix correctly, you're suggesting to introduce specialization to those call-sites that live in basic blocks which were determined by the solver to be never executable.
The current code in updateCallSites seems correct to me, as callee is eventually marked as unreachable, as being fully specialized along all its executable-proven paths:
llvm-project/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
Lines 1207 to 1212 in cbf5af9
| // If the function has been completely specialized, the original function | |
| // is no longer needed. Mark it unreachable. | |
| if (NCallsLeft == 0 && Solver.isArgumentTrackedFunction(F)) { | |
| Solver.markFunctionUnreachable(F); | |
| FullySpecialized.insert(F); | |
| } |
Isn't the discrepancy here stemming from the fact that we are attempting to remove basic blocks of a fully specialized function, where it should be up to FunctionSpecializer's destructor to remove such functions (in which case, unreachable call-sites should be removed as well)?
|
Thanks for your detailed review.
Exactly, my original purpose is to avoid removing functions that are proven inexecutable but may be reset to be executable later. New specialization is not my purpose.
Sounds reasonable to me. It's the undefined behaviour that causes inconsistent executability. How about replacing the unreachable call-sites with |
Right. That would make sense to me. |
33995e8 to
bc9c37a
Compare
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! Could you please update PR title as well?
llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll
Outdated
Show resolved
Hide resolved
|
Hi, please leave some time for me to look at it before you merge. I'll be able to review tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the unexecutable call sites can become executable again after solving specialized calls. In this testcase,
call2is considered Overdefined after specialization, makingcondalso Overdefined. Thus,unreachable_callbecomes executable.
What is the LatticeValue of call2 after specialization? Is it undef? I am puzzled why the solver considers unreachable_call non executable in the first place. Surely call2 falls into an infinite loop not returning zero.
Prior to specialization, the lattice value for After specialization, the lattice value of |
Is it because the specialized callee will not return? |
…bidding address taken in updateCallsites
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding a test for addressTaken. One remark about the first test, otherwise the patch looks ok I think. I am wondering if we need another test to demonstrate the poisoning of users, and why does it now show in the first example.
llvm/test/Transforms/FunctionSpecialization/reachable-after-specialization.ll
Show resolved
Hide resolved
|
Also I'd rename the patch to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am satisfied with the patch in its current form. If I am missing something I hope that a bootstrap build will catch it for us.
|
Hi @XChy fails if opt is built with EXPENSIVE_CHECKS. |
|
It seems that (post) dom tree analysis is not updated. I am still trying to compile LLVM with EXPENSIVE_CHECKS to reproduce. Hopefully, I will post the fix today. |
No need to revert for me, I just noticed this in a private buildbot that we have and thought I'd mention it since I don't see any comments about it from public bots here. |
I suspect we might now need to also set MadeChanges when DeadFunctions is not empty. |
Yes, I find out the reason, Prepare a patch now. |
…5833) As reported in #154668 (comment), we missed invalidating analysis as we don't set the MadeChanges to true after removing dead functions. This patch makes it explicit to remove the dead functions marked by FuncSpec in SCCP and set MadeChanges correctly.
…icitly (#155833) As reported in llvm/llvm-project#154668 (comment), we missed invalidating analysis as we don't set the MadeChanges to true after removing dead functions. This patch makes it explicit to remove the dead functions marked by FuncSpec in SCCP and set MadeChanges correctly.
Fixes #153295.
For test case below:
Before specialization, the SCCP solver determines that
unreachable_callis unexecutable, as the value ofcalleecan only bezero.
After specializing the call sites
call1andcall2, FnSpecializerannounces
calleeis a dead function since all executable call sitesare specialized. However, the unexecutable call sites can become
executable again after solving specialized calls.
In this testcase,
call2is consideredOverdefinedafterspecialization, making
condalsoOverdefined. Thus,unreachable_callbecomes executable.This patch skips SCCP on the blocks in dead functions, and poisons the
call sites of dead functions.