-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CGData] Global Merge Functions #112671
[CGData] Global Merge Functions #112671
Conversation
f6fc259
to
09f1ec7
Compare
584a2d7
to
1601086
Compare
b6007c4
to
09f1ec7
Compare
1601086
to
ded5771
Compare
5425db1
to
06a4969
Compare
12b539e
to
3194154
Compare
3194154
to
e86d78b
Compare
cc. @nocchijiang |
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-lto Author: Kyungwoo Lee (kyulee-com) ChangesThis implements a global function merging pass. Unlike traditional function merging passes that use IR comparators, this pass employs a structurally stable hash to identify similar functions while ignoring certain constant operands. These ignored constants are tracked and encoded into a stable function summary. When merging, instead of explicitly folding similar functions and their call sites, we form a merging instance by supplying different parameters via thunks. The actual size reduction occurs when identically created merging instances are folded by the linker. Currently, this pass is wired to llvm-lto2, enabled by the
We can enable a global merging mode similar to the global function outliner (https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/), which will perform the above steps separately.
Depends on #112664 Patch is 49.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112671.diff 15 Files Affected:
diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h
index 5d7c74725ccef1..da0e412f2a0e03 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -145,6 +145,9 @@ class CodeGenData {
const OutlinedHashTree *getOutlinedHashTree() {
return PublishedHashTree.get();
}
+ const StableFunctionMap *getStableFunctionMap() {
+ return PublishedStableFunctionMap.get();
+ }
/// Returns true if we should write codegen data.
bool emitCGData() { return EmitCGData; }
@@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() {
return CodeGenData::getInstance().hasOutlinedHashTree();
}
+inline bool hasStableFunctionMap() {
+ return CodeGenData::getInstance().hasStableFunctionMap();
+}
+
inline const OutlinedHashTree *getOutlinedHashTree() {
return CodeGenData::getInstance().getOutlinedHashTree();
}
+inline const StableFunctionMap *getStableFunctionMap() {
+ return CodeGenData::getInstance().getStableFunctionMap();
+}
+
inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); }
inline void
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 54c070401ec8a4..a43a96e90543cc 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &);
void initializeGCMachineCodeAnalysisPass(PassRegistry &);
void initializeGCModuleInfoPass(PassRegistry &);
void initializeGVNLegacyPassPass(PassRegistry &);
+void initializeGlobalMergeFuncPass(PassRegistry &);
void initializeGlobalMergePass(PassRegistry &);
void initializeGlobalsAAWrapperPassPass(PassRegistry &);
void initializeHardwareLoopsLegacyPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h
index 3516b47d29ef36..d1219d6ee2a13e 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -79,6 +79,7 @@ struct ForcePassLinking {
(void)llvm::createDomOnlyViewerWrapperPassPass();
(void)llvm::createDomViewerWrapperPassPass();
(void)llvm::createAlwaysInlinerLegacyPass();
+ (void)llvm::createGlobalMergeFuncPass();
(void)llvm::createGlobalsAAWrapperPass();
(void)llvm::createInstSimplifyLegacyPass();
(void)llvm::createInstructionCombiningPass();
diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h
index ee0e35aa618325..28cf330ad20812 100644
--- a/llvm/include/llvm/Transforms/IPO.h
+++ b/llvm/include/llvm/Transforms/IPO.h
@@ -55,6 +55,11 @@ enum class PassSummaryAction {
Export, ///< Export information to summary.
};
+/// createGlobalMergeFuncPass - This pass generates merged instances by
+/// parameterizing distinct constants across similar functions, utilizing stable
+/// function hash information.
+Pass *createGlobalMergeFuncPass();
+
} // End llvm namespace
#endif
diff --git a/llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h b/llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h
new file mode 100644
index 00000000000000..395aeda73b2d0f
--- /dev/null
+++ b/llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h
@@ -0,0 +1,78 @@
+//===------ GlobalMergeFunctions.h - Global merge functions -----*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass defines the implementation of a function merging mechanism
+// that utilizes a stable function hash to track differences in constants and
+// identify potential merge candidates. The process involves two rounds:
+// 1. The first round collects stable function hashes and identifies merge
+// candidates with matching hashes. It also computes the set of parameters
+// that point to different constants during the stable function merge.
+// 2. The second round leverages this collected global function information to
+// optimistically create a merged function in each module context, ensuring
+// correct transformation.
+// Similar to the global outliner, this approach uses the linker's deduplication
+// (ICF) to fold identical merged functions, thereby reducing the final binary
+// size. The work is inspired by the concepts discussed in the following paper:
+// https://dl.acm.org/doi/pdf/10.1145/3652032.3657575.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_IPO_GLOBALMERGEFUNCTIONS_H
+#define LLVM_TRANSFORMS_IPO_GLOBALMERGEFUNCTIONS_H
+
+#include "llvm/CGData/StableFunctionMap.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Pass.h"
+
+enum class HashFunctionMode {
+ Local,
+ BuildingHashFuncion,
+ UsingHashFunction,
+};
+
+namespace llvm {
+
+// A vector of locations (the pair of (instruction, operand) indices) reachable
+// from a parameter.
+using ParamLocs = SmallVector<IndexPair, 4>;
+// A vector of parameters
+using ParamLocsVecTy = SmallVector<ParamLocs, 8>;
+
+/// GlobalMergeFunc is a ModulePass that implements a function merging mechanism
+/// using stable function hashes. It identifies and merges functions with
+/// matching hashes across modules to optimize binary size.
+class GlobalMergeFunc : public ModulePass {
+ HashFunctionMode MergerMode = HashFunctionMode::Local;
+
+ std::unique_ptr<StableFunctionMap> LocalFunctionMap;
+
+public:
+ static char ID;
+
+ GlobalMergeFunc();
+
+ StringRef getPassName() const override;
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override;
+
+ void initializeMergerMode(const Module &M);
+
+ bool runOnModule(Module &M) override;
+
+ /// Analyze module to create stable function into LocalFunctionMap.
+ void analyze(Module &M);
+
+ /// Emit LocalFunctionMap into __llvm_merge section.
+ void emitFunctionMap(Module &M);
+
+ /// Merge functions in the module using the given function map.
+ bool merge(Module &M, const StableFunctionMap *FunctionMap);
+};
+
+} // end namespace llvm
+#endif // LLVM_TRANSFORMS_IPO_GLOBALMERGEFUNCTIONS_H
diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp
index aff74104006e5a..beeb8ea2d078d9 100644
--- a/llvm/lib/CodeGen/TargetPassConfig.cpp
+++ b/llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -141,6 +141,9 @@ static cl::opt<RunOutliner> EnableMachineOutliner(
"Disable all outlining"),
// Sentinel value for unspecified option.
clEnumValN(RunOutliner::AlwaysOutline, "", "")));
+cl::opt<bool> EnableGlobalMergeFunc(
+ "enable-global-merge-func", cl::Hidden,
+ cl::desc("Enable global merge functions that are based on hash function"));
// Disable the pass to fix unwind information. Whether the pass is included in
// the pipeline is controlled via the target options, this option serves as
// manual override.
diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index 0f53c608512171..3f88c6c475d4d8 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -53,6 +53,7 @@
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetOptions.h"
#include "llvm/Transforms/IPO.h"
+#include "llvm/Transforms/IPO/GlobalMergeFunctions.h"
#include "llvm/Transforms/IPO/MemProfContextDisambiguation.h"
#include "llvm/Transforms/IPO/WholeProgramDevirt.h"
#include "llvm/Transforms/Utils/FunctionImportUtils.h"
diff --git a/llvm/lib/Transforms/IPO/CMakeLists.txt b/llvm/lib/Transforms/IPO/CMakeLists.txt
index 15cb57399d2460..6a36ed64149f93 100644
--- a/llvm/lib/Transforms/IPO/CMakeLists.txt
+++ b/llvm/lib/Transforms/IPO/CMakeLists.txt
@@ -21,6 +21,7 @@ add_llvm_component_library(LLVMipo
GlobalDCE.cpp
GlobalOpt.cpp
GlobalSplit.cpp
+ GlobalMergeFunctions.cpp
HotColdSplitting.cpp
IPO.cpp
IROutliner.cpp
@@ -60,6 +61,7 @@ add_llvm_component_library(LLVMipo
Analysis
BitReader
BitWriter
+ CGData
Core
FrontendOpenMP
InstCombine
diff --git a/llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp b/llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp
new file mode 100644
index 00000000000000..5a52c8f3304186
--- /dev/null
+++ b/llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp
@@ -0,0 +1,744 @@
+//===---- GlobalMergeFunctions.cpp - Global merge functions -------*- C++ -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass defines the implementation of a function merging mechanism
+// that utilizes a stable function hash to track differences in constants and
+// create potential merge candidates. The process involves two rounds:
+// 1. The first round collects stable function hashes and identifies merge
+// candidates with matching hashes. It also computes the set of parameters
+// that point to different constants during the stable function merge.
+// 2. The second round leverages this collected global function information to
+// optimistically create a merged function in each module context, ensuring
+// correct transformation.
+// Similar to the global outliner, this approach uses the linker's deduplication
+// (ICF) to fold identical merged functions, thereby reducing the final binary
+// size. The work is inspired by the concepts discussed in the following paper:
+// https://dl.acm.org/doi/pdf/10.1145/3652032.3657575.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/IPO/GlobalMergeFunctions.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/Analysis/ModuleSummaryAnalysis.h"
+#include "llvm/CGData/CodeGenData.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/StructuralHash.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Transforms/Utils/ModuleUtils.h"
+
+#define DEBUG_TYPE "global-merge-func"
+
+using namespace llvm;
+using namespace llvm::support;
+
+static cl::opt<bool>
+ DisableGlobalMerging("disable-global-merging", cl::Hidden,
+ cl::desc("Disable global merging only by ignoring "
+ "the codegen data generation or use. Local "
+ "merging is still enabled within a module."),
+ cl::init(false));
+static cl::opt<unsigned> GlobalMergingMinInstrs(
+ "global-merging-min-instrs",
+ cl::desc("The minimum instruction count required when merging functions."),
+ cl::init(1), cl::Hidden);
+static cl::opt<unsigned> GlobalMergingMaxParams(
+ "global-merging-max-params",
+ cl::desc(
+ "The maximum number of parameters allowed when merging functions."),
+ cl::init(std::numeric_limits<unsigned>::max()), cl::Hidden);
+static cl::opt<unsigned> GlobalMergingParamOverhead(
+ "global-merging-param-overhead",
+ cl::desc("The overhead cost associated with each parameter when merging "
+ "functions."),
+ cl::init(2), cl::Hidden);
+static cl::opt<unsigned>
+ GlobalMergingCallOverhead("global-merging-call-overhead",
+ cl::desc("The overhead cost associated with each "
+ "function call when merging functions."),
+ cl::init(1), cl::Hidden);
+static cl::opt<unsigned> GlobalMergingExtraThreshold(
+ "global-merging-extra-threshold",
+ cl::desc("An additional cost threshold that must be exceeded for merging "
+ "to be considered beneficial."),
+ cl::init(0), cl::Hidden);
+
+extern cl::opt<bool> EnableGlobalMergeFunc;
+
+STATISTIC(NumMismatchedFunctionHashGlobalMergeFunction,
+ "Number of mismatched function hash for global merge function");
+STATISTIC(NumMismatchedInstCountGlobalMergeFunction,
+ "Number of mismatched instruction count for global merge function");
+STATISTIC(NumMismatchedConstHashGlobalMergeFunction,
+ "Number of mismatched const hash for global merge function");
+STATISTIC(NumMismatchedModuleIdGlobalMergeFunction,
+ "Number of mismatched Module Id for global merge function");
+STATISTIC(NumGlobalMergeFunctions,
+ "Number of functions that are actually merged using function hash");
+STATISTIC(NumAnalyzedModues, "Number of modules that are analyzed");
+STATISTIC(NumAnalyzedFunctions, "Number of functions that are analyzed");
+STATISTIC(NumEligibleFunctions, "Number of functions that are eligible");
+
+/// Returns true if the \OpIdx operand of \p CI is the callee operand.
+static bool isCalleeOperand(const CallBase *CI, unsigned OpIdx) {
+ return &CI->getCalledOperandUse() == &CI->getOperandUse(OpIdx);
+}
+
+static bool canParameterizeCallOperand(const CallBase *CI, unsigned OpIdx) {
+ if (CI->isInlineAsm())
+ return false;
+ Function *Callee = CI->getCalledOperand()
+ ? dyn_cast_or_null<Function>(
+ CI->getCalledOperand()->stripPointerCasts())
+ : nullptr;
+ if (Callee) {
+ if (Callee->isIntrinsic())
+ return false;
+ // objc_msgSend stubs must be called, and can't have their address taken.
+ if (Callee->getName().starts_with("objc_msgSend$"))
+ return false;
+ }
+ if (isCalleeOperand(CI, OpIdx) &&
+ CI->getOperandBundle(LLVMContext::OB_ptrauth).has_value()) {
+ // The operand is the callee and it has already been signed. Ignore this
+ // because we cannot add another ptrauth bundle to the call instruction.
+ return false;
+ }
+ return true;
+}
+
+bool isEligibleInstrunctionForConstantSharing(const Instruction *I) {
+ switch (I->getOpcode()) {
+ case Instruction::Load:
+ case Instruction::Store:
+ case Instruction::Call:
+ case Instruction::Invoke:
+ return true;
+ default:
+ return false;
+ }
+}
+
+bool isEligibleOperandForConstantSharing(const Instruction *I, unsigned OpIdx) {
+ assert(OpIdx < I->getNumOperands() && "Invalid operand index");
+
+ if (!isEligibleInstrunctionForConstantSharing(I))
+ return false;
+
+ auto Opnd = I->getOperand(OpIdx);
+ if (!isa<Constant>(Opnd))
+ return false;
+
+ if (const auto *CI = dyn_cast<CallBase>(I))
+ return canParameterizeCallOperand(CI, OpIdx);
+
+ return true;
+}
+
+/// Returns true if function \p F is eligible for merging.
+bool isEligibleFunction(Function *F) {
+ if (F->isDeclaration())
+ return false;
+
+ if (F->hasFnAttribute(llvm::Attribute::NoMerge))
+ return false;
+
+ if (F->hasAvailableExternallyLinkage())
+ return false;
+
+ if (F->getFunctionType()->isVarArg())
+ return false;
+
+ if (F->getCallingConv() == CallingConv::SwiftTail)
+ return false;
+
+ // If function contains callsites with musttail, if we merge
+ // it, the merged function will have the musttail callsite, but
+ // the number of parameters can change, thus the parameter count
+ // of the callsite will mismatch with the function itself.
+ for (const BasicBlock &BB : *F) {
+ for (const Instruction &I : BB) {
+ const auto *CB = dyn_cast<CallBase>(&I);
+ if (CB && CB->isMustTailCall())
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static bool
+isEligibleInstrunctionForConstantSharingLocal(const Instruction *I) {
+ switch (I->getOpcode()) {
+ case Instruction::Load:
+ case Instruction::Store:
+ case Instruction::Call:
+ case Instruction::Invoke:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static bool ignoreOp(const Instruction *I, unsigned OpIdx) {
+ assert(OpIdx < I->getNumOperands() && "Invalid operand index");
+
+ if (!isEligibleInstrunctionForConstantSharingLocal(I))
+ return false;
+
+ if (!isa<Constant>(I->getOperand(OpIdx)))
+ return false;
+
+ if (const auto *CI = dyn_cast<CallBase>(I))
+ return canParameterizeCallOperand(CI, OpIdx);
+
+ return true;
+}
+
+static Value *createCast(IRBuilder<> &Builder, Value *V, Type *DestTy) {
+ Type *SrcTy = V->getType();
+ if (SrcTy->isStructTy()) {
+ assert(DestTy->isStructTy());
+ assert(SrcTy->getStructNumElements() == DestTy->getStructNumElements());
+ Value *Result = PoisonValue::get(DestTy);
+ for (unsigned int I = 0, E = SrcTy->getStructNumElements(); I < E; ++I) {
+ Value *Element =
+ createCast(Builder, Builder.CreateExtractValue(V, ArrayRef(I)),
+ DestTy->getStructElementType(I));
+
+ Result = Builder.CreateInsertValue(Result, Element, ArrayRef(I));
+ }
+ return Result;
+ }
+ assert(!DestTy->isStructTy());
+ if (auto *SrcAT = dyn_cast<ArrayType>(SrcTy)) {
+ auto *DestAT = dyn_cast<ArrayType>(DestTy);
+ assert(DestAT);
+ assert(SrcAT->getNumElements() == DestAT->getNumElements());
+ Value *Result = UndefValue::get(DestTy);
+ for (unsigned int I = 0, E = SrcAT->getNumElements(); I < E; ++I) {
+ Value *Element =
+ createCast(Builder, Builder.CreateExtractValue(V, ArrayRef(I)),
+ DestAT->getElementType());
+
+ Result = Builder.CreateInsertValue(Result, Element, ArrayRef(I));
+ }
+ return Result;
+ }
+ assert(!DestTy->isArrayTy());
+ if (SrcTy->isIntegerTy() && DestTy->isPointerTy())
+ return Builder.CreateIntToPtr(V, DestTy);
+ else if (SrcTy->isPointerTy() && DestTy->isIntegerTy())
+ return Builder.CreatePtrToInt(V, DestTy);
+ else
+ return Builder.CreateBitCast(V, DestTy);
+}
+
+void GlobalMergeFunc::analyze(Module &M) {
+ ++NumAnalyzedModues;
+ for (Function &Func : M) {
+ ++NumAnalyzedFunctions;
+ if (isEligibleFunction(&Func)) {
+ ++NumEligibleFunctions;
+
+ auto FI = llvm::StructuralHashWithDifferences(Func, ignoreOp);
+
+ // Convert the operand map to a vector for a serialization-friendly
+ // format.
+ IndexOperandHashVecType IndexOperandHashes;
+ for (auto &Pair : *FI.IndexOperandHashMap)
+ IndexOperandHashes.emplace_back(Pair);
+
+ StableFunction SF(FI.FunctionHash, get_stable_name(Func.getName()).str(),
+ M.getModuleIdentifier(), FI.IndexInstruction->size(),
+ std::move(IndexOperandHashes));
+
+ LocalFunctionMap->insert(SF);
+ }
+ }
+}
+
+/// Tuple to hold function info to process merging.
+struct FuncMergeInfo {
+ StableFunctionMap::StableFunctionEntry *SF;
+ Function *F;
+ std::unique_ptr<IndexInstrMap> IndexInstruction;
+};
+
+// Given the func info, and the parameterized locations, create and return
+// a new merged function by replacing the original constants with the new
+// parameters.
+static Function *createMergedFunction(FuncMergeInfo &FI,
+ ArrayRef<Type *> ConstParamTypes,
+ const ParamLocsVecTy &ParamLocsVec) {
+ // Synthesize a new merged function name by appending ".Tgm" to the root
+ // function's name.
+ auto *MergedFunc = FI.F;
+ auto NewFunctionName = MergedFunc->getName().str() + ".Tgm";
+ auto *M = MergedFunc->getParent();
+ assert(!M->getFunction(NewFunctionName));
+
+ FunctionType *OrigTy = MergedFunc->getFunctionType();
+ // Get the original params' types.
+ SmallVector<Type *> ParamTypes(OrigTy->param_begin(), OrigTy->param_end());
+ // Append const parameter types that are passed in.
+ ParamTypes.append(ConstParamTypes.begin(), ConstParamTypes.end());
+ FunctionType *FuncType =
+ FunctionType::get(OrigTy->getReturnType(), ParamTypes, false);
+
+ // Declare a new function
+ Function *NewFunction =
+ Function::Create(FuncType, MergedFunc->getLinkage(), NewFunctionName);
+ if (auto *SP = MergedFunc->getSubprogram())
+ NewFunction->setSubprogram(SP);
+ NewFunction->copyAttributesFrom(MergedFunc);
+ NewFunction->setDLLStorageClass(GlobalValue::DefaultStorageClass);
+
+ NewFunction->setLinkage(GlobalValue::InternalLinkage);
+ NewFunction->addFnAttr(Attribute::NoInline);
+
+ // Add the new function before the root function.
+ M->getFunctionList().insert(MergedFunc->getIterator(), NewFunction);
+
+ // Move the body of MergedFunc into the NewFunction.
+ NewFunction->splice(NewFunction->begin(), MergedFunc);
+
+ // Update the original args by the new args.
+ auto NewArgIter ...
[truncated]
|
cc. @eeckstein |
Nice! Will this new pass make Swift's function merging pass obsolete? |
Yes. We use this new pass with thin-LTO, which is built on top of Swift's existing function merging pass. Since the former fully covers the functionality of the latter, technically we can disable it. |
Shall we add new pass manager support in this PR or you have planned a separated one for it? |
NewFunction->setDLLStorageClass(GlobalValue::DefaultStorageClass); | ||
|
||
NewFunction->setLinkage(GlobalValue::InternalLinkage); | ||
NewFunction->addFnAttr(Attribute::NoInline); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If MergedFunc
is alwaysinline
, we should remove the attribute from NewFunction
. The verifier will complain about it if both noinline
and alwaysinline
are present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now bail out the alwaysinline
case as an eligible function as it doesn't make sense to merge functions that must be inlined.
// objc_msgSend stubs must be called, and can't have their address taken. | ||
if (Callee->getName().starts_with("objc_msgSend$")) | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calls to dtrace probes should probably be filtered here as well. Linkers have special logic to handle them. Parameterizing these calls will break the linkage. See 6c641d0 for lld, https://github.com/apple-oss-distributions/ld64/blob/47f477cb721755419018f7530038b272e9d0cdea/src/ld/OutputFile.cpp#L1597 for ld64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I now bail out __dtrace
case similar to https://github.com/swiftlang/swift/blob/main/lib/LLVMPasses/LLVMMergeFunctions.cpp#L643-L644.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/2002 Here is the relevant piece of the build log for the reference
|
This is a follow-up PR to refactor the initial global merge function pass implemented in #112671. It first collects stable functions relevant to the current module and iterates over those only, instead of iterating through all stable functions in the stable function map. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
Hi, I've no idea if this is expected but I noticed that if I run
it crashes like
|
Thank you for testing this! I can't reproduce the issue on my end, but I suspect it might be due to the module summary being run twice, even though it's optional. |
This is a follow-up PR to refactor the initial global merge function pass implemented in llvm#112671. It first collects stable functions relevant to the current module and iterates over those only, instead of iterating through all stable functions in the stable function map. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This is a follow-up PR to refactor the initial global merge function pass implemented in llvm#112671. It first collects stable functions relevant to the current module and iterates over those only, instead of iterating through all stable functions in the stable function map. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This is a follow-up PR to refactor the initial global merge function pass implemented in #112671. It first collects stable functions relevant to the current module and iterates over those only, instead of iterating through all stable functions in the stable function map. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
Weird if you can't reproduce it. I did again at 878b03e. Asserts must be on. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/2657 Here is the relevant piece of the build log for the reference
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for some late comments, since I didn't have free time actually debugging around the changes.
ParamLocsVecTy ParamLocsVec; | ||
for (auto &[HashSeq, Locs] : HashSeqToLocs) { | ||
ParamLocsVec.push_back(std::move(Locs)); | ||
llvm::sort(ParamLocsVec, [&](const ParamLocs &L, const ParamLocs &R) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move the sort after the for
loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! Apparently I mis-placed this code inside the loop while I was refactoring this code.
if (InstCount < GlobalMergingMinInstrs) | ||
return false; | ||
|
||
unsigned ParamCount = SFS[0]->IndexOperandHashMap->size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of unique values of SFS[0]->IndexOperandHashMap
is more accurate, if a given constant is used multiple times in the stable function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of unique values can be different depending on a stable function.
Anyhow, I consider this count as a parameter count, and add them up to compute the total cost.
|
||
unsigned Benefit = InstCount * (StableFunctionCount - 1); | ||
unsigned Cost = | ||
(GlobalMergingParamOverhead * ParamCount + GlobalMergingCallOverhead) * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GlobalMergingParamOverhead
could potentially be fine grained for different kinds of constant. I ran some tests from Swift repo to test the compatibility between the Apple implementation and this one, and found that some tested functions were not merged because it overestimated the cost for some parameters of small scalar type. This should not make a noticeable difference in production, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also FYI the implementation that comes with this PR is not passing all the existing Swift repo tests despite lowering GlobalMergingParamOverhead
to 0
to force merges to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it overestimated the cost for some parameters of small scalar type.
It seems we need to incorporate some type information to accurately reflect the precise cost, in theory. Since the profit model is currently computed offline, I've refined the parameter count for greater precision, as mentioned above
FYI the implementation that comes with this PR is not passing all the existing Swift repo tests despite lowering GlobalMergingParamOverhead to 0 to force merges to happen.
Despite some differences in the underlying assumptions, I'm curious about what the existing Swift merge can accomplish that this new pass cannot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about what the existing Swift merge can accomplish that this new pass cannot.
These are the behavioral differences I have found by running merge_func*.ll
from Swift repo, some of which I believe we should consider implementing as well:
Test | Differences | Remark |
---|---|---|
merge_func_preserves_vfe.ll |
merge_candidate_c merged or not |
The new pass seems unaware of metadata differences. |
merge_func_ptrauth.ll |
presence of ptrauth info | I believe the new pass does not implement ptrauth for simplicity. |
merge_func_return_type_cast.ll |
return_0 merged with return_null or not |
The 2 functions have different stable hash values. |
merge_func.ll |
func*_merged_with* merging strategy |
Apple implementation tends to avoid too many parameters. |
merge_func.ll |
caller*_* merging strategy |
Apple implementation does not parameterize the calls on the call chain. |
merge_func.ll |
first merged with second or not |
Apple implementation is aware of the order of incoming blocks to a phi. |
merge_func.ll |
not_really_recursive eliminated or not |
Apple implementation replaces the call to the original function with the merged one. |
if (StableFunctionCount < GlobalMergingMinMerges) | ||
return false; | ||
|
||
unsigned InstCount = SFS[0]->InstCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of IR cannot precisely reflect the actual number of machine instructions (often the latter is larger for AArch64. Take the access of a global value for example, it will be expanded into an ADRP
pair for small code model), which results in Benefit
underestimated and some profitable merging opportunities dropped. I am not sure if there is existing code that could be reused to better estimate the machine instruction count, but at least we may introduce a multiplier on InstCount
for fine-tuning of the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added -global-merging-inst-overhead
to tune this parameter.
|
||
// Convert the operand map to a vector for a serialization-friendly | ||
// format. | ||
IndexOperandHashVecType IndexOperandHashes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we bail out for functions without parameterizable constants here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've found not skipping no parameter case is beneficial. Anyhow, I now introduced -global-merging-skip-no-params
to control this behavior.
Thank you for testing and reporting these issues! |
This implements a global function merging pass. Unlike traditional function merging passes that use IR comparators, this pass employs a structurally stable hash to identify similar functions while ignoring certain constant operands. These ignored constants are tracked and encoded into a stable function summary. When merging, instead of explicitly folding similar functions and their call sites, we form a merging instance by supplying different parameters via thunks. The actual size reduction occurs when identically created merging instances are folded by the linker. Currently, this pass is wired to a pre-codegen pass, enabled by the `-enable-global-merge-func` flag. In a local merging mode, the analysis and merging steps occur sequentially within a module: - `analyze`: Collects stable function hashes and tracks locations of ignored constant operands. - `finalize`: Identifies merge candidates with matching hashes and computes the set of parameters that point to different constants. - `merge`: Uses the stable function map to optimistically create a merged function. We can enable a global merging mode similar to the global function outliner (https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/), which will perform the above steps separately. - `-codegen-data-generate`: During the first round of code generation, we analyze local merging instances and publish their summaries. - Offline using `llvm-cgdata` or at link-time, we can finalize all these merging summaries that are combined to determine parameters. - `-codegen-data-use`: During the second round of code generation, we optimistically create merging instances within each module, and finally, the linker folds identically created merging instances. Depends on llvm#112664 This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This is a follow-up PR to refactor the initial global merge function pass implemented in llvm#112671. It first collects stable functions relevant to the current module and iterates over those only, instead of iterating through all stable functions in the stable function map. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This is a follow-up PR to refactor the initial global merge function pass implemented in llvm#112671. It first collects stable functions relevant to the current module and iterates over those only, instead of iterating through all stable functions in the stable function map. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
…6241) Module summary index is optional for this pass, and we shouldn't run it, but import it as necessary.
This update follows up on change #112671 and is mostly a NFC, with the following exceptions: - Introduced `-global-merging-skip-no-params` to bypass merging when no parameters are required. - Parameter count is now calculated based on the unique hash count. - Added `-global-merging-inst-overhead` to adjust the instruction overhead, reflecting the machine instruction size. - Costs and benefits are now computed using the double data type. Since the finalization process occurs offline, this should not significantly impact build time. - Moved a sorting operation outside of the loop. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This update follows up on change #112671 and is mostly a NFC, with the following exceptions: - Introduced `-global-merging-skip-no-params` to bypass merging when no parameters are required. - Parameter count is now calculated based on the unique hash count. - Added `-global-merging-inst-overhead` to adjust the instruction overhead, reflecting the machine instruction size. - Costs and benefits are now computed using the double data type. Since the finalization process occurs offline, this should not significantly impact build time. - Moved a sorting operation outside of the loop. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This implements a global function merging pass. Unlike traditional function merging passes that use IR comparators, this pass employs a structurally stable hash to identify similar functions while ignoring certain constant operands. These ignored constants are tracked and encoded into a stable function summary. When merging, instead of explicitly folding similar functions and their call sites, we form a merging instance by supplying different parameters via thunks. The actual size reduction occurs when identically created merging instances are folded by the linker.
Currently, this pass is wired to llvm-lto2, enabled by the
-enable-global-merge-func
flag.In a local merging mode, the analysis and merging steps occur sequentially within a module:
analyze
: Collects stable function hashes and tracks locations of ignored constant operands.finalize
: Identifies merge candidates with matching hashes and computes the set of parameters that point to different constants.merge
: Uses the stable function map to optimistically create a merged function.We can enable a global merging mode similar to the global function outliner (https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/), which will perform the above steps separately.
-codegen-data-generate
: During the first round of code generation, we analyze local merging instances and publish their summaries.llvm-cgdata
or at link-time, we can finalize all these merging summaries that are combined to determine parameters.-codegen-data-use
: During the second round of code generation, we optimistically create merging instances within each module, and finally, the linker folds identically created merging instances.Depends on #112664
This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.