[CGData] Global Merge Functions #112671

Merged 7 commits on Nov 14, 2024.

11 changes: 11 additions & 0 deletions llvm/include/llvm/CGData/CodeGenData.h
@@ -145,6 +145,9 @@ class CodeGenData {
const OutlinedHashTree *getOutlinedHashTree() {
return PublishedHashTree.get();
}
const StableFunctionMap *getStableFunctionMap() {
return PublishedStableFunctionMap.get();
}

/// Returns true if we should write codegen data.
bool emitCGData() { return EmitCGData; }
@@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() {
return CodeGenData::getInstance().hasOutlinedHashTree();
}

inline bool hasStableFunctionMap() {
return CodeGenData::getInstance().hasStableFunctionMap();
}

inline const OutlinedHashTree *getOutlinedHashTree() {
return CodeGenData::getInstance().getOutlinedHashTree();
}

inline const StableFunctionMap *getStableFunctionMap() {
return CodeGenData::getInstance().getStableFunctionMap();
}

inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); }

inline void
2 changes: 1 addition & 1 deletion llvm/include/llvm/CGData/StableFunctionMap.h
@@ -110,7 +110,7 @@ struct StableFunctionMap {
size_t size(SizeType Type = UniqueHashCount) const;

/// Finalize the stable function map by trimming content.
-void finalize();
void finalize(bool SkipTrim = false);

private:
/// Insert a `StableFunctionEntry` into the function map directly. This
2 changes: 1 addition & 1 deletion llvm/include/llvm/CGData/StableFunctionMapRecord.h
@@ -49,7 +49,7 @@ struct StableFunctionMapRecord {
void deserializeYAML(yaml::Input &YIS);

/// Finalize the stable function map by trimming content.
-void finalize() { FunctionMap->finalize(); }
void finalize(bool SkipTrim = false) { FunctionMap->finalize(SkipTrim); }

/// Merge the stable function map into this one.
void merge(const StableFunctionMapRecord &Other) {
85 changes: 85 additions & 0 deletions llvm/include/llvm/CodeGen/GlobalMergeFunctions.h
@@ -0,0 +1,85 @@
//===------ GlobalMergeFunctions.h - Global merge functions -----*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass defines the implementation of a function merging mechanism
// that utilizes a stable function hash to track differences in constants and
// identify potential merge candidates. The process involves two rounds:
// 1. The first round collects stable function hashes and identifies merge
// candidates with matching hashes. It also computes the set of parameters
// that point to different constants during the stable function merge.
// 2. The second round leverages this collected global function information to
// optimistically create a merged function in each module context, ensuring
// correct transformation.
// Similar to the global outliner, this approach uses the linker's deduplication
// (ICF) to fold identical merged functions, thereby reducing the final binary
// size. The work is inspired by the concepts discussed in the following paper:
// https://dl.acm.org/doi/pdf/10.1145/3652032.3657575.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CODEGEN_GLOBALMERGEFUNCTIONS_H
#define LLVM_CODEGEN_GLOBALMERGEFUNCTIONS_H

#include "llvm/CGData/StableFunctionMap.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Pass.h"

enum class HashFunctionMode {
Local,
BuildingHashFuncion,
UsingHashFunction,
};

namespace llvm {

// A vector of locations (the pair of (instruction, operand) indices) reachable
// from a parameter.
using ParamLocs = SmallVector<IndexPair, 4>;
// A vector of parameters
using ParamLocsVecTy = SmallVector<ParamLocs, 8>;

/// GlobalMergeFunc is a ModulePass that implements a function merging mechanism
/// using stable function hashes. It identifies and merges functions with
/// matching hashes across modules to optimize binary size.
class GlobalMergeFunc {
HashFunctionMode MergerMode = HashFunctionMode::Local;

std::unique_ptr<StableFunctionMap> LocalFunctionMap;

const ModuleSummaryIndex *Index;

public:
/// The suffix used to identify the merged function that parameterizes
/// the constant values. Note that the original function, without this suffix,
/// becomes a thunk supplying contexts to the merged function via parameters.
static constexpr const char MergingInstanceSuffix[] = ".Tgm";

GlobalMergeFunc(const ModuleSummaryIndex *Index) : Index(Index) {};

void initializeMergerMode(const Module &M);

bool run(Module &M);

/// Analyze module to create stable function into LocalFunctionMap.
void analyze(Module &M);

/// Emit LocalFunctionMap into __llvm_merge section.
void emitFunctionMap(Module &M);

/// Merge functions in the module using the given function map.
bool merge(Module &M, const StableFunctionMap *FunctionMap);
};

/// Global function merging pass for new pass manager.
struct GlobalMergeFuncPass : public PassInfoMixin<GlobalMergeFuncPass> {
PreservedAnalyses run(Module &M, AnalysisManager<Module> &);
};

} // end namespace llvm
#endif // LLVM_CODEGEN_GLOBALMERGEFUNCTIONS_H
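At the source level, the effect described in the header comment (a merged instance that takes the differing constants as parameters, with each original function reduced to a thunk) can be pictured roughly as below. This is only an analogy under stated assumptions: the pass operates on LLVM IR rather than C++, the function names are hypothetical, and `_Tgm` stands in for the `.Tgm` suffix because `.` cannot appear in a C++ identifier.

```cpp
#include <cassert>

// Before merging (hypothetical): two functions identical except for one constant.
//   int scale100(int x) { return x * 100 + 1; }
//   int scale200(int x) { return x * 200 + 1; }

// Merged instance: the differing constant becomes an extra parameter.
static int scale_Tgm(int x, int c) { return x * c + 1; }

// The originals keep their names and become thunks supplying their constant.
int scale100(int x) { return scale_Tgm(x, 100); }
int scale200(int x) { return scale_Tgm(x, 200); }
```

When several modules produce identical merged instances, the header comment relies on the linker's ICF to fold them, which is where the binary-size reduction comes from.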
3 changes: 3 additions & 0 deletions llvm/include/llvm/CodeGen/Passes.h
@@ -507,6 +507,9 @@ namespace llvm {
/// This pass frees the memory occupied by the MachineFunction.
FunctionPass *createFreeMachineFunctionPass();

/// This pass performs merging similar functions globally.
ModulePass *createGlobalMergeFuncPass();

/// This pass performs outlining on machine instructions directly before
/// printing assembly.
ModulePass *createMachineOutlinerPass(bool RunOnAllFunctions = true);
1 change: 1 addition & 0 deletions llvm/include/llvm/InitializePasses.h
@@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &);
void initializeGCMachineCodeAnalysisPass(PassRegistry &);
void initializeGCModuleInfoPass(PassRegistry &);
void initializeGVNLegacyPassPass(PassRegistry &);
void initializeGlobalMergeFuncPassWrapperPass(PassRegistry &);
void initializeGlobalMergePass(PassRegistry &);
void initializeGlobalsAAWrapperPassPass(PassRegistry &);
void initializeHardwareLoopsLegacyPass(PassRegistry &);
1 change: 1 addition & 0 deletions llvm/include/llvm/LinkAllPasses.h
@@ -79,6 +79,7 @@ struct ForcePassLinking {
(void)llvm::createDomOnlyViewerWrapperPassPass();
(void)llvm::createDomViewerWrapperPassPass();
(void)llvm::createAlwaysInlinerLegacyPass();
(void)llvm::createGlobalMergeFuncPass();
(void)llvm::createGlobalsAAWrapperPass();
(void)llvm::createInstSimplifyLegacyPass();
(void)llvm::createInstructionCombiningPass();
4 changes: 4 additions & 0 deletions llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -35,6 +35,7 @@
#include "llvm/CodeGen/FinalizeISel.h"
#include "llvm/CodeGen/GCMetadata.h"
#include "llvm/CodeGen/GlobalMerge.h"
#include "llvm/CodeGen/GlobalMergeFunctions.h"
#include "llvm/CodeGen/IndirectBrExpand.h"
#include "llvm/CodeGen/InterleavedAccess.h"
#include "llvm/CodeGen/InterleavedLoadCombine.h"
@@ -713,6 +714,9 @@ void CodeGenPassBuilder<Derived, TargetMachineT>::addIRPasses(
// Convert conditional moves to conditional jumps when profitable.
if (getOptLevel() != CodeGenOptLevel::None && !Opt.DisableSelectOptimize)
addPass(SelectOptimizePass(&TM));

if (Opt.EnableGlobalMergeFunc)
addPass(GlobalMergeFuncPass());
}

/// Turn exception handling constructs into something the code generators can
1 change: 1 addition & 0 deletions llvm/include/llvm/Passes/MachinePassRegistry.def
@@ -29,6 +29,7 @@ MODULE_PASS("jmc-instrumenter", JMCInstrumenterPass())
MODULE_PASS("lower-emutls", LowerEmuTLSPass())
MODULE_PASS("pre-isel-intrinsic-lowering", PreISelIntrinsicLoweringPass())
MODULE_PASS("shadow-stack-gc-lowering", ShadowStackGCLoweringPass())
MODULE_PASS("global-merge-func", GlobalMergeFuncPass())
#undef MODULE_PASS

#ifndef FUNCTION_ANALYSIS
1 change: 1 addition & 0 deletions llvm/include/llvm/Target/CGPassBuilderOption.h
@@ -31,6 +31,7 @@ struct CGPassBuilderOption {
bool DisableVerify = false;
bool EnableImplicitNullChecks = false;
bool EnableBlockPlacementStats = false;
bool EnableGlobalMergeFunc = false;
bool EnableMachineFunctionSplitter = false;
bool MISchedPostRA = false;
bool EarlyLiveIntervals = false;
71 changes: 70 additions & 1 deletion llvm/lib/CGData/StableFunctionMap.cpp
@@ -14,11 +14,43 @@
//===----------------------------------------------------------------------===//

#include "llvm/CGData/StableFunctionMap.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"

#define DEBUG_TYPE "stable-function-map"

using namespace llvm;

static cl::opt<unsigned>
GlobalMergingMinMerges("global-merging-min-merges",
cl::desc("Minimum number of similar functions with "
"the same hash required for merging."),
cl::init(2), cl::Hidden);
static cl::opt<unsigned> GlobalMergingMinInstrs(
"global-merging-min-instrs",
cl::desc("The minimum instruction count required when merging functions."),
cl::init(1), cl::Hidden);
static cl::opt<unsigned> GlobalMergingMaxParams(
"global-merging-max-params",
cl::desc(
"The maximum number of parameters allowed when merging functions."),
cl::init(std::numeric_limits<unsigned>::max()), cl::Hidden);
static cl::opt<unsigned> GlobalMergingParamOverhead(
"global-merging-param-overhead",
cl::desc("The overhead cost associated with each parameter when merging "
"functions."),
cl::init(2), cl::Hidden);
static cl::opt<unsigned>
GlobalMergingCallOverhead("global-merging-call-overhead",
cl::desc("The overhead cost associated with each "
"function call when merging functions."),
cl::init(1), cl::Hidden);
static cl::opt<unsigned> GlobalMergingExtraThreshold(
"global-merging-extra-threshold",
cl::desc("An additional cost threshold that must be exceeded for merging "
"to be considered beneficial."),
cl::init(0), cl::Hidden);

unsigned StableFunctionMap::getIdOrCreateForName(StringRef Name) {
auto It = NameToId.find(Name);
if (It != NameToId.end())
@@ -117,7 +149,38 @@ static void removeIdenticalIndexPair(
SF->IndexOperandHashMap->erase(Pair);
}

-void StableFunctionMap::finalize() {
static bool isProfitable(
const SmallVector<std::unique_ptr<StableFunctionMap::StableFunctionEntry>>
&SFS) {
unsigned StableFunctionCount = SFS.size();
if (StableFunctionCount < GlobalMergingMinMerges)
return false;

unsigned InstCount = SFS[0]->InstCount;

Contributor: The number of IR instructions cannot precisely reflect the actual number of machine instructions (the latter is often larger on AArch64; for example, an access to a global value is expanded into an ADRP pair under the small code model). As a result, Benefit is underestimated and some profitable merging opportunities are dropped. I am not sure whether existing code could be reused to better estimate the machine instruction count, but at the least we could introduce a multiplier on InstCount to fine-tune this behavior.

Contributor Author: I added -global-merging-inst-overhead to tune this parameter.
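To illustrate the point about InstCount, the sketch below evaluates the same benefit/cost comparison as `isProfitable` with the default option values, plus a hypothetical `InstMultiplier` that scales `InstCount` to approximate machine instructions. The multiplier is an assumption for illustration only; it makes no claim about how `-global-merging-inst-overhead` is defined, since that option does not appear in this revision of the diff.

```cpp
#include <cassert>

// Shape of isProfitable() with the default option values
// (MinMerges = 2, ParamOverhead = 2, CallOverhead = 1, ExtraThreshold = 0):
//   Benefit = InstCount * InstMultiplier * (N - 1)
//   Cost    = (ParamOverhead * ParamCount + CallOverhead) * N + ExtraThreshold
// InstMultiplier is a hypothetical knob (machine instructions per IR
// instruction); it is not an option defined by this patch.
bool isProfitableSketch(unsigned N, unsigned InstCount, unsigned ParamCount,
                        unsigned InstMultiplier = 1) {
  const unsigned MinMerges = 2, ParamOverhead = 2, CallOverhead = 1,
                 ExtraThreshold = 0;
  if (N < MinMerges)
    return false;
  unsigned Benefit = InstCount * InstMultiplier * (N - 1);
  unsigned Cost =
      (ParamOverhead * ParamCount + CallOverhead) * N + ExtraThreshold;
  return Benefit > Cost;
}
```

For two 4-instruction functions with one parameter, Benefit = 4 and Cost = (2 * 1 + 1) * 2 = 6, so the merge is rejected; with a multiplier of 2, Benefit = 8 exceeds 6 and the merge is accepted.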

if (InstCount < GlobalMergingMinInstrs)
return false;

unsigned ParamCount = SFS[0]->IndexOperandHashMap->size();

Contributor: Counting the unique values of SFS[0]->IndexOperandHashMap would be more accurate when a given constant is used multiple times in the stable function.

Contributor Author: The number of unique values can differ from one stable function to another. In any case, I treat this count as the parameter count and add them up to compute the total cost.
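The difference between the two parameter counts discussed here can be shown with a plain `std::map` standing in for `IndexOperandHashMap` (an assumption: it models a map from an (instruction, operand) index pair to the hash of the constant at that location). When the same constant appears at two locations, `size()`, the count used in this patch, reports two parameters, while the number of distinct hash values reports one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical stand-in for IndexOperandHashMap:
// (instruction index, operand index) -> hash of the constant operand.
using OperandHashMap = std::map<std::pair<unsigned, unsigned>, uint64_t>;

// Count used in this patch: one parameter per differing location.
unsigned paramCountBySize(const OperandHashMap &M) {
  return static_cast<unsigned>(M.size());
}

// Reviewer's suggestion: one parameter per distinct constant (hash value).
unsigned paramCountByUniqueValues(const OperandHashMap &M) {
  std::set<uint64_t> Unique;
  for (const auto &Entry : M)
    Unique.insert(Entry.second);
  return static_cast<unsigned>(Unique.size());
}
```

As the author notes, this map belongs to one function (`SFS[0]`); another function with the same hash may repeat its constants differently, so the unique-value count need not agree across the group.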

if (ParamCount > GlobalMergingMaxParams)
return false;

unsigned Benefit = InstCount * (StableFunctionCount - 1);
unsigned Cost =
(GlobalMergingParamOverhead * ParamCount + GlobalMergingCallOverhead) *
StableFunctionCount +
GlobalMergingExtraThreshold;

Contributor: GlobalMergingParamOverhead could potentially be made finer-grained for different kinds of constants. I ran some tests from the Swift repo to check compatibility between the Apple implementation and this one, and found that some tested functions were not merged because it overestimated the cost for some parameters of small scalar type. This should not make a noticeable difference in production, though.

Contributor: Also, FYI: the implementation that comes with this PR is not passing all the existing Swift repo tests despite lowering GlobalMergingParamOverhead to 0 to force merges to happen.

Contributor Author:

> it overestimated the cost for some parameters of small scalar type.

In theory, we would need to incorporate type information to reflect the cost precisely. Since the profit model is currently computed offline, I've refined the parameter count for greater precision, as mentioned above.

> FYI the implementation that comes with this PR is not passing all the existing Swift repo tests despite lowering GlobalMergingParamOverhead to 0 to force merges to happen.

Despite some differences in the underlying assumptions, I'm curious about what the existing Swift merge can accomplish that this new pass cannot.

Contributor:

> I'm curious about what the existing Swift merge can accomplish that this new pass cannot.

These are the behavioral differences I found by running merge_func*.ll from the Swift repo; I believe we should consider implementing some of them as well:

| Test | Difference | Remark |
|---|---|---|
| `merge_func_preserves_vfe.ll` | `merge_candidate_c` merged or not | The new pass seems unaware of metadata differences. |
| `merge_func_ptrauth.ll` | presence of ptrauth info | I believe the new pass does not implement ptrauth, for simplicity. |
| `merge_func_return_type_cast.ll` | `return_0` merged with `return_null` or not | The two functions have different stable hash values. |
| `merge_func.ll` | `func*_merged_with*` merging strategy | The Apple implementation tends to avoid too many parameters. |
| `merge_func.ll` | `caller*_*` merging strategy | The Apple implementation does not parameterize the calls on the call chain. |
| `merge_func.ll` | `first` merged with `second` or not | The Apple implementation is aware of the order of incoming blocks to a phi. |
| `merge_func.ll` | `not_really_recursive` eliminated or not | The Apple implementation replaces the call to the original function with the merged one. |

bool Result = Benefit > Cost;
LLVM_DEBUG(dbgs() << "isProfitable: Hash = " << SFS[0]->Hash << ", "
<< "StableFunctionCount = " << StableFunctionCount
<< ", InstCount = " << InstCount
<< ", ParamCount = " << ParamCount
<< ", Benefit = " << Benefit << ", Cost = " << Cost
<< ", Result = " << (Result ? "true" : "false") << "\n");
return Result;
}

void StableFunctionMap::finalize(bool SkipTrim) {
for (auto It = HashToFuncs.begin(); It != HashToFuncs.end(); ++It) {
auto &[StableHash, SFS] = *It;

@@ -158,9 +221,15 @@ void StableFunctionMap::finalize() {
continue;
}

if (SkipTrim)
continue;

// Trim the index pair that has the same operand hash across
// stable functions.
removeIdenticalIndexPair(SFS);

if (!isProfitable(SFS))
HashToFuncs.erase(It);
}

Finalized = true;
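The control flow of the new `finalize(bool SkipTrim)` together with `isProfitable` can be modeled outside LLVM as below. This is a simplified sketch under stated assumptions: each group holds the functions sharing one stable hash, each function is reduced to an instruction count plus a map from (instruction, operand) locations to operand hashes, the cost uses the default option values, and all type and function names are hypothetical rather than LLVM's.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified model: (instruction index, operand index).
using IndexPair = std::pair<unsigned, unsigned>;

// Stand-in for StableFunctionEntry. Entries in one group are assumed to
// have the same set of locations, since they share a stable hash.
struct Entry {
  unsigned InstCount;
  std::map<IndexPair, uint64_t> OperandHashes; // location -> constant hash
};
using Group = std::vector<Entry>;

// Drop locations whose operand hash is identical in every entry of the
// group; such constants need no parameter.
void removeIdenticalPairs(Group &G) {
  if (G.empty())
    return;
  std::vector<IndexPair> Same;
  for (const auto &[Loc, Hash] : G[0].OperandHashes) {
    bool AllEqual = true;
    for (const Entry &E : G)
      if (E.OperandHashes.at(Loc) != Hash) {
        AllEqual = false;
        break;
      }
    if (AllEqual)
      Same.push_back(Loc);
  }
  for (Entry &E : G)
    for (const IndexPair &Loc : Same)
      E.OperandHashes.erase(Loc);
}

// Default option values: MinMerges = 2, MinInstrs = 1, ParamOverhead = 2,
// CallOverhead = 1, ExtraThreshold = 0. MaxParams is omitted because its
// default is the largest unsigned value.
bool isProfitable(const Group &G) {
  unsigned N = static_cast<unsigned>(G.size());
  if (N < 2 || G[0].InstCount < 1)
    return false;
  unsigned ParamCount = static_cast<unsigned>(G[0].OperandHashes.size());
  unsigned Benefit = G[0].InstCount * (N - 1);
  unsigned Cost = (2 * ParamCount + 1) * N + 0;
  return Benefit > Cost;
}

// Mirrors finalize(SkipTrim): trim each group, then drop unprofitable ones.
void finalize(std::map<uint64_t, Group> &HashToFuncs, bool SkipTrim = false) {
  for (auto It = HashToFuncs.begin(); It != HashToFuncs.end();) {
    if (SkipTrim) {
      ++It;
      continue;
    }
    removeIdenticalPairs(It->second);
    if (!isProfitable(It->second))
      It = HashToFuncs.erase(It); // std::map::erase returns the next iterator
    else
      ++It;
  }
}
```

The sketch continues from the iterator returned by `std::map::erase`, which is the safe pattern for `std::map`; the loop in the patch increments `It` after erasing, which depends on the iterator semantics of its actual container type.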
1 change: 1 addition & 0 deletions llvm/lib/CodeGen/CMakeLists.txt
@@ -71,6 +71,7 @@ add_llvm_component_library(LLVMCodeGen
GCMetadataPrinter.cpp
GCRootLowering.cpp
GlobalMerge.cpp
GlobalMergeFunctions.cpp
HardwareLoops.cpp
IfConversion.cpp
ImplicitNullChecks.cpp