Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CGData] Global Merge Functions #112671

Merged
merged 7 commits into from
Nov 14, 2024
Merged

[CGData] Global Merge Functions #112671

merged 7 commits into from
Nov 14, 2024

Conversation

kyulee-com
Copy link
Contributor

@kyulee-com kyulee-com commented Oct 17, 2024

This implements a global function merging pass. Unlike traditional function merging passes that use IR comparators, this pass employs a structurally stable hash to identify similar functions while ignoring certain constant operands. These ignored constants are tracked and encoded into a stable function summary. When merging, instead of explicitly folding similar functions and their call sites, we form a merging instance by supplying different parameters via thunks. The actual size reduction occurs when identically created merging instances are folded by the linker.

Currently, this pass is wired to llvm-lto2, enabled by the -enable-global-merge-func flag.
In a local merging mode, the analysis and merging steps occur sequentially within a module:

  • analyze: Collects stable function hashes and tracks locations of ignored constant operands.
  • finalize: Identifies merge candidates with matching hashes and computes the set of parameters that point to different constants.
  • merge: Uses the stable function map to optimistically create a merged function.

We can enable a global merging mode similar to the global function outliner (https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/), which will perform the above steps separately.

  • -codegen-data-generate: During the first round of code generation, we analyze local merging instances and publish their summaries.
  • Offline using llvm-cgdata or at link-time, we can finalize all these merging summaries that are combined to determine parameters.
  • -codegen-data-use: During the second round of code generation, we optimistically create merging instances within each module, and finally, the linker folds identically created merging instances.

Depends on #112664
This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.

@kyulee-com kyulee-com force-pushed the users/kyulee-com/cgdata branch from f6fc259 to 09f1ec7 Compare October 17, 2024 18:43
@kyulee-com kyulee-com force-pushed the users/kyulee-com/globmerge branch 2 times, most recently from 584a2d7 to 1601086 Compare October 17, 2024 20:47
@kyulee-com kyulee-com force-pushed the users/kyulee-com/cgdata branch 2 times, most recently from b6007c4 to 09f1ec7 Compare October 17, 2024 23:54
@kyulee-com kyulee-com force-pushed the users/kyulee-com/globmerge branch from 1601086 to ded5771 Compare October 18, 2024 00:02
@kyulee-com kyulee-com force-pushed the users/kyulee-com/cgdata branch 2 times, most recently from 5425db1 to 06a4969 Compare November 1, 2024 15:52
@kyulee-com kyulee-com force-pushed the users/kyulee-com/globmerge branch from 12b539e to 3194154 Compare November 4, 2024 08:48
Base automatically changed from users/kyulee-com/cgdata to main November 5, 2024 01:32
@kyulee-com kyulee-com force-pushed the users/kyulee-com/globmerge branch from 3194154 to e86d78b Compare November 5, 2024 05:11
@kyulee-com
Copy link
Contributor Author

cc. @nocchijiang

@kyulee-com kyulee-com marked this pull request as ready for review November 5, 2024 08:59
@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:transforms labels Nov 5, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 5, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-lto

Author: Kyungwoo Lee (kyulee-com)

Changes

This implements a global function merging pass. Unlike traditional function merging passes that use IR comparators, this pass employs a structurally stable hash to identify similar functions while ignoring certain constant operands. These ignored constants are tracked and encoded into a stable function summary. When merging, instead of explicitly folding similar functions and their call sites, we form a merging instance by supplying different parameters via thunks. The actual size reduction occurs when identically created merging instances are folded by the linker.

Currently, this pass is wired to llvm-lto2, enabled by the -enable-global-merge-func flag.
In a local merging mode, the analysis and merging steps occur sequentially within a module:

  • analyze: Collects stable function hashes and tracks locations of ignored constant operands.
  • finish: Identifies merge candidates with matching hashes and computes the set of parameters that point to different constants.
  • merge: Uses the stable function map to optimistically create a merged function.

We can enable a global merging mode similar to the global function outliner (https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/), which will perform the above steps separately.

  • -codegen-data-generate: During the first round of code generation, we analyze local merging instances and publish their summaries.
  • Offline using llvm-cgdata or at link-time, we can finalize all these merging summaries that are combined to determine parameters.
  • -codegen-data-use: During the second round of code generation, we optimistically create merging instances within each module, and finally, the linker folds identically created merging instances.

Depends on #112664
This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.


Patch is 49.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112671.diff

15 Files Affected:

  • (modified) llvm/include/llvm/CGData/CodeGenData.h (+11)
  • (modified) llvm/include/llvm/InitializePasses.h (+1)
  • (modified) llvm/include/llvm/LinkAllPasses.h (+1)
  • (modified) llvm/include/llvm/Transforms/IPO.h (+5)
  • (added) llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h (+78)
  • (modified) llvm/lib/CodeGen/TargetPassConfig.cpp (+3)
  • (modified) llvm/lib/LTO/LTO.cpp (+1)
  • (modified) llvm/lib/Transforms/IPO/CMakeLists.txt (+2)
  • (added) llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp (+744)
  • (added) llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll (+62)
  • (added) llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll (+82)
  • (added) llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll (+68)
  • (added) llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll (+97)
  • (modified) llvm/tools/llvm-lto2/CMakeLists.txt (+1)
  • (modified) llvm/tools/llvm-lto2/llvm-lto2.cpp (+6)
diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h
index 5d7c74725ccef1..da0e412f2a0e03 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -145,6 +145,9 @@ class CodeGenData {
   const OutlinedHashTree *getOutlinedHashTree() {
     return PublishedHashTree.get();
   }
+  const StableFunctionMap *getStableFunctionMap() {
+    return PublishedStableFunctionMap.get();
+  }
 
   /// Returns true if we should write codegen data.
   bool emitCGData() { return EmitCGData; }
@@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() {
   return CodeGenData::getInstance().hasOutlinedHashTree();
 }
 
+inline bool hasStableFunctionMap() {
+  return CodeGenData::getInstance().hasStableFunctionMap();
+}
+
 inline const OutlinedHashTree *getOutlinedHashTree() {
   return CodeGenData::getInstance().getOutlinedHashTree();
 }
 
+inline const StableFunctionMap *getStableFunctionMap() {
+  return CodeGenData::getInstance().getStableFunctionMap();
+}
+
 inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); }
 
 inline void
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 54c070401ec8a4..a43a96e90543cc 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &);
 void initializeGCMachineCodeAnalysisPass(PassRegistry &);
 void initializeGCModuleInfoPass(PassRegistry &);
 void initializeGVNLegacyPassPass(PassRegistry &);
+void initializeGlobalMergeFuncPass(PassRegistry &);
 void initializeGlobalMergePass(PassRegistry &);
 void initializeGlobalsAAWrapperPassPass(PassRegistry &);
 void initializeHardwareLoopsLegacyPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h
index 3516b47d29ef36..d1219d6ee2a13e 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -79,6 +79,7 @@ struct ForcePassLinking {
     (void)llvm::createDomOnlyViewerWrapperPassPass();
     (void)llvm::createDomViewerWrapperPassPass();
     (void)llvm::createAlwaysInlinerLegacyPass();
+    (void)llvm::createGlobalMergeFuncPass();
     (void)llvm::createGlobalsAAWrapperPass();
     (void)llvm::createInstSimplifyLegacyPass();
     (void)llvm::createInstructionCombiningPass();
diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h
index ee0e35aa618325..28cf330ad20812 100644
--- a/llvm/include/llvm/Transforms/IPO.h
+++ b/llvm/include/llvm/Transforms/IPO.h
@@ -55,6 +55,11 @@ enum class PassSummaryAction {
   Export, ///< Export information to summary.
 };
 
+/// createGlobalMergeFuncPass - This pass generates merged instances by
+/// parameterizing distinct constants across similar functions, utilizing stable
+/// function hash information.
+Pass *createGlobalMergeFuncPass();
+
 } // End llvm namespace
 
 #endif
diff --git a/llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h b/llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h
new file mode 100644
index 00000000000000..395aeda73b2d0f
--- /dev/null
+++ b/llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h
@@ -0,0 +1,78 @@
+//===------ GlobalMergeFunctions.h - Global merge functions -----*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass defines the implementation of a function merging mechanism
+// that utilizes a stable function hash to track differences in constants and
+// identify potential merge candidates. The process involves two rounds:
+// 1. The first round collects stable function hashes and identifies merge
+//    candidates with matching hashes. It also computes the set of parameters
+//    that point to different constants during the stable function merge.
+// 2. The second round leverages this collected global function information to
+//    optimistically create a merged function in each module context, ensuring
+//    correct transformation.
+// Similar to the global outliner, this approach uses the linker's deduplication
+// (ICF) to fold identical merged functions, thereby reducing the final binary
+// size. The work is inspired by the concepts discussed in the following paper:
+// https://dl.acm.org/doi/pdf/10.1145/3652032.3657575.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_IPO_GLOBALMERGEFUNCTIONS_H
+#define LLVM_TRANSFORMS_IPO_GLOBALMERGEFUNCTIONS_H
+
+#include "llvm/CGData/StableFunctionMap.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Pass.h"
+
+enum class HashFunctionMode {
+  Local,
+  BuildingHashFuncion,
+  UsingHashFunction,
+};
+
+namespace llvm {
+
+// A vector of locations (the pair of (instruction, operand) indices) reachable
+// from a parameter.
+using ParamLocs = SmallVector<IndexPair, 4>;
+// A vector of parameters
+using ParamLocsVecTy = SmallVector<ParamLocs, 8>;
+
+/// GlobalMergeFunc is a ModulePass that implements a function merging mechanism
+/// using stable function hashes. It identifies and merges functions with
+/// matching hashes across modules to optimize binary size.
+class GlobalMergeFunc : public ModulePass {
+  HashFunctionMode MergerMode = HashFunctionMode::Local;
+
+  std::unique_ptr<StableFunctionMap> LocalFunctionMap;
+
+public:
+  static char ID;
+
+  GlobalMergeFunc();
+
+  StringRef getPassName() const override;
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override;
+
+  void initializeMergerMode(const Module &M);
+
+  bool runOnModule(Module &M) override;
+
+  /// Analyze module to create stable function into LocalFunctionMap.
+  void analyze(Module &M);
+
+  /// Emit LocalFunctionMap into __llvm_merge section.
+  void emitFunctionMap(Module &M);
+
+  /// Merge functions in the module using the given function map.
+  bool merge(Module &M, const StableFunctionMap *FunctionMap);
+};
+
+} // end namespace llvm
+#endif // LLVM_TRANSFORMS_IPO_GLOBALMERGEFUNCTIONS_H
diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp
index aff74104006e5a..beeb8ea2d078d9 100644
--- a/llvm/lib/CodeGen/TargetPassConfig.cpp
+++ b/llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -141,6 +141,9 @@ static cl::opt<RunOutliner> EnableMachineOutliner(
                           "Disable all outlining"),
                // Sentinel value for unspecified option.
                clEnumValN(RunOutliner::AlwaysOutline, "", "")));
+cl::opt<bool> EnableGlobalMergeFunc(
+    "enable-global-merge-func", cl::Hidden,
+    cl::desc("Enable global merge functions that are based on hash function"));
 // Disable the pass to fix unwind information. Whether the pass is included in
 // the pipeline is controlled via the target options, this option serves as
 // manual override.
diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index 0f53c608512171..3f88c6c475d4d8 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -53,6 +53,7 @@
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetOptions.h"
 #include "llvm/Transforms/IPO.h"
+#include "llvm/Transforms/IPO/GlobalMergeFunctions.h"
 #include "llvm/Transforms/IPO/MemProfContextDisambiguation.h"
 #include "llvm/Transforms/IPO/WholeProgramDevirt.h"
 #include "llvm/Transforms/Utils/FunctionImportUtils.h"
diff --git a/llvm/lib/Transforms/IPO/CMakeLists.txt b/llvm/lib/Transforms/IPO/CMakeLists.txt
index 15cb57399d2460..6a36ed64149f93 100644
--- a/llvm/lib/Transforms/IPO/CMakeLists.txt
+++ b/llvm/lib/Transforms/IPO/CMakeLists.txt
@@ -21,6 +21,7 @@ add_llvm_component_library(LLVMipo
   GlobalDCE.cpp
   GlobalOpt.cpp
   GlobalSplit.cpp
+  GlobalMergeFunctions.cpp
   HotColdSplitting.cpp
   IPO.cpp
   IROutliner.cpp
@@ -60,6 +61,7 @@ add_llvm_component_library(LLVMipo
   Analysis
   BitReader
   BitWriter
+  CGData
   Core
   FrontendOpenMP
   InstCombine
diff --git a/llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp b/llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp
new file mode 100644
index 00000000000000..5a52c8f3304186
--- /dev/null
+++ b/llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp
@@ -0,0 +1,744 @@
+//===---- GlobalMergeFunctions.cpp - Global merge functions -------*- C++ -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass defines the implementation of a function merging mechanism
+// that utilizes a stable function hash to track differences in constants and
+// create potential merge candidates. The process involves two rounds:
+// 1. The first round collects stable function hashes and identifies merge
+//    candidates with matching hashes. It also computes the set of parameters
+//    that point to different constants during the stable function merge.
+// 2. The second round leverages this collected global function information to
+//    optimistically create a merged function in each module context, ensuring
+//    correct transformation.
+// Similar to the global outliner, this approach uses the linker's deduplication
+// (ICF) to fold identical merged functions, thereby reducing the final binary
+// size. The work is inspired by the concepts discussed in the following paper:
+// https://dl.acm.org/doi/pdf/10.1145/3652032.3657575.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/IPO/GlobalMergeFunctions.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/Analysis/ModuleSummaryAnalysis.h"
+#include "llvm/CGData/CodeGenData.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/StructuralHash.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Transforms/Utils/ModuleUtils.h"
+
+#define DEBUG_TYPE "global-merge-func"
+
+using namespace llvm;
+using namespace llvm::support;
+
+static cl::opt<bool>
+    DisableGlobalMerging("disable-global-merging", cl::Hidden,
+                         cl::desc("Disable global merging only by ignoring "
+                                  "the codegen data generation or use. Local "
+                                  "merging is still enabled within a module."),
+                         cl::init(false));
+static cl::opt<unsigned> GlobalMergingMinInstrs(
+    "global-merging-min-instrs",
+    cl::desc("The minimum instruction count required when merging functions."),
+    cl::init(1), cl::Hidden);
+static cl::opt<unsigned> GlobalMergingMaxParams(
+    "global-merging-max-params",
+    cl::desc(
+        "The maximum number of parameters allowed when merging functions."),
+    cl::init(std::numeric_limits<unsigned>::max()), cl::Hidden);
+static cl::opt<unsigned> GlobalMergingParamOverhead(
+    "global-merging-param-overhead",
+    cl::desc("The overhead cost associated with each parameter when merging "
+             "functions."),
+    cl::init(2), cl::Hidden);
+static cl::opt<unsigned>
+    GlobalMergingCallOverhead("global-merging-call-overhead",
+                              cl::desc("The overhead cost associated with each "
+                                       "function call when merging functions."),
+                              cl::init(1), cl::Hidden);
+static cl::opt<unsigned> GlobalMergingExtraThreshold(
+    "global-merging-extra-threshold",
+    cl::desc("An additional cost threshold that must be exceeded for merging "
+             "to be considered beneficial."),
+    cl::init(0), cl::Hidden);
+
+extern cl::opt<bool> EnableGlobalMergeFunc;
+
+STATISTIC(NumMismatchedFunctionHashGlobalMergeFunction,
+          "Number of mismatched function hash for global merge function");
+STATISTIC(NumMismatchedInstCountGlobalMergeFunction,
+          "Number of mismatched instruction count for global merge function");
+STATISTIC(NumMismatchedConstHashGlobalMergeFunction,
+          "Number of mismatched const hash for global merge function");
+STATISTIC(NumMismatchedModuleIdGlobalMergeFunction,
+          "Number of mismatched Module Id for global merge function");
+STATISTIC(NumGlobalMergeFunctions,
+          "Number of functions that are actually merged using function hash");
+STATISTIC(NumAnalyzedModues, "Number of modules that are analyzed");
+STATISTIC(NumAnalyzedFunctions, "Number of functions that are analyzed");
+STATISTIC(NumEligibleFunctions, "Number of functions that are eligible");
+
+/// Returns true if the \OpIdx operand of \p CI is the callee operand.
+static bool isCalleeOperand(const CallBase *CI, unsigned OpIdx) {
+  return &CI->getCalledOperandUse() == &CI->getOperandUse(OpIdx);
+}
+
+static bool canParameterizeCallOperand(const CallBase *CI, unsigned OpIdx) {
+  if (CI->isInlineAsm())
+    return false;
+  Function *Callee = CI->getCalledOperand()
+                         ? dyn_cast_or_null<Function>(
+                               CI->getCalledOperand()->stripPointerCasts())
+                         : nullptr;
+  if (Callee) {
+    if (Callee->isIntrinsic())
+      return false;
+    // objc_msgSend stubs must be called, and can't have their address taken.
+    if (Callee->getName().starts_with("objc_msgSend$"))
+      return false;
+  }
+  if (isCalleeOperand(CI, OpIdx) &&
+      CI->getOperandBundle(LLVMContext::OB_ptrauth).has_value()) {
+    // The operand is the callee and it has already been signed. Ignore this
+    // because we cannot add another ptrauth bundle to the call instruction.
+    return false;
+  }
+  return true;
+}
+
+bool isEligibleInstrunctionForConstantSharing(const Instruction *I) {
+  switch (I->getOpcode()) {
+  case Instruction::Load:
+  case Instruction::Store:
+  case Instruction::Call:
+  case Instruction::Invoke:
+    return true;
+  default:
+    return false;
+  }
+}
+
+bool isEligibleOperandForConstantSharing(const Instruction *I, unsigned OpIdx) {
+  assert(OpIdx < I->getNumOperands() && "Invalid operand index");
+
+  if (!isEligibleInstrunctionForConstantSharing(I))
+    return false;
+
+  auto Opnd = I->getOperand(OpIdx);
+  if (!isa<Constant>(Opnd))
+    return false;
+
+  if (const auto *CI = dyn_cast<CallBase>(I))
+    return canParameterizeCallOperand(CI, OpIdx);
+
+  return true;
+}
+
+/// Returns true if function \p F is eligible for merging.
+bool isEligibleFunction(Function *F) {
+  if (F->isDeclaration())
+    return false;
+
+  if (F->hasFnAttribute(llvm::Attribute::NoMerge))
+    return false;
+
+  if (F->hasAvailableExternallyLinkage())
+    return false;
+
+  if (F->getFunctionType()->isVarArg())
+    return false;
+
+  if (F->getCallingConv() == CallingConv::SwiftTail)
+    return false;
+
+  // If function contains callsites with musttail, if we merge
+  // it, the merged function will have the musttail callsite, but
+  // the number of parameters can change, thus the parameter count
+  // of the callsite will mismatch with the function itself.
+  for (const BasicBlock &BB : *F) {
+    for (const Instruction &I : BB) {
+      const auto *CB = dyn_cast<CallBase>(&I);
+      if (CB && CB->isMustTailCall())
+        return false;
+    }
+  }
+
+  return true;
+}
+
+static bool
+isEligibleInstrunctionForConstantSharingLocal(const Instruction *I) {
+  switch (I->getOpcode()) {
+  case Instruction::Load:
+  case Instruction::Store:
+  case Instruction::Call:
+  case Instruction::Invoke:
+    return true;
+  default:
+    return false;
+  }
+}
+
+static bool ignoreOp(const Instruction *I, unsigned OpIdx) {
+  assert(OpIdx < I->getNumOperands() && "Invalid operand index");
+
+  if (!isEligibleInstrunctionForConstantSharingLocal(I))
+    return false;
+
+  if (!isa<Constant>(I->getOperand(OpIdx)))
+    return false;
+
+  if (const auto *CI = dyn_cast<CallBase>(I))
+    return canParameterizeCallOperand(CI, OpIdx);
+
+  return true;
+}
+
+static Value *createCast(IRBuilder<> &Builder, Value *V, Type *DestTy) {
+  Type *SrcTy = V->getType();
+  if (SrcTy->isStructTy()) {
+    assert(DestTy->isStructTy());
+    assert(SrcTy->getStructNumElements() == DestTy->getStructNumElements());
+    Value *Result = PoisonValue::get(DestTy);
+    for (unsigned int I = 0, E = SrcTy->getStructNumElements(); I < E; ++I) {
+      Value *Element =
+          createCast(Builder, Builder.CreateExtractValue(V, ArrayRef(I)),
+                     DestTy->getStructElementType(I));
+
+      Result = Builder.CreateInsertValue(Result, Element, ArrayRef(I));
+    }
+    return Result;
+  }
+  assert(!DestTy->isStructTy());
+  if (auto *SrcAT = dyn_cast<ArrayType>(SrcTy)) {
+    auto *DestAT = dyn_cast<ArrayType>(DestTy);
+    assert(DestAT);
+    assert(SrcAT->getNumElements() == DestAT->getNumElements());
+    Value *Result = UndefValue::get(DestTy);
+    for (unsigned int I = 0, E = SrcAT->getNumElements(); I < E; ++I) {
+      Value *Element =
+          createCast(Builder, Builder.CreateExtractValue(V, ArrayRef(I)),
+                     DestAT->getElementType());
+
+      Result = Builder.CreateInsertValue(Result, Element, ArrayRef(I));
+    }
+    return Result;
+  }
+  assert(!DestTy->isArrayTy());
+  if (SrcTy->isIntegerTy() && DestTy->isPointerTy())
+    return Builder.CreateIntToPtr(V, DestTy);
+  else if (SrcTy->isPointerTy() && DestTy->isIntegerTy())
+    return Builder.CreatePtrToInt(V, DestTy);
+  else
+    return Builder.CreateBitCast(V, DestTy);
+}
+
+void GlobalMergeFunc::analyze(Module &M) {
+  ++NumAnalyzedModues;
+  for (Function &Func : M) {
+    ++NumAnalyzedFunctions;
+    if (isEligibleFunction(&Func)) {
+      ++NumEligibleFunctions;
+
+      auto FI = llvm::StructuralHashWithDifferences(Func, ignoreOp);
+
+      // Convert the operand map to a vector for a serialization-friendly
+      // format.
+      IndexOperandHashVecType IndexOperandHashes;
+      for (auto &Pair : *FI.IndexOperandHashMap)
+        IndexOperandHashes.emplace_back(Pair);
+
+      StableFunction SF(FI.FunctionHash, get_stable_name(Func.getName()).str(),
+                        M.getModuleIdentifier(), FI.IndexInstruction->size(),
+                        std::move(IndexOperandHashes));
+
+      LocalFunctionMap->insert(SF);
+    }
+  }
+}
+
+/// Tuple to hold function info to process merging.
+struct FuncMergeInfo {
+  StableFunctionMap::StableFunctionEntry *SF;
+  Function *F;
+  std::unique_ptr<IndexInstrMap> IndexInstruction;
+};
+
+// Given the func info, and the parameterized locations, create and return
+// a new merged function by replacing the original constants with the new
+// parameters.
+static Function *createMergedFunction(FuncMergeInfo &FI,
+                                      ArrayRef<Type *> ConstParamTypes,
+                                      const ParamLocsVecTy &ParamLocsVec) {
+  // Synthesize a new merged function name by appending ".Tgm" to the root
+  // function's name.
+  auto *MergedFunc = FI.F;
+  auto NewFunctionName = MergedFunc->getName().str() + ".Tgm";
+  auto *M = MergedFunc->getParent();
+  assert(!M->getFunction(NewFunctionName));
+
+  FunctionType *OrigTy = MergedFunc->getFunctionType();
+  // Get the original params' types.
+  SmallVector<Type *> ParamTypes(OrigTy->param_begin(), OrigTy->param_end());
+  // Append const parameter types that are passed in.
+  ParamTypes.append(ConstParamTypes.begin(), ConstParamTypes.end());
+  FunctionType *FuncType =
+      FunctionType::get(OrigTy->getReturnType(), ParamTypes, false);
+
+  // Declare a new function
+  Function *NewFunction =
+      Function::Create(FuncType, MergedFunc->getLinkage(), NewFunctionName);
+  if (auto *SP = MergedFunc->getSubprogram())
+    NewFunction->setSubprogram(SP);
+  NewFunction->copyAttributesFrom(MergedFunc);
+  NewFunction->setDLLStorageClass(GlobalValue::DefaultStorageClass);
+
+  NewFunction->setLinkage(GlobalValue::InternalLinkage);
+  NewFunction->addFnAttr(Attribute::NoInline);
+
+  // Add the new function before the root function.
+  M->getFunctionList().insert(MergedFunc->getIterator(), NewFunction);
+
+  // Move the body of MergedFunc into the NewFunction.
+  NewFunction->splice(NewFunction->begin(), MergedFunc);
+
+  // Update the original args by the new args.
+  auto NewArgIter ...
[truncated]

@kyulee-com
Copy link
Contributor Author

cc. @eeckstein

@eeckstein
Copy link
Contributor

Nice! Will this new pass make Swift's function merging pass obsolete?

@kyulee-com
Copy link
Contributor Author

Nice! Will this new pass make Swift's function merging pass obsolete?

Yes. We use this new pass with thin-LTO, which is built on top of Swift's existing function merging pass. Since the former fully covers the functionality of the latter, technically we can disable it.

llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp Outdated Show resolved Hide resolved
@nocchijiang
Copy link
Contributor

Shall we add new pass manager support in this PR or you have planned a separated one for it?

NewFunction->setDLLStorageClass(GlobalValue::DefaultStorageClass);

NewFunction->setLinkage(GlobalValue::InternalLinkage);
NewFunction->addFnAttr(Attribute::NoInline);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If MergedFunc is alwaysinline, we should remove the attribute from NewFunction. The verifier will complain about it if both noinline and alwaysinline are present.

Copy link
Contributor Author

@kyulee-com kyulee-com Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now bail out the alwaysinline case as an eligible function as it doesn't make sense to merge functions that must be inlined.

// objc_msgSend stubs must be called, and can't have their address taken.
if (Callee->getName().starts_with("objc_msgSend$"))
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calls to dtrace probes should probably be filtered here as well. Linkers have special logic to handle them. Parameterizing these calls will break the linkage. See 6c641d0 for lld, https://github.com/apple-oss-distributions/ld64/blob/47f477cb721755419018f7530038b272e9d0cdea/src/ld/OutputFile.cpp#L1597 for ld64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyulee-com kyulee-com deleted the users/kyulee-com/globmerge branch November 14, 2024 01:34
@kyulee-com kyulee-com restored the users/kyulee-com/globmerge branch November 14, 2024 01:37
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 14, 2024

LLVM Buildbot has detected a new failure on builder clang-m68k-linux-cross running on suse-gary-m68k-cross while building llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/2002

Here is the relevant piece of the build log for the reference
Step 5 (ninja check 1) failure: stage 1 checked (failure)
...
                 from /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h:13,
                 from /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/unittests/Tooling/DependencyScanning/DependencyScannerTest.cpp:17:
/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/include/clang/Sema/Sema.h:463:7: warning: ‘clang::Sema’ declared with greater visibility than the type of its field ‘clang::Sema::UnusedFileScopedDecls’ [-Wattributes]
  463 | class Sema final : public SemaBase {
      |       ^~~~
/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/include/clang/Sema/Sema.h:463:7: warning: ‘clang::Sema’ declared with greater visibility than the type of its field ‘clang::Sema::TentativeDefinitions’ [-Wattributes]
/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/include/clang/Sema/Sema.h:463:7: warning: ‘clang::Sema’ declared with greater visibility than the type of its field ‘clang::Sema::ExtVectorDecls’ [-Wattributes]
/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/include/clang/Sema/Sema.h:463:7: warning: ‘clang::Sema’ declared with greater visibility than the type of its field ‘clang::Sema::DelegatingCtorDecls’ [-Wattributes]
[298/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/ExecutionTest.cpp.o
[299/1137] Building CXX object tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/ASTImporterTest.cpp.o
FAILED: tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/ASTImporterTest.cpp.o 
/usr/bin/c++ -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_STATIC -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/tools/clang/unittests/AST -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/unittests/AST -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/include -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/tools/clang/include -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/include -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/include -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/third-party/unittest/googletest/include -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/third-party/unittest/googlemock/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -fno-lifetime-dse -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -O3 -DNDEBUG -std=c++17  -Wno-variadic-macros -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -Wno-suggest-override -MD -MT tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/ASTImporterTest.cpp.o -MF tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/ASTImporterTest.cpp.o.d -o tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/ASTImporterTest.cpp.o -c /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/unittests/AST/ASTImporterTest.cpp
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
[300/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/Attr.cpp.o
[301/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/LookupTest.cpp.o
[302/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/QualTypeNamesTest.cpp.o
[303/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RangeSelectorTest.cpp.o
[304/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/CXXMemberCall.cpp.o
[305/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/ConstructExpr.cpp.o
[306/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/ImplicitCtor.cpp.o
[307/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/CXXMethodDecl.cpp.o
[308/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/BitfieldInitializer.cpp.o
[309/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/DeclRefExpr.cpp.o
[310/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/DeductionGuide.cpp.o
[311/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/CXXOperatorCallExprTraverser.cpp.o
[312/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/Class.cpp.o
[313/1137] Building CXX object tools/clang/unittests/ASTMatchers/CMakeFiles/ASTMatchersTests.dir/ASTMatchersNarrowingTest.cpp.o
[314/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/CallbacksCompoundAssignOperator.cpp.o
[315/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/LexicallyOrderedRecursiveASTVisitorTest.cpp.o
[316/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/CallbacksBinaryOperator.cpp.o
[317/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/ImplicitCtorInitializer.cpp.o
[318/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/CXXBoolLiteralExpr.cpp.o
[319/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/IntegerLiteral.cpp.o
[320/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/InitListExprPostOrderNoQueue.cpp.o
[321/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/LambdaTemplateParams.cpp.o
[322/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/Concept.cpp.o
[323/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/InitListExprPostOrder.cpp.o
[324/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/LambdaExpr.cpp.o
[325/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/InitListExprPreOrderNoQueue.cpp.o
[326/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/CallbacksCallExpr.cpp.o
[327/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/MemberPointerTypeLoc.cpp.o
[328/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/InitListExprPreOrder.cpp.o
[329/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/LambdaDefaultCapture.cpp.o
[330/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/CallbacksLeaf.cpp.o
[331/1137] Building CXX object tools/clang/unittests/Tooling/CMakeFiles/ToolingTests.dir/RecursiveASTVisitorTests/CallbacksUnaryOperator.cpp.o
[332/1137] Building CXX object tools/clang/unittests/ASTMatchers/CMakeFiles/ASTMatchersTests.dir/ASTMatchersTraversalTest.cpp.o
ninja: build stopped: subcommand failed.

kyulee-com added a commit that referenced this pull request Nov 14, 2024
This is a follow-up PR to refactor the initial global merge function
pass implemented in #112671.

It first collects stable functions relevant to the current module and
iterates over those only, instead of iterating through all stable
functions in the stable function map.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
@mikaelholmen
Copy link
Collaborator

Hi,

I've no idea if this is expected but I noticed that if I run
opt -passes=global-merge-func gmf.ll -o /dev/null
on gmf.ll being

@0 = global { { ptr, i32, i32 } } { { ptr, i32, i32 } { ptr null, i32 19, i32 5 } }
@1 = global { { ptr, i32, i32 } } { { ptr, i32, i32 } { ptr null, i32 22, i32 5 } }

it crashes like

opt: ../lib/Analysis/ModuleSummaryAnalysis.cpp:1072: ModuleSummaryIndex llvm::buildModuleSummaryIndex(const Module &, std::function<BlockFrequencyInfo *(const Function &)>, ProfileSummaryInfo *, std::function<const StackSafetyInfo *(const Function &)>): Assertion `GlobalList.second.SummaryList.size() == 1 && "Expected module's index to have one summary per GUID"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/opt -passes=global-merge-func gmf.ll -o /dev/null
1.	Running pass "global-merge-func" on module "gmf.ll"
 #0 0x000055e51084dd78 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/opt+0x4427d78)
 #1 0x000055e51084b83e llvm::sys::RunSignalHandlers() (build-all/bin/opt+0x442583e)
 #2 0x000055e51084e5ad SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f1cdb046cf0 __restore_rt (/lib64/libpthread.so.0+0x12cf0)
 #4 0x00007f1cd8bffacf raise (/lib64/libc.so.6+0x4eacf)
 #5 0x00007f1cd8bd2ea5 abort (/lib64/libc.so.6+0x21ea5)
 #6 0x00007f1cd8bd2d79 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d79)
 #7 0x00007f1cd8bf8426 (/lib64/libc.so.6+0x47426)
 #8 0x000055e511442fe1 llvm::buildModuleSummaryIndex(llvm::Module const&, std::function<llvm::BlockFrequencyInfo* (llvm::Function const&)>, llvm::ProfileSummaryInfo*, std::function<llvm::StackSafetyInfo const* (llvm::Function const&)>) (build-all/bin/opt+0x501cfe1)
 #9 0x000055e511447e72 llvm::ModuleSummaryIndexAnalysis::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x5021e72)
#10 0x000055e5123dce42 llvm::detail::AnalysisPassModel<llvm::Module, llvm::ModuleSummaryIndexAnalysis, llvm::AnalysisManager<llvm::Module>::Invalidator>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) PassBuilder.cpp:0:0
#11 0x000055e510a5849a llvm::AnalysisManager<llvm::Module>::getResultImpl(llvm::AnalysisKey*, llvm::Module&) (build-all/bin/opt+0x463249a)
#12 0x000055e511b31fbf llvm::GlobalMergeFuncPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x570bfbf)
#13 0x000055e5123ef72d llvm::detail::PassModel<llvm::Module, llvm::GlobalMergeFuncPass, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) PassBuilder.cpp:0:0
#14 0x000055e510a55477 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x462f477)
#15 0x000055e511b9f6f3 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (build-all/bin/opt+0x57796f3)
#16 0x000055e51081514b optMain (build-all/bin/opt+0x43ef14b)
#17 0x00007f1cd8bebd85 __libc_start_main (/lib64/libc.so.6+0x3ad85)
#18 0x000055e51080ef6e _start (build-all/bin/opt+0x43e8f6e)
Abort (core dumped)

kyulee-com added a commit to kyulee-com/llvm-project that referenced this pull request Nov 14, 2024
@kyulee-com
Copy link
Contributor Author

Hi,

I've no idea if this is expected but I noticed that if I run opt -passes=global-merge-func gmf.ll -o /dev/null on gmf.ll being

@0 = global { { ptr, i32, i32 } } { { ptr, i32, i32 } { ptr null, i32 19, i32 5 } }
@1 = global { { ptr, i32, i32 } } { { ptr, i32, i32 } { ptr null, i32 22, i32 5 } }

it crashes like

Thank you for testing this! I can't reproduce the issue on my end, but I suspect it might be due to the module summary being run twice, even though it's optional.
Could you please try out #116241 again?

kyulee-com added a commit to kyulee-com/llvm-project that referenced this pull request Nov 14, 2024
kyulee-com added a commit to kyulee-com/llvm-project that referenced this pull request Nov 14, 2024
This is a follow-up PR to refactor the initial global merge function
pass implemented in llvm#112671.

It first collects stable functions relevant to the current module and
iterates over those only, instead of iterating through all stable
functions in the stable function map.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
kyulee-com added a commit to kyulee-com/llvm-project that referenced this pull request Nov 14, 2024
kyulee-com added a commit to kyulee-com/llvm-project that referenced this pull request Nov 14, 2024
This is a follow-up PR to refactor the initial global merge function
pass implemented in llvm#112671.

It first collects stable functions relevant to the current module and
iterates over those only, instead of iterating through all stable
functions in the stable function map.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
kyulee-com added a commit that referenced this pull request Nov 14, 2024
This is a follow-up PR to refactor the initial global merge function
pass implemented in #112671.

It first collects stable functions relevant to the current module and
iterates over those only, instead of iterating through all stable
functions in the stable function map.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
@mikaelholmen
Copy link
Collaborator

Hi,
I've no idea if this is expected but I noticed that if I run opt -passes=global-merge-func gmf.ll -o /dev/null on gmf.ll being

@0 = global { { ptr, i32, i32 } } { { ptr, i32, i32 } { ptr null, i32 19, i32 5 } }
@1 = global { { ptr, i32, i32 } } { { ptr, i32, i32 } { ptr null, i32 22, i32 5 } }

it crashes like

Thank you for testing this! I can't reproduce the issue on my end, but I suspect it might be due to the module summary being run twice, even though it's optional. Could you please try out #116241 again?

Weird if you can't reproduce it. I did again at 878b03e. Asserts must be on.
Anyway I tried #116241 and with that it doesn't crash for me.

@kyulee-com
Copy link
Contributor Author

Weird if you can't reproduce it. I did again at 878b03e. Asserts must be on. Anyway I tried #116241 and with that it doesn't crash for me.

Yeah. I forgot enabling asserts. Now I can see an assertion. I think #116241 should land it for this fix. Can you review it?

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 15, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux-bootstrap-asan running on sanitizer-buildbot7 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/2657

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 83743 of 83744 tests, 48 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
FAIL: lld :: ELF/lto/unified-lto.ll (82543 of 83743)
******************** TEST 'lld :: ELF/lto/unified-lto.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt -thinlto-bc -thinlto-split-lto-unit -unified-lto /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt -thinlto-bc -thinlto-split-lto-unit -unified-lto /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o
RUN: at line 3: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt -thinlto-bc -thinlto-split-lto-unit /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1.o
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt -thinlto-bc -thinlto-split-lto-unit /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1.o
RUN: at line 4: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=full /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=full /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
RUN: at line 5: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=FULL
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=FULL
RUN: at line 6: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=thin /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=thin /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
RUN: at line 7: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
RUN: at line 8: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=default /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=default /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
RUN: at line 9: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
RUN: at line 10: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=default /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=default /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1
RUN: at line 11: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1
RUN: at line 12: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/count 0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/count 0
Expected 0 lines, got 1.

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Step 10 (stage2/asan check) failure: stage2/asan check (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 83743 of 83744 tests, 48 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
FAIL: lld :: ELF/lto/unified-lto.ll (82543 of 83743)
******************** TEST 'lld :: ELF/lto/unified-lto.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt -thinlto-bc -thinlto-split-lto-unit -unified-lto /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt -thinlto-bc -thinlto-split-lto-unit -unified-lto /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o
RUN: at line 3: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt -thinlto-bc -thinlto-split-lto-unit /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1.o
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt -thinlto-bc -thinlto-split-lto-unit /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1.o
RUN: at line 4: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=full /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=full /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
RUN: at line 5: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=FULL
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=FULL
RUN: at line 6: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=thin /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=thin /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
RUN: at line 7: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
RUN: at line 8: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=default /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=default /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
RUN: at line 9: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
RUN: at line 10: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=default /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld --lto=default /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1
RUN: at line 11: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/lto/unified-lto.ll --check-prefix=THIN
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-readelf -s /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp1
RUN: at line 12: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/count 0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0.o -o /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/lto/Output/unified-lto.ll.tmp0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/count 0
Expected 0 lines, got 1.

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 

kyulee-com added a commit to kyulee-com/llvm-project that referenced this pull request Nov 15, 2024
kyulee-com added a commit to kyulee-com/llvm-project that referenced this pull request Nov 15, 2024
kyulee-com added a commit to kyulee-com/llvm-project that referenced this pull request Nov 15, 2024
kyulee-com added a commit to kyulee-com/llvm-project that referenced this pull request Nov 15, 2024
kyulee-com added a commit that referenced this pull request Nov 15, 2024
Module summary index is optional for this pass, and we shouldn't run it,
but import it as necessary.
Copy link
Contributor

@nocchijiang nocchijiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for some late comments, since I didn't have free time actually debugging around the changes.

ParamLocsVecTy ParamLocsVec;
for (auto &[HashSeq, Locs] : HashSeqToLocs) {
ParamLocsVec.push_back(std::move(Locs));
llvm::sort(ParamLocsVec, [&](const ParamLocs &L, const ParamLocs &R) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the sort after the for loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Apparently I mis-placed this code inside the loop while I was refactoring this code.

if (InstCount < GlobalMergingMinInstrs)
return false;

unsigned ParamCount = SFS[0]->IndexOperandHashMap->size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of unique values of SFS[0]->IndexOperandHashMap is more accurate, if a given constant is used multiple times in the stable function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of unique values can be different depending on a stable function.
Anyhow, I consider this count as a parameter count, and add them up to compute the total cost.


unsigned Benefit = InstCount * (StableFunctionCount - 1);
unsigned Cost =
(GlobalMergingParamOverhead * ParamCount + GlobalMergingCallOverhead) *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GlobalMergingParamOverhead could potentially be fine grained for different kinds of constant. I ran some tests from Swift repo to test the compatibility between the Apple implementation and this one, and found that some tested functions were not merged because it overestimated the cost for some parameters of small scalar type. This should not make a noticeable difference in production, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also FYI the implementation that comes with this PR is not passing all the existing Swift repo tests despite lowering GlobalMergingParamOverhead to 0 to force merges to happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it overestimated the cost for some parameters of small scalar type.

It seems we need to incorporate some type information to accurately reflect the precise cost, in theory. Since the profit model is currently computed offline, I've refined the parameter count for greater precision, as mentioned above

FYI the implementation that comes with this PR is not passing all the existing Swift repo tests despite lowering GlobalMergingParamOverhead to 0 to force merges to happen.

Despite some differences in the underlying assumptions, I'm curious about what the existing Swift merge can accomplish that this new pass cannot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about what the existing Swift merge can accomplish that this new pass cannot.

These are the behavioral differences I have found by running merge_func*.ll from Swift repo, some of which I believe we should consider implementing as well:

Test Differences Remark
merge_func_preserves_vfe.ll merge_candidate_c merged or not The new pass seems unaware of metadata differences.
merge_func_ptrauth.ll presence of ptrauth info I believe the new pass does not implement ptrauth for simplicity.
merge_func_return_type_cast.ll return_0 merged with return_null or not The 2 functions have different stable hash values.
merge_func.ll func*_merged_with* merging strategy Apple implementation tends to avoid too many parameters.
merge_func.ll caller*_* merging strategy Apple implementation does not parameterize the calls on the call chain.
merge_func.ll first merged with second or not Apple implementation is aware of the order of incoming blocks to a phi.
merge_func.ll not_really_recursive eliminated or not Apple implementation replaces the call to the original function with the merged one.

if (StableFunctionCount < GlobalMergingMinMerges)
return false;

unsigned InstCount = SFS[0]->InstCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of IR cannot precisely reflect the actual number of machine instructions (often the latter is larger for AArch64. Take the access of a global value for example, it will be expanded into an ADRP pair for small code model), which results in Benefit underestimated and some profitable merging opportunities dropped. I am not sure if there is existing code that could be reused to better estimate the machine instruction count, but at least we may introduce a multiplier on InstCount for fine-tuning of the behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added -global-merging-inst-overhead to tune this parameter.


// Convert the operand map to a vector for a serialization-friendly
// format.
IndexOperandHashVecType IndexOperandHashes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we bail out for functions without parameterizable constants here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found not skipping no parameter case is beneficial. Anyhow, I now introduced -global-merging-skip-no-params to control this behavior.

@kyulee-com
Copy link
Contributor Author

Sorry for some late comments, since I didn't have free time actually debugging around the changes.

Thank you for testing and reporting these issues!
I now incorporated them into #116548.

akshayrdeodhar pushed a commit to akshayrdeodhar/llvm-project that referenced this pull request Nov 18, 2024
This implements a global function merging pass. Unlike traditional
function merging passes that use IR comparators, this pass employs a
structurally stable hash to identify similar functions while ignoring
certain constant operands. These ignored constants are tracked and
encoded into a stable function summary. When merging, instead of
explicitly folding similar functions and their call sites, we form a
merging instance by supplying different parameters via thunks. The
actual size reduction occurs when identically created merging instances
are folded by the linker.

Currently, this pass is wired to a pre-codegen pass, enabled by the
`-enable-global-merge-func` flag.
In a local merging mode, the analysis and merging steps occur
sequentially within a module:
- `analyze`: Collects stable function hashes and tracks locations of
ignored constant operands.
- `finalize`: Identifies merge candidates with matching hashes and
computes the set of parameters that point to different constants.
- `merge`: Uses the stable function map to optimistically create a
merged function.

We can enable a global merging mode similar to the global function
outliner
(https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/),
which will perform the above steps separately.
- `-codegen-data-generate`: During the first round of code generation,
we analyze local merging instances and publish their summaries.
- Offline using `llvm-cgdata` or at link-time, we can finalize all these
merging summaries that are combined to determine parameters.
- `-codegen-data-use`: During the second round of code generation, we
optimistically create merging instances within each module, and finally,
the linker folds identically created merging instances.

Depends on llvm#112664
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
akshayrdeodhar pushed a commit to akshayrdeodhar/llvm-project that referenced this pull request Nov 18, 2024
This is a follow-up PR to refactor the initial global merge function
pass implemented in llvm#112671.

It first collects stable functions relevant to the current module and
iterates over those only, instead of iterating through all stable
functions in the stable function map.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
akshayrdeodhar pushed a commit to akshayrdeodhar/llvm-project that referenced this pull request Nov 18, 2024
This is a follow-up PR to refactor the initial global merge function
pass implemented in llvm#112671.

It first collects stable functions relevant to the current module and
iterates over those only, instead of iterating through all stable
functions in the stable function map.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
akshayrdeodhar pushed a commit to akshayrdeodhar/llvm-project that referenced this pull request Nov 18, 2024
…6241)

Module summary index is optional for this pass, and we shouldn't run it,
but import it as necessary.
kyulee-com added a commit that referenced this pull request Nov 25, 2024
This update follows up on change #112671 and is mostly a NFC, with the following exceptions:
  - Introduced `-global-merging-skip-no-params` to bypass merging when no parameters are required.
  - Parameter count is now calculated based on the unique hash count.
  - Added `-global-merging-inst-overhead` to adjust the instruction overhead, reflecting the machine instruction size.
  - Costs and benefits are now computed using the double data type. Since the finalization process occurs offline, this should not significantly impact build time.
  - Moved a sorting operation outside of the loop.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
kyulee-com added a commit that referenced this pull request Nov 25, 2024
This update follows up on change #112671 and is mostly a NFC, with the following exceptions:
  - Introduced `-global-merging-skip-no-params` to bypass merging when no parameters are required.
  - Parameter count is now calculated based on the unique hash count.
  - Added `-global-merging-inst-overhead` to adjust the instruction overhead, reflecting the machine instruction size.
  - Costs and benefits are now computed using the double data type. Since the finalization process occurs offline, this should not significantly impact build time.
  - Moved a sorting operation outside of the loop.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants