Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[StructuralHash] Refactor #112621

Merged
merged 4 commits into from
Oct 26, 2024
Merged

[StructuralHash] Refactor #112621

merged 4 commits into from
Oct 26, 2024

Conversation

kyulee-com
Copy link
Contributor

@kyulee-com kyulee-com commented Oct 16, 2024

This is largely NFC, and it prepares for #112638.

  • Use stable_hash instead of uint64_t
  • Rename update* to hash* functions. They compute stable_hash locally and return it.

This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.

 - Use stable_hash instead of uint64_t
 - Rename update* to hash* functions. They compute stable_hash locally
   and return it.
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the use of stable_hash? This hash is not expected to be stable, e.g. because we may start hashing additional things into it.

@kyulee-com
Copy link
Contributor Author

Why the use of stable_hash? This hash is not expected to be stable, e.g. because we may start hashing additional things into it.

As posted in https://discourse.llvm.org/t/rfc-global-function-merging/82608, I plan to make this structural hash stable (in the following PR). If it is preferable to maintain the existing structural hash implementation as is, I am open to creating a separate data structure or extending it through a type hierarchy.

@kyulee-com kyulee-com marked this pull request as ready for review October 18, 2024 04:43
@llvmbot
Copy link
Member

llvmbot commented Oct 18, 2024

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-transforms

Author: Kyungwoo Lee (kyulee-com)

Changes
  • Use stable_hash instead of uint64_t
  • Rename update* to hash* functions. They compute stable_hash locally and return it.

This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.


Full diff: https://github.com/llvm/llvm-project/pull/112621.diff

3 Files Affected:

  • (modified) llvm/include/llvm/IR/StructuralHash.h (+2-1)
  • (modified) llvm/lib/IR/StructuralHash.cpp (+78-43)
  • (modified) llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll (+8-9)
diff --git a/llvm/include/llvm/IR/StructuralHash.h b/llvm/include/llvm/IR/StructuralHash.h
index 57fb45db849110..aa292bc3446799 100644
--- a/llvm/include/llvm/IR/StructuralHash.h
+++ b/llvm/include/llvm/IR/StructuralHash.h
@@ -14,6 +14,7 @@
 #ifndef LLVM_IR_STRUCTURALHASH_H
 #define LLVM_IR_STRUCTURALHASH_H
 
+#include "llvm/ADT/StableHashing.h"
 #include <cstdint>
 
 namespace llvm {
@@ -21,7 +22,7 @@ namespace llvm {
 class Function;
 class Module;
 
-using IRHash = uint64_t;
+using IRHash = stable_hash;
 
 /// Returns a hash of the function \p F.
 /// \param F The function to hash.
diff --git a/llvm/lib/IR/StructuralHash.cpp b/llvm/lib/IR/StructuralHash.cpp
index fb4f33a021a96b..a1fabab77d52b2 100644
--- a/llvm/lib/IR/StructuralHash.cpp
+++ b/llvm/lib/IR/StructuralHash.cpp
@@ -24,61 +24,86 @@ namespace {
 // by the MergeFunctions pass.
 
 class StructuralHashImpl {
-  uint64_t Hash = 4;
+  stable_hash Hash = 4;
 
-  void hash(uint64_t V) { Hash = hashing::detail::hash_16_bytes(Hash, V); }
+  bool DetailedHash;
 
   // This will produce different values on 32-bit and 64-bit systens as
   // hash_combine returns a size_t. However, this is only used for
   // detailed hashing which, in-tree, only needs to distinguish between
   // differences in functions.
-  template <typename T> void hashArbitaryType(const T &V) {
-    hash(hash_combine(V));
+  // TODO: This is not stable.
+  template <typename T> stable_hash hashArbitaryType(const T &V) {
+    return hash_combine(V);
   }
 
-  void hashType(Type *ValueType) {
-    hash(ValueType->getTypeID());
+  stable_hash hashType(Type *ValueType) {
+    SmallVector<stable_hash> Hashes;
+    Hashes.emplace_back(ValueType->getTypeID());
     if (ValueType->isIntegerTy())
-      hash(ValueType->getIntegerBitWidth());
+      Hashes.emplace_back(ValueType->getIntegerBitWidth());
+    return stable_hash_combine(Hashes);
   }
 
 public:
-  StructuralHashImpl() = default;
-
-  void updateOperand(Value *Operand) {
-    hashType(Operand->getType());
-
-    // The cases enumerated below are not exhaustive and are only aimed to
-    // get decent coverage over the function.
-    if (ConstantInt *ConstInt = dyn_cast<ConstantInt>(Operand)) {
-      hashArbitaryType(ConstInt->getValue());
-    } else if (ConstantFP *ConstFP = dyn_cast<ConstantFP>(Operand)) {
-      hashArbitaryType(ConstFP->getValue());
-    } else if (Argument *Arg = dyn_cast<Argument>(Operand)) {
-      hash(Arg->getArgNo());
-    } else if (Function *Func = dyn_cast<Function>(Operand)) {
+  StructuralHashImpl() = delete;
+  explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {}
+
+  stable_hash hashConstant(Constant *C) {
+    SmallVector<stable_hash> Hashes;
+    // TODO: hashArbitaryType() is not stable.
+    if (ConstantInt *ConstInt = dyn_cast<ConstantInt>(C)) {
+      Hashes.emplace_back(hashArbitaryType(ConstInt->getValue()));
+    } else if (ConstantFP *ConstFP = dyn_cast<ConstantFP>(C)) {
+      Hashes.emplace_back(hashArbitaryType(ConstFP->getValue()));
+    } else if (Function *Func = dyn_cast<Function>(C))
       // Hashing the name will be deterministic as LLVM's hashing infrastructure
       // has explicit support for hashing strings and will not simply hash
       // the pointer.
-      hashArbitaryType(Func->getName());
-    }
+      Hashes.emplace_back(hashArbitaryType(Func->getName()));
+
+    return stable_hash_combine(Hashes);
+  }
+
+  stable_hash hashValue(Value *V) {
+    // Check constant and return its hash.
+    Constant *C = dyn_cast<Constant>(V);
+    if (C)
+      return hashConstant(C);
+
+    // Hash argument number.
+    SmallVector<stable_hash> Hashes;
+    if (Argument *Arg = dyn_cast<Argument>(V))
+      Hashes.emplace_back(Arg->getArgNo());
+
+    return stable_hash_combine(Hashes);
   }
 
-  void updateInstruction(const Instruction &Inst, bool DetailedHash) {
-    hash(Inst.getOpcode());
+  stable_hash hashOperand(Value *Operand) {
+    SmallVector<stable_hash> Hashes;
+    Hashes.emplace_back(hashType(Operand->getType()));
+    Hashes.emplace_back(hashValue(Operand));
+    return stable_hash_combine(Hashes);
+  }
+
+  stable_hash hashInstruction(const Instruction &Inst) {
+    SmallVector<stable_hash> Hashes;
+    Hashes.emplace_back(Inst.getOpcode());
 
     if (!DetailedHash)
-      return;
+      return stable_hash_combine(Hashes);
 
-    hashType(Inst.getType());
+    Hashes.emplace_back(hashType(Inst.getType()));
 
     // Handle additional properties of specific instructions that cause
     // semantic differences in the IR.
     if (const auto *ComparisonInstruction = dyn_cast<CmpInst>(&Inst))
-      hash(ComparisonInstruction->getPredicate());
+      Hashes.emplace_back(ComparisonInstruction->getPredicate());
 
     for (const auto &Op : Inst.operands())
-      updateOperand(Op);
+      Hashes.emplace_back(hashOperand(Op));
+
+    return stable_hash_combine(Hashes);
   }
 
   // A function hash is calculated by considering only the number of arguments
@@ -97,15 +122,17 @@ class StructuralHashImpl {
   // expensive checks for pass modification status). When modifying this
   // function, most changes should be gated behind an option and enabled
   // selectively.
-  void update(const Function &F, bool DetailedHash) {
+  void update(const Function &F) {
     // Declarations don't affect analyses.
     if (F.isDeclaration())
       return;
 
-    hash(0x62642d6b6b2d6b72); // Function header
+    SmallVector<stable_hash> Hashes;
+    Hashes.emplace_back(Hash);
+    Hashes.emplace_back(0x62642d6b6b2d6b72); // Function header
 
-    hash(F.isVarArg());
-    hash(F.arg_size());
+    Hashes.emplace_back(F.isVarArg());
+    Hashes.emplace_back(F.arg_size());
 
     SmallVector<const BasicBlock *, 8> BBs;
     SmallPtrSet<const BasicBlock *, 16> VisitedBBs;
@@ -121,14 +148,17 @@ class StructuralHashImpl {
       // This random value acts as a block header, as otherwise the partition of
       // opcodes into BBs wouldn't affect the hash, only the order of the
       // opcodes
-      hash(45798);
+      Hashes.emplace_back(45798);
       for (auto &Inst : *BB)
-        updateInstruction(Inst, DetailedHash);
+        Hashes.emplace_back(hashInstruction(Inst));
 
       for (const BasicBlock *Succ : successors(BB))
         if (VisitedBBs.insert(Succ).second)
           BBs.push_back(Succ);
     }
+
+    // Update the combined hash in place.
+    Hash = stable_hash_combine(Hashes);
   }
 
   void update(const GlobalVariable &GV) {
@@ -137,15 +167,20 @@ class StructuralHashImpl {
     // we ignore anything with the `.llvm` prefix
     if (GV.isDeclaration() || GV.getName().starts_with("llvm."))
       return;
-    hash(23456); // Global header
-    hash(GV.getValueType()->getTypeID());
+    SmallVector<stable_hash> Hashes;
+    Hashes.emplace_back(Hash);
+    Hashes.emplace_back(23456); // Global header
+    Hashes.emplace_back(GV.getValueType()->getTypeID());
+
+    // Update the combined hash in place.
+    Hash = stable_hash_combine(Hashes);
   }
 
-  void update(const Module &M, bool DetailedHash) {
+  void update(const Module &M) {
     for (const GlobalVariable &GV : M.globals())
       update(GV);
     for (const Function &F : M)
-      update(F, DetailedHash);
+      update(F);
   }
 
   uint64_t getHash() const { return Hash; }
@@ -154,13 +189,13 @@ class StructuralHashImpl {
 } // namespace
 
 IRHash llvm::StructuralHash(const Function &F, bool DetailedHash) {
-  StructuralHashImpl H;
-  H.update(F, DetailedHash);
+  StructuralHashImpl H(DetailedHash);
+  H.update(F);
   return H.getHash();
 }
 
 IRHash llvm::StructuralHash(const Module &M, bool DetailedHash) {
-  StructuralHashImpl H;
-  H.update(M, DetailedHash);
+  StructuralHashImpl H(DetailedHash);
+  H.update(M);
   return H.getHash();
 }
diff --git a/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll b/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll
index e7718ca84d3165..0ceb363a67b1fa 100644
--- a/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll
+++ b/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll
@@ -63,6 +63,14 @@ lpad:
   resume { ptr, i32 } zeroinitializer
 }
 
+define i8 @call_with_same_range() {
+; CHECK-LABEL: @call_with_same_range
+; CHECK: tail call i8 @call_with_range
+  bitcast i8 0 to i8
+  %out = call i8 @dummy(), !range !0
+  ret i8 %out
+}
+
 define i8 @invoke_with_same_range() personality ptr undef {
 ; CHECK-LABEL: @invoke_with_same_range()
 ; CHECK: tail call i8 @invoke_with_range()
@@ -76,15 +84,6 @@ lpad:
   resume { ptr, i32 } zeroinitializer
 }
 
-define i8 @call_with_same_range() {
-; CHECK-LABEL: @call_with_same_range
-; CHECK: tail call i8 @call_with_range
-  bitcast i8 0 to i8
-  %out = call i8 @dummy(), !range !0
-  ret i8 %out
-}
-
-
 declare i8 @dummy();
 declare i32 @__gxx_personality_v0(...)
 

Comment on lines +66 to +73
define i8 @call_with_same_range() {
; CHECK-LABEL: @call_with_same_range
; CHECK: tail call i8 @call_with_range
bitcast i8 0 to i8
%out = call i8 @dummy(), !range !0
ret i8 %out
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason we need to change the function order in this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the existing function merger pass processes functions based on hash values, which slightly changes the order of merged functions -- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/MergeFunctions.cpp#L424-L433.
We could use CHECK-DAG for all lines, but that seems too loose. Instead, I adjusted the order of CHECK to match the updated behavior.

llvm/lib/IR/StructuralHash.cpp Outdated Show resolved Hide resolved
llvm/lib/IR/StructuralHash.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looking good.

I also remember running into issues where there would be different hashes between 32-bit and 64-bit x86. Not sure if that's an issue anymore given everything is uint64_t, but might be worth testing that the merge functions tests pass.

llvm/include/llvm/IR/StructuralHash.h Outdated Show resolved Hide resolved
Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@kyulee-com kyulee-com merged commit b667d16 into main Oct 26, 2024
8 checks passed
@kyulee-com kyulee-com deleted the users/kyulee-com/strhash-nfc branch October 26, 2024 16:20
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 26, 2024

LLVM Buildbot has detected a new failure on builder clang-solaris11-sparcv9 running on solaris11-sparcv9 while building llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/13/builds/3149

Here is the relevant piece of the build log for the reference
Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'LLVM :: Transforms/MergeFunc/call-and-invoke-with-ranges-attr.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/stage1/bin/opt -passes=mergefunc -S < /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges-attr.ll | /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/stage1/bin/FileCheck /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges-attr.ll
+ /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/stage1/bin/opt -passes=mergefunc -S
+ /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/stage1/bin/FileCheck /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges-attr.ll
/opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges-attr.ll:96:16: error: CHECK-LABEL: expected string not found in input
; CHECK-LABEL: @call_with_same_range
               ^
<stdin>:89:36: note: scanning from here
define i8 @invoke_with_same_range() personality ptr undef {
                                   ^
<stdin>:90:22: note: possible intended match here
 %1 = tail call i8 @invoke_with_range()
                     ^

Input file: <stdin>
Check file: /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges-attr.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
           84: define i8 @call_with_same_range() { 
           85:  %1 = tail call i8 @call_with_range() 
           86:  ret i8 %1 
           87: } 
           88:  
           89: define i8 @invoke_with_same_range() personality ptr undef { 
label:96'0                                        X~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
           90:  %1 = tail call i8 @invoke_with_range() 
label:96'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
label:96'1                          ?                   possible intended match
           91:  ret i8 %1 
label:96'0     ~~~~~~~~~~~
           92: } 
label:96'0     ~~
>>>>>>

--

********************


kyulee-com added a commit that referenced this pull request Oct 26, 2024
kyulee-com added a commit that referenced this pull request Oct 26, 2024
This is largely NFC, and it prepares for #112638.
 - Use stable_hash instead of uint64_t
 - Rename update* to hash* functions. They compute stable_hash locally and return it.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
kyulee-com added a commit that referenced this pull request Oct 26, 2024
kyulee-com added a commit that referenced this pull request Oct 26, 2024
This is largely NFC, and it prepares for #112638.
 - Use stable_hash instead of uint64_t
 - Rename update* to hash* functions. They compute stable_hash locally and return it.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
kyulee-com added a commit that referenced this pull request Oct 27, 2024
This computes a structural hash while allowing for selective ignoring of
certain operands based on a custom function that is provided. Instead of
a single hash value, it now returns FunctionHashInfo which includes a
hash value, an instruction mapping, and a map to track the operand
location and its corresponding hash value that is ignored.

Depends on #112621.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
This is largely NFC, and it prepares for llvm#112638.
 - Use stable_hash instead of uint64_t
- Rename update* to hash* functions. They compute stable_hash locally and return it.
 
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
This is largely NFC, and it prepares for llvm#112638.
 - Use stable_hash instead of uint64_t
 - Rename update* to hash* functions. They compute stable_hash locally and return it.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
This is largely NFC, and it prepares for llvm#112638.
 - Use stable_hash instead of uint64_t
 - Rename update* to hash* functions. They compute stable_hash locally and return it.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
This computes a structural hash while allowing for selective ignoring of
certain operands based on a custom function that is provided. Instead of
a single hash value, it now returns FunctionHashInfo which includes a
hash value, an instruction mapping, and a map to track the operand
location and its corresponding hash value that is ignored.

Depends on llvm#112621.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
@nikic
Copy link
Contributor

nikic commented Nov 5, 2024

Why the use of stable_hash? This hash is not expected to be stable, e.g. because we may start hashing additional things into it.

As posted in https://discourse.llvm.org/t/rfc-global-function-merging/82608, I plan to make this structural hash stable (in the following PR). If it is preferable to maintain the existing structural hash implementation as is, I am open to creating a separate data structure or extending it through a type hierarchy.

Just to check, how "stable" is stable here? Is this just stable for a given LLVM build, or will it have to be stable across versions?

@kyulee-com
Copy link
Contributor Author

Why the use of stable_hash? This hash is not expected to be stable, e.g. because we may start hashing additional things into it.

As posted in https://discourse.llvm.org/t/rfc-global-function-merging/82608, I plan to make this structural hash stable (in the following PR). If it is preferable to maintain the existing structural hash implementation as is, I am open to creating a separate data structure or extending it through a type hierarchy.

Just to check, how "stable" is stable here? Is this just stable for a given LLVM build, or will it have to be stable across versions?

Here "stable" refers to a specific LLVM version. We previously discussed this in #105849 (review). Consequently, I've updated the comment in the header of StableHashing.h -- https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/StableHashing.h#L10C1-L14C9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants