-
Notifications
You must be signed in to change notification settings - Fork 12.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[StructuralHash] Refactor #112621
[StructuralHash] Refactor #112621
Conversation
- Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the use of stable_hash? This hash is not expected to be stable, e.g. because we may start hashing additional things into it.
As posted in https://discourse.llvm.org/t/rfc-global-function-merging/82608, I plan to make this structural hash stable (in the following PR). If it is preferable to maintain the existing structural hash implementation as is, I am open to creating a separate data structure or extending it through a type hierarchy. |
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-llvm-transforms Author: Kyungwoo Lee (kyulee-com) Changes
This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608. Full diff: https://github.com/llvm/llvm-project/pull/112621.diff 3 Files Affected:
diff --git a/llvm/include/llvm/IR/StructuralHash.h b/llvm/include/llvm/IR/StructuralHash.h
index 57fb45db849110..aa292bc3446799 100644
--- a/llvm/include/llvm/IR/StructuralHash.h
+++ b/llvm/include/llvm/IR/StructuralHash.h
@@ -14,6 +14,7 @@
#ifndef LLVM_IR_STRUCTURALHASH_H
#define LLVM_IR_STRUCTURALHASH_H
+#include "llvm/ADT/StableHashing.h"
#include <cstdint>
namespace llvm {
@@ -21,7 +22,7 @@ namespace llvm {
class Function;
class Module;
-using IRHash = uint64_t;
+using IRHash = stable_hash;
/// Returns a hash of the function \p F.
/// \param F The function to hash.
diff --git a/llvm/lib/IR/StructuralHash.cpp b/llvm/lib/IR/StructuralHash.cpp
index fb4f33a021a96b..a1fabab77d52b2 100644
--- a/llvm/lib/IR/StructuralHash.cpp
+++ b/llvm/lib/IR/StructuralHash.cpp
@@ -24,61 +24,86 @@ namespace {
// by the MergeFunctions pass.
class StructuralHashImpl {
- uint64_t Hash = 4;
+ stable_hash Hash = 4;
- void hash(uint64_t V) { Hash = hashing::detail::hash_16_bytes(Hash, V); }
+ bool DetailedHash;
// This will produce different values on 32-bit and 64-bit systens as
// hash_combine returns a size_t. However, this is only used for
// detailed hashing which, in-tree, only needs to distinguish between
// differences in functions.
- template <typename T> void hashArbitaryType(const T &V) {
- hash(hash_combine(V));
+ // TODO: This is not stable.
+ template <typename T> stable_hash hashArbitaryType(const T &V) {
+ return hash_combine(V);
}
- void hashType(Type *ValueType) {
- hash(ValueType->getTypeID());
+ stable_hash hashType(Type *ValueType) {
+ SmallVector<stable_hash> Hashes;
+ Hashes.emplace_back(ValueType->getTypeID());
if (ValueType->isIntegerTy())
- hash(ValueType->getIntegerBitWidth());
+ Hashes.emplace_back(ValueType->getIntegerBitWidth());
+ return stable_hash_combine(Hashes);
}
public:
- StructuralHashImpl() = default;
-
- void updateOperand(Value *Operand) {
- hashType(Operand->getType());
-
- // The cases enumerated below are not exhaustive and are only aimed to
- // get decent coverage over the function.
- if (ConstantInt *ConstInt = dyn_cast<ConstantInt>(Operand)) {
- hashArbitaryType(ConstInt->getValue());
- } else if (ConstantFP *ConstFP = dyn_cast<ConstantFP>(Operand)) {
- hashArbitaryType(ConstFP->getValue());
- } else if (Argument *Arg = dyn_cast<Argument>(Operand)) {
- hash(Arg->getArgNo());
- } else if (Function *Func = dyn_cast<Function>(Operand)) {
+ StructuralHashImpl() = delete;
+ explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {}
+
+ stable_hash hashConstant(Constant *C) {
+ SmallVector<stable_hash> Hashes;
+ // TODO: hashArbitaryType() is not stable.
+ if (ConstantInt *ConstInt = dyn_cast<ConstantInt>(C)) {
+ Hashes.emplace_back(hashArbitaryType(ConstInt->getValue()));
+ } else if (ConstantFP *ConstFP = dyn_cast<ConstantFP>(C)) {
+ Hashes.emplace_back(hashArbitaryType(ConstFP->getValue()));
+ } else if (Function *Func = dyn_cast<Function>(C))
// Hashing the name will be deterministic as LLVM's hashing infrastructure
// has explicit support for hashing strings and will not simply hash
// the pointer.
- hashArbitaryType(Func->getName());
- }
+ Hashes.emplace_back(hashArbitaryType(Func->getName()));
+
+ return stable_hash_combine(Hashes);
+ }
+
+ stable_hash hashValue(Value *V) {
+ // Check constant and return its hash.
+ Constant *C = dyn_cast<Constant>(V);
+ if (C)
+ return hashConstant(C);
+
+ // Hash argument number.
+ SmallVector<stable_hash> Hashes;
+ if (Argument *Arg = dyn_cast<Argument>(V))
+ Hashes.emplace_back(Arg->getArgNo());
+
+ return stable_hash_combine(Hashes);
}
- void updateInstruction(const Instruction &Inst, bool DetailedHash) {
- hash(Inst.getOpcode());
+ stable_hash hashOperand(Value *Operand) {
+ SmallVector<stable_hash> Hashes;
+ Hashes.emplace_back(hashType(Operand->getType()));
+ Hashes.emplace_back(hashValue(Operand));
+ return stable_hash_combine(Hashes);
+ }
+
+ stable_hash hashInstruction(const Instruction &Inst) {
+ SmallVector<stable_hash> Hashes;
+ Hashes.emplace_back(Inst.getOpcode());
if (!DetailedHash)
- return;
+ return stable_hash_combine(Hashes);
- hashType(Inst.getType());
+ Hashes.emplace_back(hashType(Inst.getType()));
// Handle additional properties of specific instructions that cause
// semantic differences in the IR.
if (const auto *ComparisonInstruction = dyn_cast<CmpInst>(&Inst))
- hash(ComparisonInstruction->getPredicate());
+ Hashes.emplace_back(ComparisonInstruction->getPredicate());
for (const auto &Op : Inst.operands())
- updateOperand(Op);
+ Hashes.emplace_back(hashOperand(Op));
+
+ return stable_hash_combine(Hashes);
}
// A function hash is calculated by considering only the number of arguments
@@ -97,15 +122,17 @@ class StructuralHashImpl {
// expensive checks for pass modification status). When modifying this
// function, most changes should be gated behind an option and enabled
// selectively.
- void update(const Function &F, bool DetailedHash) {
+ void update(const Function &F) {
// Declarations don't affect analyses.
if (F.isDeclaration())
return;
- hash(0x62642d6b6b2d6b72); // Function header
+ SmallVector<stable_hash> Hashes;
+ Hashes.emplace_back(Hash);
+ Hashes.emplace_back(0x62642d6b6b2d6b72); // Function header
- hash(F.isVarArg());
- hash(F.arg_size());
+ Hashes.emplace_back(F.isVarArg());
+ Hashes.emplace_back(F.arg_size());
SmallVector<const BasicBlock *, 8> BBs;
SmallPtrSet<const BasicBlock *, 16> VisitedBBs;
@@ -121,14 +148,17 @@ class StructuralHashImpl {
// This random value acts as a block header, as otherwise the partition of
// opcodes into BBs wouldn't affect the hash, only the order of the
// opcodes
- hash(45798);
+ Hashes.emplace_back(45798);
for (auto &Inst : *BB)
- updateInstruction(Inst, DetailedHash);
+ Hashes.emplace_back(hashInstruction(Inst));
for (const BasicBlock *Succ : successors(BB))
if (VisitedBBs.insert(Succ).second)
BBs.push_back(Succ);
}
+
+ // Update the combined hash in place.
+ Hash = stable_hash_combine(Hashes);
}
void update(const GlobalVariable &GV) {
@@ -137,15 +167,20 @@ class StructuralHashImpl {
// we ignore anything with the `.llvm` prefix
if (GV.isDeclaration() || GV.getName().starts_with("llvm."))
return;
- hash(23456); // Global header
- hash(GV.getValueType()->getTypeID());
+ SmallVector<stable_hash> Hashes;
+ Hashes.emplace_back(Hash);
+ Hashes.emplace_back(23456); // Global header
+ Hashes.emplace_back(GV.getValueType()->getTypeID());
+
+ // Update the combined hash in place.
+ Hash = stable_hash_combine(Hashes);
}
- void update(const Module &M, bool DetailedHash) {
+ void update(const Module &M) {
for (const GlobalVariable &GV : M.globals())
update(GV);
for (const Function &F : M)
- update(F, DetailedHash);
+ update(F);
}
uint64_t getHash() const { return Hash; }
@@ -154,13 +189,13 @@ class StructuralHashImpl {
} // namespace
IRHash llvm::StructuralHash(const Function &F, bool DetailedHash) {
- StructuralHashImpl H;
- H.update(F, DetailedHash);
+ StructuralHashImpl H(DetailedHash);
+ H.update(F);
return H.getHash();
}
IRHash llvm::StructuralHash(const Module &M, bool DetailedHash) {
- StructuralHashImpl H;
- H.update(M, DetailedHash);
+ StructuralHashImpl H(DetailedHash);
+ H.update(M);
return H.getHash();
}
diff --git a/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll b/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll
index e7718ca84d3165..0ceb363a67b1fa 100644
--- a/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll
+++ b/llvm/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll
@@ -63,6 +63,14 @@ lpad:
resume { ptr, i32 } zeroinitializer
}
+define i8 @call_with_same_range() {
+; CHECK-LABEL: @call_with_same_range
+; CHECK: tail call i8 @call_with_range
+ bitcast i8 0 to i8
+ %out = call i8 @dummy(), !range !0
+ ret i8 %out
+}
+
define i8 @invoke_with_same_range() personality ptr undef {
; CHECK-LABEL: @invoke_with_same_range()
; CHECK: tail call i8 @invoke_with_range()
@@ -76,15 +84,6 @@ lpad:
resume { ptr, i32 } zeroinitializer
}
-define i8 @call_with_same_range() {
-; CHECK-LABEL: @call_with_same_range
-; CHECK: tail call i8 @call_with_range
- bitcast i8 0 to i8
- %out = call i8 @dummy(), !range !0
- ret i8 %out
-}
-
-
declare i8 @dummy();
declare i32 @__gxx_personality_v0(...)
|
define i8 @call_with_same_range() { | ||
; CHECK-LABEL: @call_with_same_range | ||
; CHECK: tail call i8 @call_with_range | ||
bitcast i8 0 to i8 | ||
%out = call i8 @dummy(), !range !0 | ||
ret i8 %out | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason we need to change the function order in this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the existing function merger pass processes functions based on hash values, which slightly changes the order of merged functions -- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/MergeFunctions.cpp#L424-L433.
We could use CHECK-DAG for all lines, but that seems too loose. Instead, I adjusted the order of CHECK to match the updated behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looking good.
I also remember running into issues where there would be different hashes between 32-bit and 64-bit x86. Not sure if that's an issue anymore given everything is uint64_t
, but might be worth testing that the merge functions tests pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/13/builds/3149 Here is the relevant piece of the build log for the reference
|
This reverts commit b667d16.
This is largely NFC, and it prepares for #112638. - Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This reverts commit 98ca9a6.
This is largely NFC, and it prepares for #112638. - Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This computes a structural hash while allowing for selective ignoring of certain operands based on a custom function that is provided. Instead of a single hash value, it now returns FunctionHashInfo which includes a hash value, an instruction mapping, and a map to track the operand location and its corresponding hash value that is ignored. Depends on #112621. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This is largely NFC, and it prepares for llvm#112638. - Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This reverts commit b667d16.
This is largely NFC, and it prepares for llvm#112638. - Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This reverts commit 98ca9a6.
This is largely NFC, and it prepares for llvm#112638. - Use stable_hash instead of uint64_t - Rename update* to hash* functions. They compute stable_hash locally and return it. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This computes a structural hash while allowing for selective ignoring of certain operands based on a custom function that is provided. Instead of a single hash value, it now returns FunctionHashInfo which includes a hash value, an instruction mapping, and a map to track the operand location and its corresponding hash value that is ignored. Depends on llvm#112621. This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
Just to check, how "stable" is stable here? Is this just stable for a given LLVM build, or will it have to be stable across versions? |
Here "stable" refers to a specific LLVM version. We previously discussed this in #105849 (review). Consequently, I've updated the comment in the header of |
This is largely NFC, and it prepares for #112638.
This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.