Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADT,CodeGen] Remove stable_hash_combine_string #100668

Merged

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Jul 25, 2024

FNV, used by stable_hash_combine_string is extremely slow. For string
hashing with good avalanche effects, we prefer xxh3_64bits.

StableHashing.h might still be useful as it provides a stable
hash_combine while Hashing.h's might be non-deterministic (#96282).

Created using spr 1.3.5-bogner
@MaskRay MaskRay requested review from plotfi and kyulee-com July 25, 2024 23:00
@llvmbot
Copy link
Member

llvmbot commented Jul 25, 2024

@llvm/pr-subscribers-llvm-adt

Author: Fangrui Song (MaskRay)

Changes

FNV, used by stable_hash_combine_string is extremely slow. For string
hashing with good avalanche effects, we prefer xxh3_64bits.

StableHashing.h might still be useful as Hashing.h hashes might be
non-deterministic.


Full diff: https://github.com/llvm/llvm-project/pull/100668.diff

2 Files Affected:

  • (modified) llvm/include/llvm/ADT/StableHashing.h (-12)
  • (modified) llvm/lib/CodeGen/MachineStableHash.cpp (+4-4)
diff --git a/llvm/include/llvm/ADT/StableHashing.h b/llvm/include/llvm/ADT/StableHashing.h
index 884b5752d9bb0..f675f828f702e 100644
--- a/llvm/include/llvm/ADT/StableHashing.h
+++ b/llvm/include/llvm/ADT/StableHashing.h
@@ -95,18 +95,6 @@ inline stable_hash stable_hash_combine_array(const stable_hash *P, size_t C) {
     hashing::detail::stable_hash_append(Hash, P[I]);
   return Hash;
 }
-
-inline stable_hash stable_hash_combine_string(const StringRef &S) {
-  return stable_hash_combine_range(S.begin(), S.end());
-}
-
-inline stable_hash stable_hash_combine_string(const char *C) {
-  stable_hash Hash = hashing::detail::FNV_OFFSET_64;
-  while (*C)
-    hashing::detail::stable_hash_append(Hash, *(C++));
-  return Hash;
-}
-
 } // namespace llvm
 
 #endif
diff --git a/llvm/lib/CodeGen/MachineStableHash.cpp b/llvm/lib/CodeGen/MachineStableHash.cpp
index 5abfbd5981fba..d2e02a2d739c1 100644
--- a/llvm/lib/CodeGen/MachineStableHash.cpp
+++ b/llvm/lib/CodeGen/MachineStableHash.cpp
@@ -33,6 +33,7 @@
 #include "llvm/MC/MCSymbol.h"
 #include "llvm/Support/Alignment.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/xxhash.h"
 
 #define DEBUG_TYPE "machine-stable-hash"
 
@@ -100,8 +101,7 @@ stable_hash llvm::stableHashValue(const MachineOperand &MO) {
   case MachineOperand::MO_TargetIndex: {
     if (const char *Name = MO.getTargetIndexName())
       return stable_hash_combine(MO.getType(), MO.getTargetFlags(),
-                                 stable_hash_combine_string(Name),
-                                 MO.getOffset());
+                                 xxh3_64bits(Name), MO.getOffset());
     StableHashBailingTargetIndexNoName++;
     return 0;
   }
@@ -113,7 +113,7 @@ stable_hash llvm::stableHashValue(const MachineOperand &MO) {
 
   case MachineOperand::MO_ExternalSymbol:
     return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getOffset(),
-                        stable_hash_combine_string(MO.getSymbolName()));
+                        xxh3_64bits(MO.getSymbolName()));
 
   case MachineOperand::MO_RegisterMask:
   case MachineOperand::MO_RegisterLiveOut: {
@@ -151,7 +151,7 @@ stable_hash llvm::stableHashValue(const MachineOperand &MO) {
   case MachineOperand::MO_MCSymbol: {
     auto SymbolName = MO.getMCSymbol()->getName();
     return hash_combine(MO.getType(), MO.getTargetFlags(),
-                        stable_hash_combine_string(SymbolName));
+                        xxh3_64bits(SymbolName));
   }
   case MachineOperand::MO_CFIIndex:
     return stable_hash_combine(MO.getType(), MO.getTargetFlags(),

@MaskRay MaskRay requested a review from lanza July 25, 2024 23:00
@dwblaikie
Copy link
Collaborator

Seems unfortunate to have direct calls to specific hash implementations littered around the codebase next time we want to replace them with something better...

I guess stable is intended to be stable across LLVM versions, etc? So it can't ever be changed? Could we add some sort of generation parameter to it and use that to pick the implementation, so that some clients can pick a newer generation?
Or add an API that matches the needs of these updated callers? What sort of determinism-but-not-stability do they need? Just something that's stable witihn a version, rather than an eternal stability that stable offers?

@plotfi
Copy link
Collaborator

plotfi commented Jul 26, 2024

@kyulee-com do these changes cause any build breaks downstream?

@MaskRay
Copy link
Member Author

MaskRay commented Jul 26, 2024

Seems unfortunate to have direct calls to specific hash implementations littered around the codebase next time we want to replace them with something better...

I guess stable is intended to be stable across LLVM versions, etc? So it can't ever be changed? Could we add some sort of generation parameter to it and use that to pick the implementation, so that some clients can pick a newer generation? Or add an API that matches the needs of these updated callers? What sort of determinism-but-not-stability do they need? Just something that's stable witihn a version, rather than an eternal stability that stable offers?

The argument is whether we should have a desgination "stable hashing", which is deterministic but might evolve (to better algorithms).

I think the answer is yes but I believe that the implementation should be polished first.
It should be efficient, and the interface should be what we are comfortable to support.

Currently, the API stable_hash_combine would be useful, but it should move away from the extremely slow byte-wise FNV.
I am not certain the name stable_hash_combine_string should be used for a stable string hash API, which may unify with StringMap.
I am inclined to remove it.

@kyulee-com
Copy link
Contributor

stable is designed to be consistent across different versions or builds for safe entity matching. How consistent is xxh3_64bits across versions?

This change seems conflicting with my upcoming changes detailed in b5d7f5f#diff-56094e62599ca13f28f0857547a76c2387917badcb618eda3ee1547c915888ce, based on https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753. However, optimistic transformations using stable hashes should not depend solely on hash stability, although it affects efficiency. I believe there's no upstream dependency yet, pending the RFC's implementation. So, I think updating the hash implementation now might be okay if it remains stable across versions, but continuous updates pose a risk as it will impact the effectiveness of these optimizations.

@MaskRay
Copy link
Member Author

MaskRay commented Jul 27, 2024

stable is designed to be consistent across different versions or builds for safe entity matching. How consistent is xxh3_64bits across versions?

This change seems conflicting with my upcoming changes detailed in b5d7f5f#diff-56094e62599ca13f28f0857547a76c2387917badcb618eda3ee1547c915888ce, based on discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753. However, optimistic transformations using stable hashes should not depend solely on hash stability, although it affects efficiency. I believe there's no upstream dependency yet, pending the RFC's implementation. So, I think updating the hash implementation now might be okay if it remains stable across versions, but continuous updates pose a risk as it will impact the effectiveness of these optimizations.

xxh3_64bits is a fixed algorithm and the hash values will not change.

To a wide audience, StableHashing.h is like a label that points to the current "good enough" deterministic hash. The underlying implementation might change.
If the proposed optimization has a specific dependency on the stability across compiler versions, using StableHashing.h is not suitable.

Actually, currently StableHashing.h only has one user. I hope that its uses can be eliminated entirely. Then we add a fast StableHashing.h, and add some mechanism to prevent reliance, then re add this header file.

Copy link
Contributor

@kyulee-com kyulee-com left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xxh3_64bits is a fixed algorithm and the hash values will not change.

LGTM

@MaskRay MaskRay merged commit c538434 into main Jul 28, 2024
9 checks passed
@MaskRay MaskRay deleted the users/MaskRay/spr/adtcodegen-remove-stable_hash_combine_string branch July 28, 2024 17:14
banach-space pushed a commit to banach-space/llvm-project that referenced this pull request Aug 7, 2024
FNV, used by stable_hash_combine_string is extremely slow. For string
hashing with good avalanche effects, we prefer xxh3_64bits.

StableHashing.h might still be useful as it provides a stable
hash_combine while Hashing.h's might be non-deterministic (llvm#96282).

Pull Request: llvm#100668
adrian-prantl pushed a commit to adrian-prantl/llvm-project that referenced this pull request Oct 21, 2024
FNV, used by stable_hash_combine_string is extremely slow. For string
hashing with good avalanche effects, we prefer xxh3_64bits.

StableHashing.h might still be useful as it provides a stable
hash_combine while Hashing.h's might be non-deterministic (llvm#96282).

Pull Request: llvm#100668

(cherry picked from commit c538434)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants