Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support externally-hashed keys #40

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

vlovich
Copy link
Contributor

@vlovich vlovich commented Mar 14, 2024

I have a use-case where:

  1. I want to use a different hash function (specifically xxh3 which is typically faster than wyhash)
  2. I have multiple hash functions to lookup the same key. I can thus amortize so that the hashing only happens once outside of Mphf.

This provides savings so that the lookup within the hash function is basically the same cost as a u64 regardless of the true size of the key input. Obviously the hash still has to be done somewhere, so it's not totally free but as mentioned the savings grow the more hash functions I need to interrogate and it lets me use a faster xxh3 hash.

I tried to keep the identical hash results if you use anything other than the new ExternallyHashed type while simultaneously carving out a fast path for it to shine (I've also in theory fast-pathed construction/lookups for integer keys where we can skip constructing the WyHash state object but I didn't bother benchmarking the actual improvement there).

@vlovich vlovich force-pushed the externally-hashed branch 4 times, most recently from 0b1c45c to 254f764 Compare March 14, 2024 19:47
@pmarks
Copy link
Contributor

pmarks commented Mar 15, 2024

Interesting... I'm trying to figure out if we could avoid switching from Hash to SeedableHash as the trait required for keys. This feel quite onerous to users of the library to have to implement a new trait, rather than using Hash which is much more likely to be implemented for some type one might want to use as a key. Hash is very convenient to derive, whereas SeedableHash would need to be implemented by hand, and would run into orphan issues for a type the user doesn't control. That feels like a big blocker to me.

@vlovich vlovich force-pushed the externally-hashed branch from 254f764 to 5978e48 Compare April 4, 2024 06:58
vlovich added 2 commits April 5, 2024 19:45
This is ~10% faster for u64 lookups. For lookups and construction the
time is constant regardless of the true length of the input (assuming
you can amortize the hashing cost somehow externally to this library).

For example, for a 128 byte string construction is ~2.5x faster and
lookups are ~1.7x faster.
Latest rayon requires at least Rust 1.63, regex requires 1.65, and
std::hint::black_box was stabilized in 1.66, so bump that.

Also bump the current stable version to validate with a newer Rust since
presumably that's the intention of the second version.
@vlovich vlovich force-pushed the externally-hashed branch from 5978e48 to dab764c Compare April 6, 2024 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants