Neater hashing interface #4524
base: main
hash_state_t hash_acc(hash_state_t h) const {
    hash_state_t st = h;
    st = mkhash(entries.size(), st);
    for (auto &it : entries) {
I think this is order-sensitive now, so two dicts that are equal but iterate their entries in a different order will fail to have identical hashes.
- Put the order-agnostic XOR back
I don't expect dicts of dicts or dict comparison to be common, and therefore important for performance, so that should be fine.
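For reference, here is a minimal sketch of what an order-agnostic combination could look like. It is not the code from this PR: `ToyDict`, `example_mix`, `toy_dict_hash_acc` and the constants are all made up for illustration. Each entry is hashed from the same fixed starting state, the per-entry results are XORed together (XOR is commutative, so iteration order drops out), and only the combined value is folded into the caller's state.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the hash state type; not the real hashlib type.
using hash_state_t = std::uint64_t;

// Illustrative mixing step (xor, then multiply by an odd constant).
inline hash_state_t example_mix(std::uint64_t datum, hash_state_t st) {
    st ^= datum;
    st *= 0x100000001b3ULL;
    return st;
}

// Toy "dict": key/value pairs stored in whatever order they were inserted.
using ToyDict = std::vector<std::pair<int, int>>;

hash_state_t toy_dict_hash_acc(const ToyDict &entries, hash_state_t st) {
    std::uint64_t combined = 0;
    for (const auto &kv : entries) {
        // Every entry starts from the same fixed state, so its contribution
        // does not depend on its position in the container.
        hash_state_t entry = 0x9e3779b97f4a7c15ULL;
        entry = example_mix(static_cast<std::uint32_t>(kv.first), entry);
        entry = example_mix(static_cast<std::uint32_t>(kv.second), entry);
        combined ^= entry;  // commutative, so order does not matter
    }
    // Only the size and the order-independent summary touch the caller's state.
    st = example_mix(entries.size(), st);
    st = example_mix(combined, st);
    return st;
}
```

The cost is that each entry is hashed from scratch rather than threaded through the running state, which is the trade-off accepted above for a case that is assumed to be rare.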
Cool, I got hit in the face with a downright gcc bug.
With ibex and jpeg synthesized with ORFS, I'm seeing a 1% performance regression with this PR. This is probably because we're actually using the seed function more directly, with less xorshifting involved. I wonder if a quick swap of the hashing function would change the result. However, I'm also seeing a 1% memory usage improvement with jpeg, which is pretty interesting.
Passes like
Leaving a note so I don't forget it: we should provide a way for plugins to be compatible with both the pre-change and post-change API. I am thinking of a define that advertises the new API.
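A rough sketch of how a plugin could use such a define. The macro name `EXAMPLE_NEW_HASH_API` is purely hypothetical (no name has been settled in this thread), and `PluginKey` and `example_mix` are made-up stand-ins; in a real build the macro would come from the host headers rather than being defined in the plugin.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature macro. Defined here only so the sketch compiles
// on its own; a real plugin would get it from the host project's headers.
#define EXAMPLE_NEW_HASH_API 1

// Hypothetical stand-in for the hash state type.
using hash_state_t = std::uint64_t;

// Illustrative mixing step, not the project's real hash function.
inline hash_state_t example_mix(std::uint64_t datum, hash_state_t st) {
    st ^= datum;
    st *= 0x100000001b3ULL;
    return st;
}

struct PluginKey {
    int id;
    int port;

#ifdef EXAMPLE_NEW_HASH_API
    // New-style interface: fold the fields into the caller's running state.
    hash_state_t hash_acc(hash_state_t st) const {
        st = example_mix(static_cast<std::uint32_t>(id), st);
        st = example_mix(static_cast<std::uint32_t>(port), st);
        return st;
    }
#else
    // Legacy interface: build a standalone hash value from scratch.
    unsigned int hash() const {
        return (static_cast<unsigned int>(id) * 33u) ^ static_cast<unsigned int>(port);
    }
#endif
};
```

With this shape the same plugin source builds against both API generations, and the legacy branch can be dropped once old versions no longer matter.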
Co-authored-by: KrystalDelusion <93062060+KrystalDelusion@users.noreply.github.com>
We want to be able to plug-and-play hash functions to improve hashlib structure collision rates and hashing speed. Currently, in some ways, hashes are incorrectly handled: for deep structure hashing, each substructure constructs a hash from scratch, and these hashes are then combined with addition, XOR, sometimes xorshifted to make this less iffy, but overall, this is risky, as it may degrade various hash functions to varying degrees. It seems to me that the correct combination of hashes is to have a hash state that is mutated with each datum hashed in sequence. That's what this PR does.
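To make the idea concrete, here is a minimal sketch of threading one hash state through a nested structure. Everything prefixed `example_` is hypothetical, as are `Inner`, `Outer`, and the mixing constants; only the overall shape (init, accumulate per datum, finish) follows the description above, and this is not the actual hashlib code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types; the real definitions live in hashlib.
using hash_state_t = std::uint64_t;
using hash_t = std::uint32_t;

// Starting state (an FNV-style constant, chosen only for illustration).
inline hash_state_t example_init() { return 0xcbf29ce484222325ULL; }

// Fold one datum into the running state.
inline hash_state_t example_mix(std::uint64_t datum, hash_state_t st) {
    st ^= datum;
    st *= 0x100000001b3ULL;
    return st;
}

// Collapse the state into the final hash value.
inline hash_t example_finish(hash_state_t st) {
    return static_cast<hash_t>(st ^ (st >> 32));
}

struct Inner {
    int a;
    int b;
    // The substructure does not start a fresh hash; it continues the caller's state.
    hash_state_t hash_acc(hash_state_t st) const {
        st = example_mix(static_cast<std::uint32_t>(a), st);
        st = example_mix(static_cast<std::uint32_t>(b), st);
        return st;
    }
};

struct Outer {
    Inner first;
    std::vector<Inner> rest;
    hash_state_t hash_acc(hash_state_t st) const {
        st = first.hash_acc(st);            // same state flows into the child
        st = example_mix(rest.size(), st);  // the length is part of the data
        for (const Inner &in : rest)
            st = in.hash_acc(st);
        return st;
    }
};

// "Just give me the hash of this one thing": init, accumulate, finish.
template <typename T>
hash_t example_run_hash(const T &value) {
    return example_finish(value.hash_acc(example_init()));
}
```

Because no substructure hash is finished and then recombined with addition or XOR, swapping `example_mix` for a different function changes nothing in the structs themselves.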
- unsigned int hash() functions are replaced with hash_state_t hash_acc(hash_state_t h)
- mkhash_add is deprecated, since it relies on having a second argument value adjacency preserving alternative hash function. Such a requirement seems to be at odds with desirable hash function qualities
- hash_state_t mkhash_init() is now a function instead of a const
- hash_t mkhash_finish(hash_state_t h) added
- run_hash wrapper allows for "just give me the hash of this one thing" use cases, replacing unsigned int hash() methods. It has the advantage of covering any type that has hash_ops implemented. It's somewhat like mkhash was, but mkhash collides with the core hash function, and I want to keep that split off.
As it is right now, the PR isn't at odds with global state for something like xxhash. hash_state_t may become void in that case. There are no provisions for SSE/multi-lane hashing at the moment.
Hashable