Neater hashing interface #4524

Open
wants to merge 18 commits into
base: main

Collaborator

@widlarizer widlarizer commented Aug 6, 2024

We want to be able to plug and play hash functions to improve hashlib structure collision rates and hashing speed. Currently, hashes are handled incorrectly in some ways: for deep structure hashing, each substructure constructs a hash from scratch, and these hashes are then combined with addition or XOR, sometimes with an xorshift to make this less iffy. Overall this is risky, as it may degrade various hash functions to varying degrees. It seems to me that the correct way to combine hashes is to have a hash state that is mutated with each datum hashed in sequence. That's what this PR does.
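To make the concern concrete, here is a minimal sketch of one way XOR-combining from-scratch hashes can go wrong, next to the state-passing alternative. `toy_mix`, `hash_one`, `pair_hash_xor` and `pair_hash_seq` are hypothetical names made up for this illustration, and the djb2-style step is a stand-in, not the hash function hashlib actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Toy djb2-style step, used only for illustration (assumption: this is
// NOT the hashlib function, just something small enough to reason about).
inline uint32_t toy_mix(uint32_t value, uint32_t state) {
    return (state * 33u) ^ value;
}

// Hash a single value from scratch, as each substructure does today.
inline uint32_t hash_one(uint32_t v) {
    return toy_mix(v, 5381u);
}

// Combine two from-scratch hashes with XOR.
inline uint32_t pair_hash_xor(uint32_t a, uint32_t b) {
    return hash_one(a) ^ hash_one(b);
}

// Thread a single state through both values in sequence.
inline uint32_t pair_hash_seq(uint32_t a, uint32_t b) {
    return toy_mix(b, toy_mix(a, 5381u));
}

inline void example_usage() {
    assert(pair_hash_xor(1, 2) == pair_hash_xor(2, 1)); // order is lost
    assert(pair_hash_xor(7, 7) == 0u);                  // equal members cancel out
    assert(pair_hash_xor(7, 7) == pair_hash_xor(9, 9)); // ...for any repeated value
    assert(pair_hash_seq(1, 2) != pair_hash_seq(2, 1)); // sequential state keeps order
}
```

The XOR properties hold regardless of how good the underlying mixer is, which is why swapping in a better hash function alone would not fix them.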

  • unsigned int hash() functions are replaced with hash_state_t hash_acc(hash_state_t h)
  • mkhash_add is deprecated, since it relies on an alternative hash function that preserves value adjacency in its second argument. Such a requirement seems to be at odds with desirable hash function qualities
  • hash_state_t mkhash_init() is now a function instead of a const
  • hash_t mkhash_finish(hash_state_t h) added
  • run_hash wrapper allows for "just give me the hash of this one thing" use cases, replacing unsigned int hash() methods. It has the advantage of covering any type that has hash_ops implemented. It's somewhat like what mkhash was, but the name mkhash collides with the core hash function, and I want to keep the two separate.
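The pieces listed above could fit together roughly as follows. This is a sketch under assumptions, not the PR's code: `hash_state_t` is taken to be a plain 32-bit integer, the mixing step is a toy djb2-style stand-in, `hash_ops` is reduced to a single static `hash_acc`, and `Point` and `Segment` are hypothetical types invented for the example. Only the names `hash_state_t`, `hash_t`, `hash_acc`, `mkhash`, `mkhash_init`, `mkhash_finish`, `run_hash` and `hash_ops` come from the description.

```cpp
#include <cassert>
#include <cstdint>

using hash_state_t = uint32_t; // assumption: the state is a plain integer
using hash_t = uint32_t;

inline hash_state_t mkhash_init() { return 5381u; }
// Toy mixing step (assumption); argument order follows mkhash(value, state).
inline hash_state_t mkhash(uint32_t value, hash_state_t h) { return (h * 33u) ^ value; }
inline hash_t mkhash_finish(hash_state_t h) { return h; }

// Simplified hash_ops: by default, defer to the type's own hash_acc member.
template <typename T> struct hash_ops {
    static hash_state_t hash_acc(const T &v, hash_state_t h) { return v.hash_acc(h); }
};
template <> struct hash_ops<int> {
    static hash_state_t hash_acc(int v, hash_state_t h) {
        return mkhash(static_cast<uint32_t>(v), h);
    }
};

// "Just give me the hash of this one thing."
template <typename T> hash_t run_hash(const T &v) {
    return mkhash_finish(hash_ops<T>::hash_acc(v, mkhash_init()));
}

// Hypothetical user types: each one mutates the incoming state in sequence.
struct Point {
    int x, y;
    hash_state_t hash_acc(hash_state_t h) const {
        h = hash_ops<int>::hash_acc(x, h);
        h = hash_ops<int>::hash_acc(y, h);
        return h;
    }
};
struct Segment {
    Point a, b;
    hash_state_t hash_acc(hash_state_t h) const {
        h = a.hash_acc(h); // no from-scratch hash per substructure
        h = b.hash_acc(h);
        return h;
    }
};

inline void example_usage() {
    // A nested structure hashes like the flat sequence of its data.
    hash_state_t flat = mkhash(4, mkhash(3, mkhash(2, mkhash(1, mkhash_init()))));
    assert(run_hash(Segment{{1, 2}, {3, 4}}) == mkhash_finish(flat));
    assert(run_hash(Point{1, 2}) != run_hash(Point{2, 1}));
}
```

Keeping `mkhash_init` and `mkhash_finish` as functions leaves room for a hash function that needs real setup or finalization work, which a constant seed would not allow.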

As it is right now, the PR isn't at odds with global state for something like xxhash. hash_state_t may become void in that case. There are no provisions for SSE/multi-lane hashing at the moment.

  • check performance impact since this PR isn't NFC as it changes how structures are hashed
  • clean up things left over from experiments with inheriting from Hashable
  • fix pyosys
  • finish the plugin compatibility part of the doc

hash_state_t hash_acc(hash_state_t h) const {
    hash_state_t st = h;
    st = mkhash(entries.size(), st);
    for (auto &it : entries) {
Member

@povik povik Aug 12, 2024

I think this is order-sensitive now, so two dicts that are equal will fail to have identical hashes.

Collaborator Author

@widlarizer widlarizer Aug 12, 2024

  • put the order-agnostic xor back

I don't expect dicts of dicts or dict comparison to be common, and therefore important for performance, so that should be fine
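For reference, an order-agnostic dict hash inside the state-passing scheme might look like the sketch below. It is an assumption about the shape of the fix, not the code that was committed: each entry is hashed from a fresh state, the per-entry results are XORed together (XOR is commutative, so iteration order drops out), and the combined value is then fed into the caller's state. `entries_t` and `dict_hash_acc` are hypothetical names, and the mixer is a toy stand-in.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using hash_state_t = uint32_t; // assumption: plain integer state
inline hash_state_t mkhash_init() { return 5381u; }
// Toy mixing step (assumption), called as mkhash(value, state).
inline hash_state_t mkhash(uint32_t value, hash_state_t h) { return (h * 33u) ^ value; }

// Hypothetical stand-in for a dict: key/value pairs in insertion order.
using entries_t = std::vector<std::pair<uint32_t, uint32_t>>;

inline hash_state_t dict_hash_acc(const entries_t &entries, hash_state_t h) {
    uint32_t combined = 0;
    for (const auto &e : entries) {
        // Hash each entry on its own, independent of its position.
        hash_state_t eh = mkhash_init();
        eh = mkhash(e.first, eh);
        eh = mkhash(e.second, eh);
        combined ^= eh; // commutative, so entry order does not matter
    }
    // Only now touch the caller's state.
    h = mkhash(static_cast<uint32_t>(entries.size()), h);
    return mkhash(combined, h);
}

inline void example_usage() {
    entries_t a{{1, 10}, {2, 20}};
    entries_t b{{2, 20}, {1, 10}}; // same contents, different insertion order
    entries_t d{{1, 10}, {2, 21}}; // different contents
    assert(dict_hash_acc(a, mkhash_init()) == dict_hash_acc(b, mkhash_init()));
    assert(dict_hash_acc(a, mkhash_init()) != dict_hash_acc(d, mkhash_init()));
}
```

The cost is that every entry restarts from a fresh state, which brings back the from-scratch combination the PR otherwise avoids, but only for unordered containers.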

@widlarizer
Collaborator Author

Cool, I got hit in the face with a downright gcc bug

@widlarizer
Collaborator Author

With ibex and jpeg synthesized with ORFS, I'm seeing a 1% performance regression with this PR. This is probably because we're actually using the seed function more directly, with less xorshifting involved. I wonder if a quick swap of the hashing function would change the result. However, I'm also seeing a 1% memory usage improvement with jpeg, which is pretty interesting

@widlarizer
Collaborator Author

Passes like extract_fa and opt_dff take 5-10% longer with this change. This is definitely a problem

@povik
Member

povik commented Oct 29, 2024

Leaving a note so I don't forget it: We should provide a way for plugins to be compatible with both pre and post this change. I am thinking a define to advertise the new API.
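One possible shape for such a define, as a sketch only: the macro name `EXAMPLE_HASH_ACC_API` and the type `PluginKey` are hypothetical, and the macro is defined inside the snippet purely so that it compiles on its own. In practice the core headers would be the ones to advertise it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro name. A core header would define it once the new
// interface is available; it is defined here only to keep the sketch standalone.
#define EXAMPLE_HASH_ACC_API 1

#ifdef EXAMPLE_HASH_ACC_API
// Stand-ins for what the new core headers would provide (assumptions).
using hash_state_t = uint32_t;
inline hash_state_t mkhash(uint32_t value, hash_state_t h) { return (h * 33u) ^ value; }
#endif

// Hypothetical plugin type that builds against either interface.
struct PluginKey {
    uint32_t id;
#ifdef EXAMPLE_HASH_ACC_API
    // New interface: mutate and return the incoming hash state.
    hash_state_t hash_acc(hash_state_t h) const { return mkhash(id, h); }
#else
    // Old interface: produce a standalone hash.
    unsigned int hash() const { return id; }
#endif
};

inline void example_usage() {
#ifdef EXAMPLE_HASH_ACC_API
    PluginKey k{42};
    assert(k.hash_acc(5381u) == ((5381u * 33u) ^ 42u));
#endif
}
```

A plugin written this way compiles unchanged against both old and new headers, since the old headers simply never define the macro.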

Co-authored-by: KrystalDelusion <93062060+KrystalDelusion@users.noreply.github.com>
3 participants