Neater hashing interface #4524

widlarizer · 2024-08-06T10:50:29Z

We want to be able to plug-and-play hash functions to improve hashlib structure collision rates and hashing speed. Currently, hashes are handled incorrectly in some ways: for deep structure hashing, each substructure constructs a hash from scratch, and these hashes are then combined with addition or XOR, sometimes xorshifted to make this less iffy. Overall, this is risky, as it may degrade various hash functions to varying degrees. It seems to me that the correct way to combine hashes is to have a hash state that is mutated with each datum hashed in sequence. That's what this PR does.
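To make the concern concrete, here is a minimal sketch contrasting the two combination styles. It is not the Yosys code: `mix`, `hash_pair_xor` and `hash_pair_seq` are hypothetical names, and the DJB2-style mixing step is only an assumed stand-in for whatever core hash function is plugged in.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-ins for illustration, not the hashlib implementation.
using hash_state_t = uint32_t;

// Assumed DJB2-style mixing step: fold one datum into the running state.
inline hash_state_t mix(hash_state_t state, uint32_t datum) {
    return (state * 33u) ^ datum;
}

// Old style: each member is hashed from scratch, then the results are XORed.
inline uint32_t hash_pair_xor(std::pair<uint32_t, uint32_t> p) {
    uint32_t ha = mix(5381u, p.first);
    uint32_t hb = mix(5381u, p.second);
    // XOR is symmetric: (a, b) collides with (b, a), and (a, a) always gives 0.
    return ha ^ hb;
}

// New style: one state is threaded through every datum in sequence.
inline uint32_t hash_pair_seq(std::pair<uint32_t, uint32_t> p) {
    hash_state_t h = 5381u;
    h = mix(h, p.first);
    h = mix(h, p.second);
    return h;
}
```

With the XOR variant, the collisions come from the combination step itself, regardless of how good `mix` is. The sequential variant leaves the quality of the result entirely up to the core function.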

unsigned int hash() functions are replaced with hash_state_t hash_acc(hash_state_t h)
mkhash_add is deprecated, since it relies on having an alternative hash function that preserves adjacency of its second argument's values. Such a requirement seems to be at odds with desirable hash function qualities
hash_state_t mkhash_init() is now a function instead of a const
hash_t mkhash_finish(hash_state_t h) added
run_hash wrapper allows for "just give me the hash of this one thing" use cases, replacing unsigned int hash() methods. It has the advantage of covering any type that has hash_ops implemented. It's somewhat like mkhash was, but mkhash collides with the core hash function, and I want to keep that split off.
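As a rough illustration of how the pieces listed above could fit together, here is a self-contained sketch. Only the names `hash_state_t`, `hash_t`, `mkhash_init`, `mkhash`, `mkhash_finish`, `hash_acc`, `hash_ops` and `run_hash` come from this PR. Their bodies, the `hash_ops` layout and the `Port` struct are assumptions made for the example; the real definitions in kernel/hashlib.h may differ.

```cpp
#include <cstdint>
#include <string>

// Illustrative stand-ins only; bodies are assumed, not taken from hashlib.h.
using hash_state_t = uint32_t;
using hash_t = uint32_t;

inline hash_state_t mkhash_init() { return 5381u; }
inline hash_state_t mkhash(uint32_t datum, hash_state_t h) { return (h * 33u) ^ datum; }
inline hash_t mkhash_finish(hash_state_t h) { return h; }

// Default: defer to the type's own hash_acc member.
template <typename T> struct hash_ops {
    static hash_state_t hash_acc(const T &v, hash_state_t h) { return v.hash_acc(h); }
};

// Assumed specializations for two primitive types.
template <> struct hash_ops<uint32_t> {
    static hash_state_t hash_acc(const uint32_t &v, hash_state_t h) { return mkhash(v, h); }
};
template <> struct hash_ops<std::string> {
    static hash_state_t hash_acc(const std::string &v, hash_state_t h) {
        for (unsigned char c : v)
            h = mkhash(c, h);
        return h;
    }
};

// Hypothetical user type: it threads the incoming state through its members
// instead of building a fresh hash per member.
struct Port {
    std::string name;
    uint32_t width;
    hash_state_t hash_acc(hash_state_t h) const {
        h = hash_ops<std::string>::hash_acc(name, h);
        h = hash_ops<uint32_t>::hash_acc(width, h);
        return h;
    }
};

// "Just give me the hash of this one thing": init, accumulate, finish.
template <typename T> hash_t run_hash(const T &v) {
    return mkhash_finish(hash_ops<T>::hash_acc(v, mkhash_init()));
}
```

In this sketch, `run_hash(Port{"a", 1})` works for any type with a `hash_ops` specialization or a `hash_acc` member, which is the convenience the wrapper is meant to provide.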

As it is right now, the PR isn't at odds with global state for something like xxhash. hash_state_t may become void in that case. There are no provisions for SSE/multi-lane hashing at the moment.

check performance impact since this PR isn't NFC as it changes how structures are hashed
clean up things left over from experiments with inheriting from Hashable
fix pyosys
finish the plugin compatibility part of the doc

povik · 2024-08-12T13:10:06Z

kernel/hashlib.h

+	hash_state_t hash_acc(hash_state_t h) const {
+		hash_state_t st = h;
+		st = mkhash(entries.size(), st);
+		for (auto &it : entries) {


I think this is order-sensitive now, so two dicts that are equal will fail to have identical hashes.

widlarizer · 2024-08-12

put the order-agnostic xor back

I don't expect dicts of dicts or dict comparison to be common, and therefore important for performance, so that should be fine.
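The order problem and the XOR fix discussed here can be sketched as follows. This is an assumed illustration, not the code in kernel/hashlib.h: `Entries`, `hash_ordered` and `hash_unordered` are hypothetical names, and the `mkhash` body is a DJB2-style stand-in.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative stand-ins; bodies are assumptions.
using hash_state_t = uint32_t;
inline hash_state_t mkhash_init() { return 5381u; }
inline hash_state_t mkhash(uint32_t datum, hash_state_t h) { return (h * 33u) ^ datum; }

// A dict modelled as its entries in storage (insertion) order.
using Entries = std::vector<std::pair<uint32_t, uint32_t>>;

// Order-sensitive: the state passes through the entries one after another,
// so equal dicts with different storage order hash differently.
inline hash_state_t hash_ordered(const Entries &entries, hash_state_t h) {
    h = mkhash(static_cast<uint32_t>(entries.size()), h);
    for (const auto &it : entries) {
        h = mkhash(it.first, h);
        h = mkhash(it.second, h);
    }
    return h;
}

// Order-agnostic: each entry is hashed from a fresh state, the per-entry
// results are XORed, and only the combined value enters the outer state.
inline hash_state_t hash_unordered(const Entries &entries, hash_state_t h) {
    uint32_t combined = 0;
    for (const auto &it : entries) {
        hash_state_t e = mkhash_init();
        e = mkhash(it.first, e);
        e = mkhash(it.second, e);
        combined ^= e;
    }
    h = mkhash(static_cast<uint32_t>(entries.size()), h);
    return mkhash(combined, h);
}
```

The price of the order-agnostic form is that a fresh state is started per entry, which is the cost considered acceptable above for the rare dict-of-dicts case.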

widlarizer · 2024-08-22T18:20:10Z

Cool, I got hit in the face with a downright gcc bug

widlarizer · 2024-10-22T13:20:07Z

With ibex and jpeg synthesized with ORFS, I'm seeing a 1% performance regression with this PR. This is probably because we're actually using the seed function more directly, with less xorshifting involved. I wonder if a quick swap of the hashing function would change the result. However, I'm also seeing a 1% memory usage improvement with jpeg, which is pretty interesting.

widlarizer · 2024-10-22T16:24:27Z

Passes like extract_fa and opt_dff take 5-10% longer with this change. This is definitely a problem.

povik · 2024-10-29T08:40:53Z

Leaving a note so I don't forget it: We should provide a way for plugins to be compatible with both the pre-change and post-change API. I am thinking of a define to advertise the new API.
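A sketch of what such a define could look like from the plugin side. The macro name `EXAMPLE_HASHING_API_VERSION` is made up for this example, as are the `PluginKey` struct and the hash bodies; the thread does not say what the define will actually be called.

```cpp
#include <cstdint>

// Hypothetical: a core with the new API would define this macro in its
// headers. It is left undefined here, so the old-API branch is compiled.
// #define EXAMPLE_HASHING_API_VERSION 1

// Hypothetical plugin type that builds against either API.
struct PluginKey {
    uint32_t id;
#ifdef EXAMPLE_HASHING_API_VERSION
    // New API: fold this object into an existing hash state.
    uint32_t hash_acc(uint32_t h) const { return (h * 33u) ^ id; }
#else
    // Old API: produce a standalone hash value.
    unsigned int hash() const { return id; }
#endif
};
```

A version number rather than a bare flag would leave room for later interface revisions, but that is a design choice the note above leaves open.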


povik reviewed Aug 12, 2024


widlarizer force-pushed the emil/hashlib-interface branch from 6f9aefc to 8bb5c1f on August 22, 2024 21:49

This was referenced Aug 27, 2024

proc_dff: process sync rules in reverse input order #4568

Closed

proc_dff: respect sync rule priorities when generating complex dffsrs #4569

Merged

widlarizer force-pushed the emil/hashlib-interface branch 2 times, most recently from de7d1b3 to c3c3a7e on September 3, 2024 11:31

povik mentioned this pull request Sep 9, 2024

clockgate: centralize clock enables out of FFs #4583

Merged


widlarizer force-pushed the emil/hashlib-interface branch from fcbf0d3 to 68e40d8 on October 1, 2024 13:12

widlarizer mentioned this pull request Oct 9, 2024

Break tests with a small hash function perturbation #4559

Closed

widlarizer added 5 commits October 18, 2024 12:01

hashlib: redo interface for flexibility

c852dd3

driver: add --hash-seed

76eacb4

abc: sort stats

15c51d7

hashlib: fix pyosys

1bfddea

hashlib: only include in one place

6a570b3

widlarizer force-pushed the emil/hashlib-interface branch from ead99bb to 6a570b3 on October 18, 2024 10:05

widlarizer added 2 commits October 18, 2024 12:34

hashlib: use hash_t across the board

2bc5ca0

hashlib: hash_t can be set to 64-bit

25cd9fb

widlarizer marked this pull request as ready for review October 18, 2024 21:02

widlarizer requested review from zachjs and whitequark as code owners October 18, 2024 21:02

widlarizer mentioned this pull request Oct 18, 2024

opt_merge: hashing performance and correctness #4677

Draft

hashlib: fudge always

d14d2dd

widlarizer added 4 commits October 30, 2024 10:48

hashlib: don't xorshift in between upper and lower word

88f0774

hashlib: allow forcing Hasher state, use it for IdString trivial hashing

df44003

hashlib: prevent naive hashing of IdString when hashing SigBit

9bdecc5

hash: solo hashing interface, override for SigBit

fb45749

widlarizer added 4 commits November 4, 2024 13:11

hashlib: restore hash_obj_ops for pointers to indexed types

363a902

hashlib: remove is_new from HasherDJB32, implement hash_top for IdString

cf086d6

hashlib: run_hash uses hash_top_ops, not hash_ops

3ddea7d

docs: document the ideas behind the hashing interface

59d8562

widlarizer requested a review from KrystalDelusion as a code owner November 6, 2024 17:14

Docs: Formatting and fixes

efab718

KrystalDelusion reviewed Nov 6, 2024

docs/source/yosys_internals/hashing.rst (outdated, resolved)

docs: formatting and fixes

0bcb41c

Co-authored-by: KrystalDelusion <93062060+KrystalDelusion@users.noreply.github.com>
