regression: hashes makes tables 100x slower on some inputs, eg oids #11581
Comments
see also #11767 (comment) in which I show the multibyte jenkins can give a 5X slowdown for inputs as small as
I'm working on this; there are some things that need to be fixed first.
Hopefully it won't conflict too much with the #11767 PR, which also touches hashes.nim.
My implementation uses the 32-bit version of Murmur3, and it already works in the VM. The only remaining problem is the JS backend.
This should be fixed when #12022 is merged.
Just a note: currently the overall "best" known test suite is https://github.com/rurban/smhasher (and even better with rurban/smhasher#62). As can be seen there, murmur3 is moderately slow though very portable. Feel free to reconsider the choice.
@dumblob no need to cross-post the same comment; if you do, at least mention that you've posted the same in #12022 (comment). Yes, there are better hashes available, depending on what dimensions you care about. However:
Most importantly, it's already implemented. I don't think we should reconsider; we should merge @narimiran's PR (once ready) because it solves an urgent, serious performance regression. This doesn't mean the hash function can't be changed down the line though (we shouldn't promise hash compatibility across Nim releases, that would be anti-performance); it just needs someone to write a PR that shows a better tradeoff. Ideally, it'd be pluggable so users can experiment easily with alternative hashes.
Personally, I like wyhash, which is also simple and about
summary
#11203 (/cc @narimiran) introduced a performance regression where tables (and anything that uses hashes, e.g. StringTable etc.) can become 100X slower (see tests below). This is the case, for example, when using oids as keys.
details
The original implementation in hashes.nim (although not mentioned in the code) used the Jenkins one-at-a-time hash function (https://en.wikipedia.org/wiki/Jenkins_hash_function), which (as its name suggests) takes 1 byte at a time and updates the hash.
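For reference, a minimal sketch of that scheme using the stdlib's mixing operators (the pre-#11203 string hash in hashes.nim was essentially this):

```nim
# minimal sketch of the one-at-a-time scheme: every input byte feeds a `!&`
# mixing step, and `!$` finalizes the accumulated hash
import hashes

proc hashOneAtATime(x: string): Hash =
  var h: Hash = 0
  for c in x:
    h = h !& ord(c)   # mix in one byte
  result = !$h        # finalize (avalanche the accumulated bits)
```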
This hash algorithm doesn't behave well at all when n bytes are consumed at a time, as done in #11203. The reason is that, depending on the length of the input strings, the avalanche effect is lost and some input bits have no effect on some output bits of the hash. This can be seen by looking at the hashImpl code closely: e.g. if x is of length 16, we'd have just 2 calls to !&, and the low bits of the hash are independent of a majority of the input bits.
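A simplified sketch of what that multibyte scheme looks like (this is not the exact hashImpl code; the unaligned cast is revisited in the side note at the bottom):

```nim
# simplified sketch of the multibyte scheme from #11203: sizeof(Hash) input
# bytes are consumed per `!&` call, so a 16-byte key on a 64-bit build gets
# only 2 mixing steps before finalization
import hashes

proc hashMultibyteSketch(x: string): Hash =
  var h: Hash = 0
  var i = 0
  while i + sizeof(Hash) <= x.len:
    let n = cast[ptr Hash](unsafeAddr x[i])[]  # read sizeof(Hash) bytes at once
    h = h !& n
    inc i, sizeof(Hash)
  while i < x.len:        # remaining tail bytes, one at a time
    h = h !& ord(x[i])
    inc i
  result = !$h
```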

proposal: use murmur3 instead of current hash (jenkins)
I implemented murmur3 in https://github.com/timotheecour/vitanim/blob/master/murmur/murmur.nim and IMO it is preferable to the current hash, see rationale below.
(note: there are already packages for murmur3, but they're old and they don't implement the efficient multibyte MurmurHash3_x64_128.)
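For reference, a sketch of the standard MurmurHash3_x86_32 mixing (the 32-bit variant; shown here only to illustrate the structure of the published algorithm, not the code from murmur.nim or the pending PR):

```nim
# MurmurHash3_x86_32: 4-byte blocks are multiplied, rotated and mixed into a
# running 32-bit state; leftover bytes form a tail block; a final "fmix"
# stage forces avalanche
proc murmurHash3_x86_32(s: string, seed: uint32 = 0): uint32 =
  const c1 = 0xcc9e2d51'u32
  const c2 = 0x1b873593'u32
  var h = seed
  let nblocks = s.len div 4
  # body: consume 4 bytes (little-endian) per round
  for i in 0 ..< nblocks:
    var k = uint32(ord(s[4*i])) or (uint32(ord(s[4*i+1])) shl 8) or
            (uint32(ord(s[4*i+2])) shl 16) or (uint32(ord(s[4*i+3])) shl 24)
    k = k * c1
    k = (k shl 15) or (k shr 17)   # rotl 15
    k = k * c2
    h = h xor k
    h = (h shl 13) or (h shr 19)   # rotl 13
    h = h * 5 + 0xe6546b64'u32
  # tail: 0..3 leftover bytes
  let tail = nblocks * 4
  let rem = s.len and 3
  var kTail = 0'u32
  if rem >= 3: kTail = kTail xor (uint32(ord(s[tail+2])) shl 16)
  if rem >= 2: kTail = kTail xor (uint32(ord(s[tail+1])) shl 8)
  if rem >= 1:
    kTail = kTail xor uint32(ord(s[tail]))
    kTail = kTail * c1
    kTail = (kTail shl 15) or (kTail shr 17)
    kTail = kTail * c2
    h = h xor kTail
  # finalization (fmix32)
  h = h xor uint32(s.len)
  h = h xor (h shr 16)
  h = h * 0x85ebca6b'u32
  h = h xor (h shr 13)
  h = h * 0xc2b2ae35'u32
  h = h xor (h shr 16)
  result = h
```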

analysis: entropy and key collision
A reasonable measure of how good a hash is for a class of input keys is entropy, as defined by sum_i -p(i) log(p(i)), where p(i) is the normalized collision count. I'm measuring this entropy in https://github.com/timotheecour/vitanim/blob/master/testcases/tests/t0128.nim along with the max number of collisions. It shows that, e.g., input keys of the form align($number, padding) behave badly (low entropy, high max collision count) when padding is 8 thru 11 or 16 thru 19, etc. Ditto when using oids, which also behave badly with the current hash. Murmur3, in contrast, consistently produces entropy values very close to the ideal entropy of a uniform distribution, as shown in that test code, for all cases.
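For concreteness, a sketch of how such an entropy score can be computed from per-hash-value collision counts (this is not the code from t0128.nim, just the formula above written out):

```nim
# entropy of a collision-count distribution: p(i) is the normalized count of
# keys that landed on hash value i, and the score is sum_i -p(i) * ln(p(i));
# a perfectly uniform distribution maximizes it
import hashes, math, tables

proc collisionEntropy[K](counts: CountTable[K]): float =
  var total = 0
  for c in counts.values: total += c
  for c in counts.values:
    let p = c.float / total.float
    result -= p * ln(p)

when isMainModule:
  var counts = initCountTable[Hash]()
  for i in 0 ..< 1000:
    counts.inc hash($i)        # bucket keys by their hash value
  echo collisionEntropy(counts)
```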
analysis: performance of hashing, and of key insertion and retrieval
This test file https://github.com/timotheecour/vitanim/blob/master/testcases/tests/t0129.nim compares the performance of 3 hash functions.
It computes performance in 2 build settings: -d:danger vs regular builds.
It also computes performance in 2 key settings:
- -d:case_long_random_key: keys produced by getRandomString (pkg/sysrandom)
- oids, which are problematic for the current multibyte jenkins because of their length and common prefix

The numbers (see that test file) show that murmur3 is always either the fastest or on par with the fastest in all cases, whereas the current multibyte jenkins degrades badly on oids inputs.
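For illustration, a rough timing sketch along these lines (this is not the t0129.nim harness; randomKey below is a hypothetical stand-in for getRandomString from pkg/sysrandom, and absolute numbers depend on the Nim version and build flags):

```nim
# compare table insertion time for oid-string keys vs random keys of the
# same length; on a regressed hash the oid case is dominated by collisions
import oids, tables, times, random

proc randomKey(n: int): string =
  result = newString(n)
  for i in 0 ..< n:
    result[i] = chr(rand(ord('a') .. ord('z')))

proc timeInserts(keys: seq[string]): float =
  var t = initTable[string, int]()
  let start = cpuTime()
  for i, k in keys:
    t[k] = i
  result = cpuTime() - start

when isMainModule:
  var oidKeys, rndKeys: seq[string]
  for i in 0 ..< 100_000:
    oidKeys.add $genOid()
  for i in 0 ..< 100_000:
    rndKeys.add randomKey(oidKeys[0].len)
  echo "oid keys:    ", timeInserts(oidKeys), " s"
  echo "random keys: ", timeInserts(rndKeys), " s"
```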
conclusion
side note: unaligned pointer
hashImpl uses n = cast[ptr Hash](unsafeAddr x[i])[]; isn't that undefined behavior? See https://www.viva64.com/en/w/V1032/, which mentions either undefined behavior or a potential performance penalty. I didn't actually run into an issue related to that alignment aspect (and actually varying the offset sPos in hash*(sBuf: string, sPos, ePos: int) gave 0 performance difference in my tests), but that could be machine specific. Indeed, when compiling the generated C file manually with clang -c -Wcast-align (and removing -w), we get a cast-align warning. Note that D's hash.d takes alignment into account before the cast for its multibyte hash, see https://github.com/dlang/druntime/blob/000b0e8862a21d2754ae1389da4134dc6fea12c2/src/core/internal/hash.d#L691
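For reference, one alignment-safe alternative is to copy the bytes into a local variable instead of dereferencing a possibly misaligned pointer (a sketch, not a proposed patch to hashes.nim):

```nim
# read sizeof(Hash) bytes starting at x[i] without a potentially misaligned
# `ptr Hash` dereference; copyMem (i.e. memcpy) has no alignment requirement
# on the source pointer
import hashes

proc readHashWord(x: string, i: int): Hash =
  assert i + sizeof(Hash) <= x.len   # caller handles the tail bytes separately
  copyMem(addr result, unsafeAddr x[i], sizeof(Hash))
```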