Swap murmurhash in as primary hash function. #1433
Does it make sense to investigate a general purpose string/byte hashing algorithm? MH is pretty speedy as far as hash functions go but it is still ~100x slower than |
Good question ;). Poor man's solution could be this:
Should be much faster, too! I do think we should avoid developing our own hash function because it's |
... and it is probably a bit like crypto: you think it will be a one-afternoon adventure, and two years later you are still developing and discovering what others learnt many years ago.
I was thinking that the way you'd design your hash function would be different if you only have 10 values than if you have 256. I'd be surprised if MH runs faster on byte arrays that contain only 10 different values than on ones that contain 256 different values.
On Mon, Sep 05, 2016 at 08:09:51AM -0700, Tim Head wrote:
Right - but if you hash four DNA characters at a time in an 8-bit string |
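One way to read "four DNA characters at a time in an 8-bit string" is a 2-bit-per-base packing, so that each byte can take any of the 256 possible values. The sketch below is only an illustration of that idea: the `ENCODING` table and the `pack_dna` helper are hypothetical names, not part of khmer, and it assumes the input contains only A, C, G, and T.

```python
# Hypothetical 2-bit encoding of DNA bases (an assumption for illustration).
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_dna(seq):
    """Pack a DNA string into bytes, four bases per byte (2 bits each)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | ENCODING[base]
        # Left-align a short final chunk; the unused low bits stay zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


print(pack_dna("ACGT"))         # one byte: 0b00011011
print(len(pack_dna("A" * 33)))  # 9 bytes for 33 bases
```

Because the zero padding is indistinguishable from trailing `A` bases, the sequence length would have to be tracked separately (or mixed into the hash) for this packing to be unambiguous.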
Starting from #1432 (simplification of hash function code), swap in murmurhash as the default hash function for khmer.
This will support k > 32. cc #1426
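For context on the k > 32 point: if a k-mer is encoded exactly at 2 bits per base, it only fits in a 64-bit integer for k ≤ 32, whereas a general byte hash maps input of any length to a fixed-width value. The sketch below illustrates both under stated assumptions: `two_bit_hash` and `fits_in_uint64` are hypothetical helpers, and `murmur3_32` is a pure-Python version of the 32-bit MurmurHash3 variant chosen for brevity. It is not khmer's implementation, and the variant khmer actually uses may differ.

```python
# Illustrative 2-bit base encoding; the mapping is an assumption.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}
MASK32 = 0xFFFFFFFF


def two_bit_hash(kmer):
    """Exact 2-bit-per-base encoding of a k-mer as an integer."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODING[base]
    return value


def fits_in_uint64(kmer):
    """True if the exact encoding fits in 64 bits."""
    return two_bit_hash(kmer).bit_length() <= 64


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data, seed=0):
    """Pure-Python MurmurHash3 (x86, 32-bit) over a bytes object."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    nblocks = n // 4
    # Body: mix each little-endian 4-byte block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization: fold in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


print(fits_in_uint64("T" * 32))  # True
print(fits_in_uint64("T" * 33))  # False
print(murmur3_32(("ACGT" * 10).encode()) <= MASK32)  # True
```

The trade-off is that the exact encoding is collision-free for a fixed k, while a general hash can collide, which is consistent with the collision-sensitive test failures described in this PR.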
The tests that are broken are either checks on the exact hash values, OR are sensitive to collisions (a lot of the occupancy numbers are like this), OR are broken for unknown reasons - I'm not sure why the graph alignment code is broken, in particular. Still, we're down to 15 broken tests which is pretty good...