Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashing.murmur3_128 is incompatible with C++ reference implementation for negative seed values #3493

Open
oertl opened this issue Jun 4, 2019 · 1 comment
Labels

Comments

@oertl
Copy link

oertl commented Jun 4, 2019

The reference implementation as mentioned in the Javadoc of Hashing.murmur3_128(int) (https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp) uses an unsigned 32-bit integer as seed. Since Java only has signed integers, half of all possible seed values are represented as negative values. Within the constructor of Murmur3_128Hasher the seed value is assigned to a long:
this.h1 = seed; this.h2 = seed;
Unfortunately, for negative values, this is not equivalent to the reference implementation using unsigned integers. To fix that the two assignment statements need to be changed to
this.h1 = seed & 0xFFFFFFFFL; this.h2 = seed & 0xFFFFFFFFL;

@kluever kluever added package=hash type=defect Bug, not working as expected labels Jun 4, 2019
@nick-someone
Copy link
Member

Thanks for bringing this to our attention! At first glance, this inconsistency between the Java implementation and the linked C++ implementation appears to be correct. We'll need to decide what to do here, since people can have persisted the hash values with an explicit negative seed before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants