Try using 32-bit MurmurHash instead of concatenated integers for LSH hashes #664
alexklibisz
started this conversation in
Ideas
Replies: 2 comments
-
Results from LocalBenchmarks.scala using current master branch:
The fashion-mnist index was 223mb. The sift index was 1gb.
|
Beta Was this translation helpful? Give feedback.
0 replies
-
Results after taking a first pass at using MurmurHash2 from Lucene library on this branch
(Should probably re-run this. Laptop really heated up here so CPU might've been throttled.)
|
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Several of the LSH implementations currently concatenate
k
hash values into an amplified hash value that is indexed and used for retrieval. This amplified hash value has(k + 1) * 4
bytes: thek
integer hash values and the hash table index (1 toL
).A more space-efficient implementation would use the MurmurHash algorithm to hash the ints to a single 32 bit value.
Here's an example of how this could work:
A more optimized solution would entirely avoid constructing the array and instead compute the amplified hash value incrementally as each smaller hash is computed.
Beta Was this translation helpful? Give feedback.
All reactions