-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace murmur hash with djb2 hash #203
Conversation
Also, I didn't look very hard for a faster hashing algorithm, so if you know of one that's even faster, let's try that! |
64df84d
to
d37a81a
Compare
src/util.js
Outdated
} | ||
/* eslint-enable no-fallthrough */ | ||
function djb2Hash(str) { | ||
let hash = 5381; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this magic number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was apparently chosen through manual testing as the number that produced fewer collisions and better avalanching.
http://stackoverflow.com/a/10697529/18986
https://groups.google.com/forum/#!msg/comp.lang.c/lSKWXiuNOAk/zstZ3SRhCjgJ
http://stackoverflow.com/a/4825477/18986
JSPerf: https://jsperf.com/murmurhash2-vs-djb2 This implementation of djb2 seems to be about 2x faster than the implementation of MurmurHash2 in Chrome. In Firefox, they seem to be pretty equivalent. |
I should comment on this I guess, since I'm the one who pushed for murmurhash. It's a very fast (and good) hash when implemented in C, but maybe not in javascript. I don't have a problem moving to djb2, though it's not a great hash function. (The claim about "same number of collisions" depends on the dataset. I doubt it's true in general, though may be true for the types of strings we're hashing in aphrodite -- that could be tested experimentally, though may not be worth the trouble. After all, if a hash is twice as fast that can pay for a lot of collision-handling code!) The best thing to do would be to ask the js engine to use their internal hash function, something like |
This would be great, but I don't think it is an option. And, if we could do this, it would have to be the same hash function across all browsers to ensure that the hash is the same from server -> client. |
JS definitely doesn't expose anything like that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small change, and this LGTM! Echoing what craig said, I'm a little sad that we're switching to a worse hash, but for now since we're also prepending the name that you provide, the chance of collisions is very very low, so I'm happy with this.
In the future, we were thinking about a "prod" mode that would exclude the names and would also hash all of the concatenated class names together into one hash (e.g. "1a81283-o_O-19a238" -> "282ac923") which we might want to do better testing and seeing which hash we want.
src/util.js
Outdated
@@ -212,7 +164,7 @@ function murmurhash2_32_gc(str) { | |||
// this to produce consistent hashes browsers need to have a consistent | |||
// ordering of objects. Ben Alpert says that Facebook depends on this, so we | |||
// can probably depend on this too. | |||
export const hashObject = (object /* : ObjectMap */) /* : string */ => murmurhash2_32_gc(JSON.stringify(object)); | |||
export const hashObject = (object /* : ObjectMap */) /* : number */ => stringHash(JSON.stringify(object)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you stringify the output here in the same way that we did in murmurhash? I think a .toString(36)
at the end here is all that's needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! I also profiled this and it seems to be equivalent to not converting to a string.
In profiling StyleSheet.create, I noticed that much of the time was spent hashing. So, I found a faster hashing algorithm. The implementation was taken from: https://github.com/darkskyapp/string-hash According to this StackExchange post, this algorithm doesn't have as good of randomness, but it has about the same percentage of collisions. I don't think randomness matters for this application, so I think this is okay. http://softwareengineering.stackexchange.com/a/145633 Using similar methodology to Khan#202, this appears to make StylSheet.create ~15% faster (~220ms to ~185ms).
This is where I copied the code for this algorithm from, seems like we might as well just bring in the dependency for it.
This was replaced with a different hashing algorithm by #203, so I don't think we really need this note anymore.
In profiling StyleSheet.create, I noticed that much of the time was
spent hashing. So, I found a faster hashing algorithm.
The implementation was taken from:
https://github.com/darkskyapp/string-hash
According to this StackExchange post, this algorithm doesn't have as
good of randomness, but it has about the same percentage of collisions.
I don't think randomness matters for this application, so I think this
is okay.
http://softwareengineering.stackexchange.com/a/145633
Using similar methodology to #202, this appears to make StylSheet.create
~15% faster (~220ms to ~185ms).
cc @ljharb