Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace murmur hash with djb2 hash #203

Merged
merged 2 commits into from
Mar 8, 2017
Merged

Replace murmur hash with djb2 hash #203

merged 2 commits into from
Mar 8, 2017

Conversation

lencioni
Copy link
Collaborator

@lencioni lencioni commented Mar 4, 2017

In profiling StyleSheet.create, I noticed that much of the time was
spent hashing. So, I found a faster hashing algorithm.

The implementation was taken from:

https://github.com/darkskyapp/string-hash

According to this StackExchange post, this algorithm doesn't have as
good of randomness, but it has about the same percentage of collisions.
I don't think randomness matters for this application, so I think this
is okay.

http://softwareengineering.stackexchange.com/a/145633

Using similar methodology to #202, this appears to make StylSheet.create
~15% faster (~220ms to ~185ms).

cc @ljharb

@lencioni
Copy link
Collaborator Author

lencioni commented Mar 4, 2017

Also, I didn't look very hard for a faster hashing algorithm, so if you know of one that's even faster, let's try that!

@lencioni lencioni force-pushed the hash branch 2 times, most recently from 64df84d to d37a81a Compare March 4, 2017 18:45
src/util.js Outdated
}
/* eslint-enable no-fallthrough */
function djb2Hash(str) {
let hash = 5381;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this magic number?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was apparently chosen through manual testing as the number that produced fewer collisions and better avalanching.

http://stackoverflow.com/a/10697529/18986
https://groups.google.com/forum/#!msg/comp.lang.c/lSKWXiuNOAk/zstZ3SRhCjgJ
http://stackoverflow.com/a/4825477/18986

@lencioni
Copy link
Collaborator Author

lencioni commented Mar 5, 2017

JSPerf: https://jsperf.com/murmurhash2-vs-djb2

This implementation of djb2 seems to be about 2x faster than the implementation of MurmurHash2 in Chrome. In Firefox, they seem to be pretty equivalent.

@csilvers
Copy link
Member

csilvers commented Mar 6, 2017

I should comment on this I guess, since I'm the one who pushed for murmurhash. It's a very fast (and good) hash when implemented in C, but maybe not in javascript. I don't have a problem moving to djb2, though it's not a great hash function. (The claim about "same number of collisions" depends on the dataset. I doubt it's true in general, though may be true for the types of strings we're hashing in aphrodite -- that could be tested experimentally, though may not be worth the trouble. After all, if a hash is twice as fast that can pay for a lot of collision-handling code!)

The best thing to do would be to ask the js engine to use their internal hash function, something like hash in python. (It looks like chrome uses jenkinshash -- which is a predecessor to murmurhash and is pretty good -- http://stackoverflow.com/questions/30695564/what-kind-of-hashing-function-algorithm-does-javascript-associative-array-use.) I don't know if javascript exposes that though.

@lencioni
Copy link
Collaborator Author

lencioni commented Mar 6, 2017

The best thing to do would be to ask the js engine to use their internal hash function

This would be great, but I don't think it is an option. And, if we could do this, it would have to be the same hash function across all browsers to ensure that the hash is the same from server -> client.

@ljharb
Copy link

ljharb commented Mar 6, 2017

JS definitely doesn't expose anything like that.

Copy link
Contributor

@xymostech xymostech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small change, and this LGTM! Echoing what craig said, I'm a little sad that we're switching to a worse hash, but for now since we're also prepending the name that you provide, the chance of collisions is very very low, so I'm happy with this.

In the future, we were thinking about a "prod" mode that would exclude the names and would also hash all of the concatenated class names together into one hash (e.g. "1a81283-o_O-19a238" -> "282ac923") which we might want to do better testing and seeing which hash we want.

src/util.js Outdated
@@ -212,7 +164,7 @@ function murmurhash2_32_gc(str) {
// this to produce consistent hashes browsers need to have a consistent
// ordering of objects. Ben Alpert says that Facebook depends on this, so we
// can probably depend on this too.
export const hashObject = (object /* : ObjectMap */) /* : string */ => murmurhash2_32_gc(JSON.stringify(object));
export const hashObject = (object /* : ObjectMap */) /* : number */ => stringHash(JSON.stringify(object));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you stringify the output here in the same way that we did in murmurhash? I think a .toString(36) at the end here is all that's needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I also profiled this and it seems to be equivalent to not converting to a string.

lencioni and others added 2 commits March 8, 2017 12:39
In profiling StyleSheet.create, I noticed that much of the time was
spent hashing. So, I found a faster hashing algorithm.

The implementation was taken from:

  https://github.com/darkskyapp/string-hash

According to this StackExchange post, this algorithm doesn't have as
good of randomness, but it has about the same percentage of collisions.
I don't think randomness matters for this application, so I think this
is okay.

  http://softwareengineering.stackexchange.com/a/145633

Using similar methodology to Khan#202, this appears to make StylSheet.create
~15% faster (~220ms to ~185ms).
This is where I copied the code for this algorithm from, seems like we
might as well just bring in the dependency for it.
@xymostech xymostech merged commit 21ef03b into Khan:master Mar 8, 2017
@lencioni lencioni deleted the hash branch July 20, 2017 22:25
lencioni added a commit that referenced this pull request Sep 12, 2017
This was replaced with a different hashing algorithm by
#203, so I don't think we really
need this note anymore.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants