-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-10116] [core] XORShiftRandom.hashSeed is random in high bits #8314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
a17530e
339ccc3
7c46724
ea5ff69
9451cba
134533c
1a64e9c
c197f83
0547528
96eb00d
e05a035
04214d9
5e35321
7ce14fa
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -60,9 +60,11 @@ private[spark] class XORShiftRandom(init: Long) extends JavaRandom(init) { | |
| private[spark] object XORShiftRandom { | ||
|
|
||
| /** Hash seeds to have 0/1 bits throughout. */ | ||
| private def hashSeed(seed: Long): Long = { | ||
| private[random] def hashSeed(seed: Long): Long = { | ||
| val bytes = ByteBuffer.allocate(java.lang.Long.SIZE).putLong(seed).array() | ||
| MurmurHash3.bytesHash(bytes) | ||
| val lowBits = MurmurHash3.bytesHash(bytes) | ||
| val highBits = MurmurHash3.bytesHash(bytes, lowBits) | ||
| (highBits.toLong << 32) | (lowBits.toLong & 0xFFFFFFFFL) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Although normally it might not be 100% valid to make a 64-bit hash out of two 32-bit hashes this way (I'm not clear that reusing the seed doesn't connect the bits in some subtle way), it's certainly a big improvement and probably entirely fine for this purpose. I'd still like to remove XORShiftRandom, but that can be for another day. |
||
| } | ||
|
|
||
| /** | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one really confused me ... I thought I could figure out what the right values should be by running the equivalent thing in scala, but I got totally different answers.
but whatever, these vals work :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because the default number of partitions is different in Scala/Python/R tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, great, glad there is a good explanation for this :)