[SPARK-10116] [core] XORShiftRandom.hashSeed is random in high bits #8314
Conversation
LGTM. There are better / faster RNGs in standard libs like Commons Math. @mengxr is it worth Spark having its own still?
Test build #41251 has finished for PR 8314 at commit
hmm, some of the errors are just checks against the expected sequence from the RNG, which I can update (though some of these tests probably shouldn't require a "perfect" seed). But I'm a little perplexed by some of the failures, e.g.
Some unit tests and Python doctests do depend on the seed, with varying sensitivity. I don't think requiring exact output is that bad, because it can at least notify us of changes in behavior. In Python, the doctest is used to generate documentation, and it is useful to show actual output rather than checking bounds, e.g., https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L459. There is a trade-off between having meaningful probabilistic bounds and keeping unit tests small. For example, in Word2Vec we could increase the training dataset size to reduce the variance of the model output, making it more robust to the random seed, but that increases the test time too. That being said, I can help make those tests less sensitive. Do you mind making JIRAs for each of them? Regarding @srowen's question: if adding the commons-math3 dependency is not an issue and its RNG performs similarly to the one here, I think we shouldn't maintain our own. However, I'm still a little worried about compatibility issues between commons-math3 releases.
@mengxr commons-math 3.x is already a dependency in core. I don't have benchmarks handy, but my experience with its RNGs is that they're at least "more than fast enough" for any purpose I've had. I don't think the RNGs are changing, and they implement particular RNG processes, like Well19937, that should not change over time. The downside to not using it is simply the higher probability of bugs, like this one, when implementing from scratch. I think we need to get this fix in in any event. @squito do you need a hand? I think simply loosening the tests is appropriate, even if it means bigger tests.
style: `.map { seed =>`
@srowen @mengxr sorry for letting this lie around for so long; somehow I completely forgot to update it. I updated the tests with a few magic values, but didn't update the algorithms themselves. I tried a brute-force search over 1000 seeds; fixing those remaining issues is probably a bit beyond me, but this can't be merged till we get those fixes. How would you like me to proceed here?
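The brute-force seed search mentioned above amounts to something like the following sketch. This is a hypothetical stand-in, not Spark's test harness; `test_passes` represents "the suite passes with this seed":

```python
def find_passing_seed(test_passes, max_seeds=1000):
    # Try candidate seeds in order until one makes the test predicate pass.
    for seed in range(max_seeds):
        if test_passes(seed):
            return seed
    return None  # no seed in range made the test pass
```

For example, `find_passing_seed(lambda s: s % 7 == 3)` returns 3, the first seed satisfying the predicate.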
I thought this was a better way to leave a bigger margin for the low-count cases, rather than increasing the multiplier. As another random point, `4 * stdev + 4` also worked.
Really, this expression relies on assuming that the binomial and Poisson distributions are well approximated by a normal distribution. When the expected value is in the 10s or 20s this probably isn't very true. This could be rewritten to properly compute the probability using `PoissonDistribution` and `BinomialDistribution`. However, I think it would be faster to just make sure that the RDD size is not less than 1000 or so in the tests above. (Also, the parts where it computes the expected count with `math.ceil` are unnecessary: there's no reason to require these to be an integer, and they're another source of small errors. Let expected be a Double.)
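The exact-tail-probability alternative described above can be sketched as follows. This is a stdlib-only Python stand-in: in the real code, commons-math3's `PoissonDistribution` would replace the hand-rolled CDF, and the tolerated false-failure probability `alpha` is an assumed parameter:

```python
import math

def poisson_cdf(k, lam):
    # P(X <= k) for X ~ Poisson(lam), summed term by term
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))

def acceptance_bounds(lam, alpha=1e-6):
    # Smallest interval [lo, hi] whose tails each carry at most alpha/2,
    # so a correct sampler fails the test with probability <= alpha --
    # instead of assuming the count is approximately normal.
    lo = 0
    while poisson_cdf(lo, lam) < alpha / 2:
        lo += 1
    hi = int(lam)
    while 1.0 - poisson_cdf(hi, lam) > alpha / 2:
        hi += 1
    return lo, hi
```

For an expected count of 20 this yields a markedly asymmetric interval, which is exactly where the normal approximation "expected ± k·stdev" is weakest.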
Test build #43988 has finished for PR 8314 at commit
Although normally it might not be 100% valid to make a 64-bit hash out of two 32-bit hashes this way (I'm not clear that reusing the seed doesn't connect the bits in some subtle way), it's certainly a big improvement and probably entirely fine for this purpose. I'd still like to remove XORShiftRandom, but that can be for another day.
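The construction discussed above can be illustrated with a small sketch: build a 64-bit value from two separately-salted 32-bit hashes of the same seed, so the high bits come from their own hash rather than from a sign extension. The hash function here (truncated SHA-256) is an arbitrary stand-in, not what `XORShiftRandom.hashSeed` actually uses:

```python
import hashlib

def hash_seed(seed: int) -> int:
    # Two 32-bit hashes of the same seed, salted differently so the halves
    # are not trivially related; combined into one 64-bit value.
    def h32(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
    raw = seed.to_bytes(8, "big", signed=True)
    lo = h32(raw + b"lo")
    hi = h32(raw + b"hi")
    return (hi << 32) | lo  # high 32 bits now populated independently
```

As the comment above notes, reusing the seed for both halves could in principle correlate the bits in subtle ways, but it is a large improvement over leaving the high 32 bits as a sign extension.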
Test build #1950 has finished for PR 8314 at commit
@squito I think the test failures are legitimate, although they are likely just due to the tests relying too heavily on the exact random sequence before.
Hi @srowen -- I know the test failures are legitimate, but I'm not sure what to do about them.
I need a bit of guidance on how to fix things -- I don't know if the right solution is looser checks, or just updating the magic values assuming that the current implementation is correct. (And the right solution might be a bit beyond my abilities at the moment ...)
For The
I updated some of the tests -- I think just
Test build #44458 has finished for PR 8314 at commit
PS it's really on me to get this one working and check up on the tests. I'm back on it today. Thank you for pushing it this far. I want to get this one in.
Expected values are 200, 300, 500. The ranges are very wide; really I'd do +/- 50. The final range should be a bit bigger; 430-570 is OK.
yes, sorry about that. I think I was playing around with those ranges with much smaller samples until I decided to just use 1000 elements and didn't think about fixing the checks when they passed. I'll update.
@yinxusen I'm trying to help debug the test failure in While looking at it I think we can fix one small thing in should be more like This isn't really the issue but something we could adjust. Of course this also makes the test fail. If we can't figure this out... I think we could hard-code this test to work with the new seed behavior, on the theory that it's probably a test issue.
The last failure is in That is, the output ought to look something like ... which is a little strange, since this shows a fairly poor classifier's confusion matrix. There are 2 seeds in this test, and setting them to a range of values succeeds in every case for me. This one might be a matter of picking a different seed? Although it is a little funny that in this case the MLP classifier never predicted class 0. But that's a different issue. I also note that LogisticRegressionSuite uses a regular java.util.Random instead of XORShiftRandom. Might be worth adjusting, unless it causes more failures.
@srowen Yes, the word vectors were chosen to work for seed 42. But the default seed value for Word2Vec in the ML package has been deleted, so I set the seed to 42 in the test suite.
how about I just hard-code new results for now, and open two new JIRAs for fixing those cases properly?
@squito I tend to agree with that approach, on the grounds that the point here is fixing the seeding. Making the tests less hard-coded is a separate issue if it needs to be. Maybe the perceptron test just needs a slightly different seed; maybe @yinxusen has a good answer for updating Word2Vec; but I don't think we should block this particular fix indefinitely otherwise.
@srowen good call on the extra seed for the multilayer perceptron -- I found one pretty easily for the input data that made the test pass. It could perhaps still be more general, but at least it's no worse than before. And I created https://issues.apache.org/jira/browse/SPARK-11502 for finding a better solution for Word2Vec.
Test build #45025 has finished for PR 8314 at commit
Test build #45046 has finished for PR 8314 at commit
Jenkins, retest this please
Test build #45093 has finished for PR 8314 at commit
Test build #45101 has finished for PR 8314 at commit
@squito very nice work here. I think just the SparkR tests still need a similar treatment, and then it's good to go. Thankfully the errors are quite plain about what they expect, and they do look like the expected differences you'd see from a seed change.
Test build #45120 has finished for PR 8314 at commit
this one really confused me ... I thought I could figure out what the right values should be by running the equivalent thing in Scala, but I got totally different answers.
case class Data(val key: String)
val data = (0 to 99).map { x => Data((x % 3).toString) }
val dataDF = sqlContext.createDataFrame(data)

scala> dataDF.stat.sampleBy("key", Map("0" -> 0.1, "1" -> 0.2), 0).groupBy("key").count().show()
+---+-----+
|key|count|
+---+-----+
|  0|    5|
|  1|    5|
+---+-----+

but whatever, these vals work :/
This is because the default number of partitions is different in Scala/Python/R tests.
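The partition-count effect can be mimicked with a toy model: if sampling is seeded per partition, the same data and top-level seed yield different samples when the number of partitions differs. This is an illustrative stand-in, not Spark's actual sampler:

```python
import random

def sample_partitioned(data, num_partitions, fractions, seed=0):
    # Split data into contiguous partitions; each partition gets its own
    # seeded RNG, so the result depends on the partitioning itself.
    size = (len(data) + num_partitions - 1) // num_partitions
    kept = []
    for idx in range(num_partitions):
        rng = random.Random(seed + idx)  # per-partition seed
        for key in data[idx * size:(idx + 1) * size]:
            if rng.random() < fractions.get(key, 0.0):
                kept.append(key)
    return kept

data = [str(x % 3) for x in range(100)]
fractions = {"0": 0.1, "1": 0.2}
```

Calling `sample_partitioned(data, 2, fractions)` versus `sample_partitioned(data, 8, fractions)` generally selects different elements even though the data and seed are identical, which is why the Scala, Python, and R tests see different counts.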
ah, great, glad there is a good explanation for this :)
Test build #45143 has finished for PR 8314 at commit
Great, let's merge this one to master and 1.6. Not sure about 1.5, given that this is a low-priority fix and it ends up changing test behavior in several places.
https://issues.apache.org/jira/browse/SPARK-10116

This is really trivial, just happened to notice it -- if `XORShiftRandom.hashSeed` is really supposed to have random bits throughout (as the comment implies), it needs to do something for the conversion to `long`.

@mengxr @mkolod

Author: Imran Rashid <irashid@cloudera.com>

Closes #8314 from squito/SPARK-10116.

(cherry picked from commit 49f1a82)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Could you notify the developer community about such changes in advance and point out probable problematic places? That would be extremely helpful.
@avulanov what do you mean in this case? Just that default random behavior may change? That is a given, though.
@srowen Yes, exactly! In my case, the RNG-based tests were failing on AMPLab's Jenkins while running correctly in my environment, because my version of Spark was one day older than the one on Jenkins. Indeed, it's worth updating to the latest version all the time :) Still, it would be great to have notifications about such changes, after which one must update.
Yeah, the problem was that the seed processing was actually wrong, so it had to change. I doubt it will change much again. But stochastic behavior isn't part of the APIs, so you would want to write tests that don't depend on a particular seed anyway. I don't think this is something special to note.
I was thinking about removing the stochastic part from the tests. However, the issue is that I need to test that stochastic initialization of parameters for machine learning does work, i.e. that the optimization will converge with such parameters. Could you suggest a better way of doing this, as opposed to using the seed?
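One seed-agnostic pattern for this kind of test, sketched below, is to run the stochastic procedure under many seeds and require convergence for (nearly) all of them. The "optimization" here -- estimating a mean from noisy samples -- is a toy stand-in for real training, and the thresholds are illustrative assumptions:

```python
import random

def converges(seed: int) -> bool:
    # Toy stochastic procedure: estimate a known mean from noisy draws.
    # With n=2000 the standard error is ~0.011, so the 0.1 tolerance is
    # about 9 standard errors wide -- a correct implementation essentially
    # always passes, regardless of seed.
    rng = random.Random(seed)
    estimate = sum(rng.gauss(3.0, 0.5) for _ in range(2000)) / 2000
    return abs(estimate - 3.0) < 0.1

def robust_to_seed(seeds=range(10), min_pass=9):
    # Tolerate one unlucky seed rather than pinning a single magic value.
    return sum(converges(s) for s in seeds) >= min_pass
```

This trades a longer test for independence from any particular seed, which is exactly the trade-off discussed earlier in the thread.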