Flaky test_random.test_shuffle #10277
Comments
@asitstands Could you verify whether the 0.01 threshold is too small for checking the uniform distribution? |
@reminisce The probability that the test fails is approximately 0.001 with the 0.01 criterion even if |
@asitstands Can you try feeding a fixed seed to the decorator |
@reminisce Seeding with |
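For readers unfamiliar with seed pinning, here is a minimal sketch of the idea in plain Python. It is not the project's decorator: the name `with_fixed_seed`, its `seed` parameter, and the use of the standard-library `random` module are all assumptions made for illustration.

```python
import functools
import random


def with_fixed_seed(seed=None):
    """Hypothetical decorator: seed Python's RNG before the wrapped test runs."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            # Use the pinned seed if given; otherwise draw a fresh one so that
            # a failure can still be reproduced from the printed value.
            used = seed if seed is not None else random.SystemRandom().randrange(2**31)
            random.seed(used)
            try:
                return test_fn(*args, **kwargs)
            except Exception:
                print("test failed with seed", used)
                raise
        return wrapper
    return decorator


@with_fixed_seed(seed=1234)  # the seed value is arbitrary
def sample_shuffle():
    items = list(range(5))
    random.shuffle(items)
    return items
```

With a pinned seed every call returns the same permutation, which makes a failure reproducible but also means the test only ever exercises one random stream.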
Time consumption for tests is not that important, but I think we should ask another question: If the generation of random numbers takes such a significant amount of time, wouldn't this also harm runtime performance in real applications? |
@marcoabreu In the case of |
I see, thanks for elaborating, this definitely makes sense. I assume we can't improve this with parallelisation, right? |
@marcoabreu The implementation of |
Thanks for the great explanation and the detailed benchmarks, great job! In that case, I'd propose increasing to 40000 samples. We can go even higher if you like: the slaves have 72 vCPUs available (though I can't judge the GPU performance for this use case), and since this task is IO-bound, we should not see much of an increase in time consumption, as all threads can prepare the data in parallel and do other work while they wait for the memory controller. What do you think? While pinning the seed makes things more comfortable, random seeding has helped us find quite a lot of issues in the past. |
Since a longer running time is allowed, I agree that using 40000 samples would be better than fixing the random seed. Stricter tests are better as long as we have the resources for them. On the other hand, I think the 0.01 criterion with 40000 samples is enough in practice; a stricter test would cost much more. I reproduced the failure with the reported seed and confirmed that the test passes with 40000 samples. The test will still fail someday even with the increased number of samples, but only very rarely. What do you think, @reminisce? |
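The trade-off between sample count and spurious failures can be estimated roughly. The sketch below assumes the check compares each outcome's observed frequency with its expected value 1/k and fails when the absolute difference exceeds the tolerance; it uses a normal approximation and a union bound over the k outcomes. The function name, the choice of k = 6 outcomes, and the 10000-sample baseline are assumptions for illustration, so the printed values need not match the figures quoted in this thread.

```python
import math


def failure_probability_bound(num_samples, num_outcomes, tolerance):
    """Upper bound on the chance that a truly uniform sampler fails the check.

    Assumes each outcome's observed frequency is compared with 1/num_outcomes
    and the check fails if any absolute deviation exceeds `tolerance`.
    """
    p = 1.0 / num_outcomes
    # Standard deviation of an observed frequency over num_samples draws.
    std = math.sqrt(p * (1.0 - p) / num_samples)
    z = tolerance / std
    # Two-sided normal tail: P(|Z| > z) = erfc(z / sqrt(2)).
    per_outcome = math.erfc(z / math.sqrt(2.0))
    # Union bound over all outcomes, capped at 1.
    return min(1.0, num_outcomes * per_outcome)


# 6 outcomes (e.g. the permutations of 3 items) is an illustrative assumption;
# 10000 is a hypothetical baseline, 40000 is the count proposed above.
for n in (10000, 40000):
    print(n, failure_probability_bound(n, num_outcomes=6, tolerance=0.01))
```

Because the standard deviation shrinks with the square root of the sample count, quadrupling the samples doubles `z`, and the tail probability falls much faster than linearly.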
I think 10^-6 sounds quite reasonable, and the time consumption is still within acceptable boundaries :) @reminisce, please review the PR, as you have more in-depth knowledge |
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/542/pipeline
https://issues.apache.org/jira/browse/MXNET-236