Replace random number implementations in src/gnu with std library and random123 gen #1861
Conversation
Just a thought, would it be beneficial to add tests in a prequel PR? This way we'd have a baseline to compare the refactored implementations against.
I think that's a good idea. I'm just not sure how to test these random distributions. What I had in mind was initializing a generator with a fixed seed and comparing the values it produces before and after the change.
@nrnhines would you have some ideas?
Use Random123 as the generator, pick 5 values from each distribution (restart the generator for each distribution), and compare with the values prior to this PR. Don't worry much about other generators. mcellran4 may be machine independent but I'm not sure about the ones in the old gnu folder.
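A minimal sketch of what such a pinned-value regression test could look like from Python, assuming NEURON's h.Random interface with the Random123 generator; the chosen distributions and the reference values below are placeholders, not output recorded from the pre-PR code.

```python
# Hypothetical regression test (sketch): pin a few draws per distribution
# using the Random123 generator with a fixed seed triplet.  The reference
# values are placeholders; they would have to be recorded from the legacy
# implementation before this PR.
import pytest
from neuron import h

REFERENCE = {
    "uniform": [0.0, 0.0, 0.0, 0.0, 0.0],  # placeholder values
    "normal":  [0.0, 0.0, 0.0, 0.0, 0.0],  # placeholder values
    "negexp":  [0.0, 0.0, 0.0, 0.0, 0.0],  # placeholder values
}


def draw(select_distribution):
    """Restart the generator, select a distribution, and pick 5 values."""
    r = h.Random()
    r.Random123(1, 2, 3)        # fixed seed triplet -> reproducible stream
    select_distribution(r)      # e.g. r.uniform(0, 1)
    return [r.repick() for _ in range(5)]


@pytest.mark.parametrize("name,select", [
    ("uniform", lambda r: r.uniform(0, 1)),
    ("normal",  lambda r: r.normal(0, 1)),
    ("negexp",  lambda r: r.negexp(1)),
])
def test_distribution_unchanged(name, select):
    assert draw(select) == pytest.approx(REFERENCE[name])
```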
(force-pushed from 5cb1257 to cb6ccb1)
Coming back to this PR: before updating the branch and merging it, we would like to compare the output of the refactored random number distribution functions of this PR with the output of the legacy functions.
Even in that case it is not so simple. I suggest looking into https://en.wikipedia.org/wiki/Chi-squared_test, for example. However, there is no way to have a test that passes 100% of the time: with the standard 95% confidence level we would have a test that fails once every 20 runs, which is not ideal for CI. Is it really necessary to test these libraries? Proving something with statistics is always complicated and prone to errors. "There are lies, damned lies, and statistics."
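For concreteness, a chi-squared goodness-of-fit check along those lines might look like the sketch below; it uses scipy.stats.chisquare on binned samples, with a NumPy generator standing in for the NEURON distribution under test (none of this is taken from the PR itself).

```python
# Chi-squared goodness-of-fit sketch: bin samples and compare observed counts
# against the counts expected under the target distribution.  The NumPy
# generator below is only a stand-in for the implementation under test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
samples = rng.uniform(0.0, 1.0, size=10_000)

observed, _ = np.histogram(samples, bins=np.linspace(0.0, 1.0, 11))
expected = np.full(10, samples.size / 10)   # uniform -> equal counts per bin

chi2, p_value = stats.chisquare(observed, expected)

# With alpha = 0.05 this assertion fails in roughly 1 out of 20 runs even
# when the implementation is correct, which is the CI problem noted above.
assert p_value > 0.05, f"distribution looks off (chi2={chi2:.2f}, p={p_value:.3g})"
```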
If you want to compute a distance between two distributions, you can use something like the Wasserstein, or Earth mover's, distance. If you have two histograms (or sampled distributions), this metric computes the minimum amount of "mass" you need to move in one histogram to get the other histogram; you can then use this as an error. There's a Python implementation readily available for 1D distributions in scipy, and there is an optimal transport library I've used in the past that also deals with the multi-dimensional case. If you're just asking the question "are these samples from the same distribution?", then Kolmogorov–Smirnov would seem to fit the bill.
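Both suggestions have readily available implementations in scipy.stats (wasserstein_distance for the 1D Earth mover's distance, ks_2samp for the two-sample Kolmogorov–Smirnov test). A rough, non-authoritative sketch, with NumPy samplers standing in for the legacy and refactored draws:

```python
# Compare two sampled distributions: Earth mover's distance as an error
# metric, and a two-sample Kolmogorov-Smirnov test for "same distribution?".
# The two samplers are stand-ins for the legacy and refactored draws.
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

legacy = np.random.default_rng(1).normal(0.0, 1.0, size=5_000)
refactored = np.random.default_rng(2).normal(0.0, 1.0, size=5_000)

# Minimum amount of "mass" to move to turn one empirical distribution into
# the other; usable directly as an error with a plain tolerance.
emd = wasserstein_distance(legacy, refactored)

# p-value for the null hypothesis that both samples come from the same
# underlying distribution.
ks_stat, p_value = ks_2samp(legacy, refactored)

print(f"EMD = {emd:.4f}, KS = {ks_stat:.4f}, p = {p_value:.3g}")
```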
With regards to the sample size: with 100 samples we can almost detect a 1% difference, and a 10% difference has a p-value of virtually 0. What we want is that the tests (almost) never fail due to bad sampling luck, but fail reasonably often if the distributions are different. Pick, say, a significance threshold small enough that a spurious failure is essentially never observed. What I don't know is how sensitive these tests are to sampling from the same distribution but with different random number generators.
What does "almost never fail" mean? Even if we are using the very same distribution, a p-value test will fail once every 20 runs. Do you want to run something 100 times and redo the p-value statistics 100 times every time CI runs? I don't know the sample size, but we are talking about on the order of 10^{3+} test runs.
Spuriously, not once in 100'000 times using the above numbers.
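A back-of-the-envelope illustration of that trade-off (the alpha and run counts below are assumed for illustration, not the numbers from the discussion):

```python
# Rough arithmetic only; alpha and the number of runs are assumed values.
# A correct implementation fails a single check with probability ~alpha, so
# across N independent CI runs the chance of at least one spurious failure
# is 1 - (1 - alpha)**N.
alpha = 1e-6     # assumed per-test significance threshold
runs = 10_000    # assumed number of CI runs

p_spurious = 1 - (1 - alpha) ** runs
print(f"P(at least one spurious failure in {runs} runs) ~= {p_spurious:.2%}")
# For comparison, the standard alpha = 0.05 fails roughly once every 20 runs.
```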
Thanks a lot @1uc and @cattabiani for the detailed answers, and sorry for the delayed reply, but I've only seen them now.
In our discussion at the latest NEURON dev meeting we concluded the following:
Impossible. The results will now depend on which stdlib is used (more or less GNU or Clang). We will never be able to keep the tests working. Quit.
Current state of random distribution implementations that need to be changed in NEURON:
Other things that need to be taken care of: what the RNG and Random classes are?