
Replace random number implementations in src/gnu with std library and random123 gen #1861

Closed
wants to merge 24 commits

Conversation

iomaganaris
Member

@iomaganaris iomaganaris commented Jun 21, 2022

Current state of the random distribution implementations that need to be changed in NEURON:

  • ACG (random number generator that needs removal?): this is a linear congruential engine, see https://en.cppreference.com/w/cpp/numeric/random/linear_congruential_engine
  • MLCG (random number generator that needs removal?)
  • Isaac64
  • MCellRan4 (Should stay as it is?)
  • Random123 (Should stay as it is?)
  • Uniform
  • DiscUnif
  • Normal
  • LogNormal (need to figure out a way to turn the log-variance into the log-stddev expected by std::lognormal_distribution; see the sketch after this list)
  • Binomial
  • Poisson
  • Geometric
  • HyperGeometric
  • NegExp
  • Erlang
  • Weibull
  • RandomPlay
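
On the LogNormal point above: assuming the distribution is parameterised by the mean and variance of the lognormal variate itself, the standard conversion to the (mu, sigma) pair that std::lognormal_distribution expects would look like the sketch below (Python only for illustration; the function name is hypothetical).

```python
import numpy as np

def lognormal_underlying_params(mean, variance):
    # mean/variance of the lognormal variate -> (mu, sigma) of the
    # underlying normal, which is what std::lognormal_distribution expects.
    sigma2 = np.log(1.0 + variance / mean**2)
    mu = np.log(mean) - 0.5 * sigma2
    return mu, np.sqrt(sigma2)

# Quick empirical check of the conversion.
mu, sigma = lognormal_underlying_params(mean=5.0, variance=2.0)
samples = np.random.default_rng(0).lognormal(mu, sigma, size=1_000_000)
print(samples.mean(), samples.var())  # ~5.0 and ~2.0
```

If the existing code already stores the log-mean and log-variance, only the last step is needed: the log-stddev is simply the square root of the log-variance.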

Other things that need to be taken care of:

  • Remove temporary classes/functions and add them to the proper ones
  • Maybe refactor the way the RNG and Random classes are structured?
  • Add documentation
  • Add tests
  • Compare old with new implementations

@alexsavulescu
Member

Just a thought, would it be beneficial to add tests in a prequel PR? This way we'd have a base to compare against?

@alkino alkino linked an issue Jun 23, 2022 that may be closed by this pull request
@iomaganaris
Member Author

Just a thought, would it be beneficial to add tests in a prequel PR? This way we'd have a base to compare against?

I think that's a good idea. I'm just not sure how to test these random distributions. My idea was to initialize a Random struct like in the Python scripts I used as tests and then compare each distribution's mean or variance with the expected values and check whether they are close, but this might be fragile and doesn't check the actual distribution. Any suggestion would be more than welcome for this

@alexsavulescu
Member

Any suggestion would be more than welcome for this

@nrnhines would you have some ideas?

@nrnhines
Member

how to test these random distributions

Use Random123 as the generator, pick 5 values from each distribution (restart generator for each distribution), and compare with the values prior to this PR.
To facilitate comparison, consider using nrn/test/cover/checkresult.py which can create the rand_dist.json file with the prior values and thereafter compare using those.

Don't worry much about other generators. mcellran4 may be machine independent but I'm not sure about the ones in the old gnu folder.
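
A rough sketch of that kind of comparison (not using checkresult.py's exact API; the reference file name, the distributions and their arguments are just examples):

```python
import json
from neuron import h

def sample(dist_name, *args, n=5):
    r = h.Random()
    r.Random123(1, 2, 3)            # restart the generator for each distribution
    getattr(r, dist_name)(*args)    # select the distribution (also returns a first pick)
    return [r.repick() for _ in range(n)]

values = {
    "uniform": sample("uniform", 0.0, 1.0),
    "normal": sample("normal", 1.0, 0.5),
    "negexp": sample("negexp", 1.0),
    "poisson": sample("poisson", 3.0),
}

# First run on master: dump the reference values. Later runs: compare against them.
try:
    with open("rand_dist.json") as f:
        reference = json.load(f)
    for name, vals in values.items():
        assert all(abs(a - b) < 1e-12 for a, b in zip(vals, reference[name])), name
except FileNotFoundError:
    with open("rand_dist.json", "w") as f:
        json.dump(values, f, indent=2)
```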

@alkino alkino force-pushed the magkanar/gnu_random123 branch from 5cb1257 to cb6ccb1 on June 24, 2022 at 11:57
@iomaganaris
Member Author

Coming back to this PR, what we would like to do before updating the branch and merging it is to compare the output of the refactored random number distribution functions of this PR with the output of the legacy functions.
Apart from calling all the functions a few thousand times, creating a histogram and comparing the new and old implementations manually, is there some other way we can check them automatically and turn that into a unit test?
I'm also summoning some people with more math expertise than me for suggestions here: @1uc @cattabiani

@cattabiani
Member

Even in that case it is not so simple. I suggest looking into https://en.wikipedia.org/wiki/Chi-squared_test, for example. However, there is no way to have a test that passes 100% of the time. With the standard p = 0.05 threshold we would have a test that fails once every 20 runs on average, which is not ideal for CI. Is it really necessary to test these libraries? Proving something with statistics is always complicated and prone to errors. "There are lies, damned lies, and then there is statistics."
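
For illustration, a chi-squared check of sampled values against the expected bin counts could look like this (a sketch; the uniform samples and the bin count are just examples):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
samples = rng.uniform(0.0, 1.0, size=10_000)

# Bin the samples and compare the observed counts against the counts a
# uniform distribution would predict.
observed, _ = np.histogram(samples, bins=20, range=(0.0, 1.0))
expected = np.full(20, len(samples) / 20)
print(stats.chisquare(observed, expected))
# A small p-value indicates the samples do not follow the expected
# distribution; with the usual 0.05 threshold the test still fails
# spuriously about once in 20 runs, which is the concern above.
```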

@1uc
Collaborator

1uc commented Sep 1, 2023

If you want to compute a distance between two distributions, you can use something like the Wasserstein, or Earth mover's, distance. If you have two histograms (or sampled distributions), this metric computes the minimum amount of "mass" you need to move in one histogram to obtain the other. You can then use this as an error, just like abs(approx - exact).

There's a Python implementation readily available for 1D distributions in scipy here:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html

This is an optimal transport library I've used in the past; it also deals with the multi-dimensional case:
https://pythonot.github.io/quickstart.html
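
A minimal sketch of the scipy variant (the "old" and "new" samples below are just placeholders for the legacy and refactored implementations):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
old = rng.normal(1.0, 0.1, size=10_000)   # stand-in for the legacy implementation
new = rng.normal(1.0, 0.1, size=10_000)   # stand-in for the refactored one

# A small distance means the two empirical distributions are close; you can
# then assert the distance is below a tolerance, just like abs(approx - exact).
print(wasserstein_distance(old, new))
print(wasserstein_distance(old, rng.normal(1.1, 0.1, size=10_000)))
```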

If you're just asking the question: Are these samples from the same distribution? Then Kolmogorov–Smirnov would seem to fit the bill:
https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

With regards to the p = 0.05 problem that @cattabiani mentioned: it's true that we'll probably not be able to write a test that never fails spuriously. However, what we want to rule out are programming errors, and almost all of them are going to completely change the distribution. Hence, the samples of the two distributions will be very different, and statistical tests will be very certain that the samples are from different distributions. For example:

In [1]: import scipy

In [2]: import numpy as np

In [3]: x = np.random.normal(1.0, 0.1, size=100)

In [4]: y = np.random.normal(1.01, 0.1, size=100)

In [5]: scipy.stats.ks_2samp(x, y)
Out[5]: KstestResult(statistic=0.08, pvalue=0.9084105017744525, statistic_location=0.8547517139777675, statistic_sign=1)

In [6]: y = np.random.normal(1.1, 0.1, size=100)

In [7]: scipy.stats.ks_2samp(x, y)
Out[7]: KstestResult(statistic=0.43, pvalue=1.1151678185620634e-08, statistic_location=1.0632304755992028, statistic_sign=1)

With 100 samples, we can almost detect a 1% difference; a 10% difference has a p-value of virtually 0.

What we want is a test that (almost) never fails due to bad sampling luck but fails reasonably often if the distributions are different. Picking, say, p = 0.001, we'd expect 1 in 1000 CI runs to fail. If this is still too high, you can automatically rerun and report green if 1 of 3 or 2 of 3 runs are green, whatever you prefer. The reasoning is that the programming errors you're looking for will push the p-value down to 1e-6 or so. They'll fail the test 3 times in a row, almost every time you run CI. Hence, they'll still make CI go red.
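
A rough sketch of that rerun strategy using scipy's two-sample KS test (the threshold, retry count and sample sizes are just examples):

```python
import numpy as np
from scipy import stats

def same_distribution(sample_a, sample_b, p_threshold=1e-3, retries=3):
    """KS check with a few retries: bad sampling luck (almost) never turns CI
    red, while a real programming error, whose p-value is typically ~1e-6,
    keeps failing every attempt."""
    for _ in range(retries):
        if stats.ks_2samp(sample_a(), sample_b()).pvalue > p_threshold:
            return True
    return False

rng = np.random.default_rng(123)
assert same_distribution(lambda: rng.normal(1.0, 0.1, 1_000),
                         lambda: rng.normal(1.0, 0.1, 1_000))
assert not same_distribution(lambda: rng.normal(1.0, 0.1, 1_000),
                             lambda: rng.normal(1.1, 0.1, 1_000))
```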

What I don't know is how sensitive these tests are to sampling from the same distribution but with different random number generators.

@cattabiani
Member

What does "almost never fail" mean? Because even if we are using the very same distribution, a p-value test will fail once every 20 runs. Do you want to run something 100 times and do statistics on the p-values every time CI runs? I don't know the sample size, but we are talking about 10^3+ test runs.

@1uc
Collaborator

1uc commented Sep 1, 2023

Spuriously, not once in 100'000 times using the above numbers.

@iomaganaris
Member Author

iomaganaris commented Jan 16, 2024

Thanks a lot @1uc and @cattabiani for the detailed answers, and sorry for the delayed reply, but I've only seen them now.
I think what's needed to move forward with this PR is to:

  1. Merge master into this branch.
  2. Clean up the code (remove some old indirection).
  3. Figure out whether LogNormal, HyperGeometric and RandomPlay are used anywhere (ModelDB) and need to be implemented. IMHO if not, we shouldn't care about them in the following steps.
  4. Create tests based on Luc's suggestions that compare numbers picked from the currently implemented NEURON built-in random distributions on master against reference implementations from Python or C++.
  5. Update this branch to implement the selected distributions using the Random123 random number generator. Most of the implementations of the random number distributions are already there. Assuming there are no big issues, the tests added in step 4 should pass.

@iomaganaris
Member Author

In our discussion at the latest NEURON dev meeting we concluded the following:

  • All current random number generator implementations should be removed in favor of Random123
  • The current random number generator API should be kept, but it should use Random123 underneath
  • In version 8.x there should be a deprecation message, pointing to Random123, for the random number generators that will be removed in version 9

@alkino
Member

alkino commented Nov 19, 2024

Impossible. The results would now depend on the stdlib used (roughly, GNU libstdc++ or LLVM libc++). We will never be able to keep the tests working.

Quit.

@alkino alkino closed this Nov 19, 2024
Successfully merging this pull request may close these issues.

Refactor src/gnu