
Replace random number implementations in src/gnu with std library and random123 gen #1861

Closed
wants to merge 24 commits

Conversation

iomaganaris
Member

@iomaganaris iomaganaris commented Jun 21, 2022

Current state of the random distribution implementations that need to be changed in NEURON:

  • ACG (random number generator that needs removal?): this is a linear congruential engine, see https://en.cppreference.com/w/cpp/numeric/random/linear_congruential_engine
  • MLCG (random number generator that needs removal?)
  • Isaac64
  • MCellRan4 (Should stay as it is?)
  • Random123 (Should stay as it is?)
  • Uniform
  • DiscUnif
  • Normal
  • LogNormal (need to figure out a way to turn the log-variance into the log-stddev expected by std::lognormal_distribution; see the sketch after this list)
  • Binomial
  • Poisson
  • Geometric
  • HyperGeometric
  • NegExp
  • Erlang
  • Weibull
  • RandomPlay
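
On the LogNormal point above: assuming the distribution is parameterised by the mean and variance of the lognormal variate itself, the standard conversion to the (mu, sigma) pair that std::lognormal_distribution expects would look like the sketch below (Python only for illustration; the function name is hypothetical).

```python
import numpy as np

def lognormal_underlying_params(mean, variance):
    # mean/variance of the lognormal variate -> (mu, sigma) of the
    # underlying normal, which is what std::lognormal_distribution expects.
    sigma2 = np.log(1.0 + variance / mean**2)
    mu = np.log(mean) - 0.5 * sigma2
    return mu, np.sqrt(sigma2)

# Quick empirical check of the conversion.
mu, sigma = lognormal_underlying_params(mean=5.0, variance=2.0)
samples = np.random.default_rng(0).lognormal(mu, sigma, size=1_000_000)
print(samples.mean(), samples.var())  # ~5.0 and ~2.0
```

If the existing code already stores the log-mean and log-variance, only the last step is needed: the log-stddev is simply the square root of the log-variance.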

Other things that need to be taken care of:

  • Remove temporary classes/functions and add them to the proper ones
  • Maybe refactor the way the RNG and Random classes are structured?
  • Add documentation
  • Add tests
  • Compare old with new implementations

@alexsavulescu
Member

Just a thought, would it be beneficial to add tests in a prequel PR? This way we'd have a base to compare against?

@alkino alkino linked an issue Jun 23, 2022 that may be closed by this pull request
@iomaganaris
Member Author

Just a thought, would it be beneficial to add tests in a prequel PR? This way we'd have a base to compare against?

I think that's a good idea. I'm just not sure how to test these random distributions. My idea was to initialize a Random struct like in the Python scripts I used as tests and then compare each distribution's mean or variance with the expected values and check whether they are close, but this might be fragile and doesn't check the actual distribution. Any suggestion would be more than welcome for this

@alexsavulescu
Member

Any suggestion would be more than welcome for this

@nrnhines would you have some ideas?

@nrnhines
Member

how to test these random distributions

Use Random123 as the generator, pick 5 values from each distribution (restart generator for each distribution), and compare with the values prior to this PR.
To facilitate comparison, consider using nrn/test/cover/checkresult.py which can create the rand_dist.json file with the prior values and thereafter compare using those.

Don't worry much about other generators. mcellran4 may be machine independent but I'm not sure about the ones in the old gnu folder.
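
A rough sketch of that kind of comparison (not using checkresult.py's exact API; the reference file name, the distributions and their arguments are just examples):

```python
import json
from neuron import h

def sample(dist_name, *args, n=5):
    r = h.Random()
    r.Random123(1, 2, 3)            # restart the generator for each distribution
    getattr(r, dist_name)(*args)    # select the distribution (also returns a first pick)
    return [r.repick() for _ in range(n)]

values = {
    "uniform": sample("uniform", 0.0, 1.0),
    "normal": sample("normal", 1.0, 0.5),
    "negexp": sample("negexp", 1.0),
    "poisson": sample("poisson", 3.0),
}

# First run on master: dump the reference values. Later runs: compare against them.
try:
    with open("rand_dist.json") as f:
        reference = json.load(f)
    for name, vals in values.items():
        assert all(abs(a - b) < 1e-12 for a, b in zip(vals, reference[name])), name
except FileNotFoundError:
    with open("rand_dist.json", "w") as f:
        json.dump(values, f, indent=2)
```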

@alkino alkino force-pushed the magkanar/gnu_random123 branch from 5cb1257 to cb6ccb1 on June 24, 2022 at 11:57
@iomaganaris
Member Author

Coming back to this PR, what we would like to do before updating the branch and merging it is to compare the output of the refactored random number distribution functions of this PR with the output of the legacy functions.
Apart from calling all the functions a few thousand times, creating a histogram and comparing the new and old implementations manually, is there some other way we can check them automatically and turn that into a unit test?
I'm also summoning some people with more math expertise than me for suggestions here: @1uc @cattabiani

@cattabiani
Member

Even in that case it is not so simple. I suggest looking into https://en.wikipedia.org/wiki/Chi-squared_test, for example. However, there is no way to have a test that passes 100% of the time. With the standard p = 0.05 threshold we would have a test that fails once every 20 runs on average, which is not ideal for CI. Is it really necessary to test these libraries? Proving something with statistics is always complicated and prone to errors. "There are lies, damned lies, and then there is statistics."
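
For illustration, a chi-squared check of sampled values against the expected bin counts could look like this (a sketch; the uniform samples and the bin count are just examples):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
samples = rng.uniform(0.0, 1.0, size=10_000)

# Bin the samples and compare the observed counts against the counts a
# uniform distribution would predict.
observed, _ = np.histogram(samples, bins=20, range=(0.0, 1.0))
expected = np.full(20, len(samples) / 20)
print(stats.chisquare(observed, expected))
# A small p-value indicates the samples do not follow the expected
# distribution; with the usual 0.05 threshold the test still fails
# spuriously about once in 20 runs, which is the concern above.
```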

@1uc
Collaborator

1uc commented Sep 1, 2023

If you want to compute a distance between two distributions, you can use something like the Wasserstein, or Earth mover's, distance. If you have two histograms (or sampled distributions), this metric computes the minimum amount of "mass" you need to move in one histogram to obtain the other. You can then use this as an error, just like abs(approx - exact).

There's a Python implementation readily available for 1D distributions in scipy here:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html

This is an optimal transport library I've used in the past; it also deals with the multi-dimensional case:
https://pythonot.github.io/quickstart.html
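
A minimal sketch of the scipy variant (the "old" and "new" samples below are just placeholders for the legacy and refactored implementations):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
old = rng.normal(1.0, 0.1, size=10_000)   # stand-in for the legacy implementation
new = rng.normal(1.0, 0.1, size=10_000)   # stand-in for the refactored one

# A small distance means the two empirical distributions are close; you can
# then assert the distance is below a tolerance, just like abs(approx - exact).
print(wasserstein_distance(old, new))
print(wasserstein_distance(old, rng.normal(1.1, 0.1, size=10_000)))
```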

If you're just asking the question: Are these samples from the same distribution? Then Kolmogorov–Smirnov would seem to fit the bill:
https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

With regards to the p = 0.05 problem that @cattabiani mentioned: it's true that we'll probably not be able to write a test that never fails spuriously. However, what we want to rule out are programming errors, and almost all of them are going to completely change the distribution. Hence, the samples of the two distributions will be very different, and statistical tests will be very certain that the samples are from different distributions. For example:

In [1]: import scipy

In [2]: import numpy as np

In [3]: x = np.random.normal(1.0, 0.1, size=100)

In [4]: y = np.random.normal(1.01, 0.1, size=100)

In [5]: scipy.stats.ks_2samp(x, y)
Out[5]: KstestResult(statistic=0.08, pvalue=0.9084105017744525, statistic_location=0.8547517139777675, statistic_sign=1)

In [6]: y = np.random.normal(1.1, 0.1, size=100)

In [7]: scipy.stats.ks_2samp(x, y)
Out[7]: KstestResult(statistic=0.43, pvalue=1.1151678185620634e-08, statistic_location=1.0632304755992028, statistic_sign=1)

With 100 samples, we can almost detect a 1% difference; a 10% difference has a p-value of virtually 0.

What we want is a test that (almost) never fails due to bad sampling luck but fails reasonably often if the distributions are different. Picking, say, p = 0.001, we'd expect 1 in 1000 CI runs to fail. If this is still too high, you can automatically rerun and report green if 1 of 3 or 2 of 3 runs are green, whatever you prefer. The reasoning is that the programming errors you're looking for will push the p-value down to 1e-6 or so. They'll fail the test 3 times in a row, almost every time you run CI. Hence, they'll still make CI go red.
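
A rough sketch of that rerun strategy using scipy's two-sample KS test (the threshold, retry count and sample sizes are just examples):

```python
import numpy as np
from scipy import stats

def same_distribution(sample_a, sample_b, p_threshold=1e-3, retries=3):
    """KS check with a few retries: bad sampling luck (almost) never turns CI
    red, while a real programming error, whose p-value is typically ~1e-6,
    keeps failing every attempt."""
    for _ in range(retries):
        if stats.ks_2samp(sample_a(), sample_b()).pvalue > p_threshold:
            return True
    return False

rng = np.random.default_rng(123)
assert same_distribution(lambda: rng.normal(1.0, 0.1, 1_000),
                         lambda: rng.normal(1.0, 0.1, 1_000))
assert not same_distribution(lambda: rng.normal(1.0, 0.1, 1_000),
                             lambda: rng.normal(1.1, 0.1, 1_000))
```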

What I don't know is how sensitive these tests are to sampling from the same distribution but with different random number generators.

@cattabiani
Member

What does "almost never fail" mean? Because even if we are using the very same distribution, a p-value test will fail once every 20 runs. Do you want to run something 100 times and do statistics on the p-values every time CI runs? I don't know the sample size, but we are talking about 10^3+ test runs.

@1uc
Collaborator

1uc commented Sep 1, 2023

Spuriously, not once in 100'000 times using the above numbers.

@iomaganaris
Member Author

iomaganaris commented Jan 16, 2024

Thanks a lot @1uc and @cattabiani for the detailed answers, and sorry for the delayed reply, but I've only seen them now.
I think what's needed to move forward with this PR is to:

  1. Merge master into this branch.
  2. Clean up the code (remove some old indirection).
  3. Figure out whether LogNormal, HyperGeometric and RandomPlay are used anywhere (ModelDB) and need to be implemented. IMHO if not, we shouldn't care about them in the following steps.
  4. Create tests based on Luc's suggestions that compare numbers picked from the currently implemented NEURON built-in random distributions on master against reference implementations from Python or C++.
  5. Update this branch to implement the selected distributions using the Random123 random number generator. Most of the implementations of the random number distributions are already there. Assuming there are no big issues, the tests added in step 4 should pass.

@iomaganaris
Member Author

In our discussion at the latest NEURON dev meeting we concluded the following:

  • All current random number generator implementations should be removed in favor of Random123
  • The current random number generator API should be kept, but it should use Random123 underneath
  • In version 8.x there should be a deprecation message, pointing to Random123, for the random number generators that will be removed in version 9

@alkino
Member

alkino commented Nov 19, 2024

Impossible. The results would now depend on the stdlib used (roughly, GNU libstdc++ or LLVM libc++). We will never be able to keep the tests working.

Quit.

@alkino alkino closed this Nov 19, 2024
Successfully merging this pull request may close these issues.

Refactor src/gnu