
Random states (C++) #22

Closed
scarrazza opened this issue Feb 8, 2018 · 8 comments

@scarrazza (Member)

Issue by Zaharid
Friday Jun 23, 2017 at 09:49 GMT
Originally opened as https://github.com/NNPDF/libnnpdf/issues/9


We clearly need more localized random states. I think we should have high-level functions (similar to the one in #7) that take a random_state as an argument. This is what sklearn does (e.g. http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC) and it works well enough.

These "high level functions" are things like Minimize, GeneratePseudorreplicas or TrainValidSplit.

Ideas?
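A minimal sketch of what such a high-level function could look like, assuming an sklearn-style interface where the caller owns and passes the engine explicitly. The name `generate_pseudoreplica` and the Gaussian fluctuation model are illustrative assumptions, not the actual libnnpdf API:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical high-level function in the style of sklearn's random_state
// parameter: the random engine is an explicit argument, so two calls with
// identically seeded engines produce identical replicas.
std::vector<double> generate_pseudoreplica(const std::vector<double>& central,
                                           const std::vector<double>& sigma,
                                           std::mt19937& rng) {
    std::vector<double> replica(central.size());
    for (std::size_t i = 0; i < central.size(); ++i) {
        // Gaussian fluctuation around each central value (illustrative model).
        std::normal_distribution<double> gauss(central[i], sigma[i]);
        replica[i] = gauss(rng);
    }
    return replica;
}
```

With this shape, reproducibility is the caller's decision: seed two engines identically and the replicas match; use one shared engine and the streams stay uncorrelated.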

@nhartland nhartland changed the title Random states Random states (C++) Feb 20, 2018
@Zaharid Zaharid closed this as completed May 3, 2018
@Zaharid Zaharid reopened this May 3, 2018
@Zaharid (Contributor) commented Jan 18, 2019

@wilsonmr @tgiani we should have a look at this about now. In particular, I am not sure the current random states for data generation work consistently. We should be able to specify the same data seed for two fits and be sure that the replicas are the same.

@wilsonmr (Contributor)

Ok, so the point is essentially to work out each independent process that uses random numbers, have a separate seed for each random generator, and pass the relevant random state to the relevant function?

@Zaharid (Contributor) commented Jan 18, 2019

We did a half-baked thing for the alpha_s studies. The idea is to make it not half-baked, i.e. to make sure that given the same data ordering and cuts, and the same seed, one gets the same fluctuations.

@Zaharid (Contributor) commented Jan 18, 2019

Do `git grep dataseed` to see the current implementation.

@Zaharid (Contributor) commented Jan 18, 2019

I think we essentially want the same RandomState interface that numpy has, and we want it to interact with the rest of the code in the same way it does for sklearn.
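For illustration, sklearn's `check_random_state` idiom (resolve an optional seed into either a reproducible local engine or a shared default one) could translate to C++ roughly as follows. This is a sketch under assumed names, not existing nnpdf code:

```cpp
#include <cstdint>
#include <optional>
#include <random>

// Shared default engine, analogous to numpy's implicit global RandomState
// (assumption: nothing like this exists in libnnpdf yet).
inline std::mt19937& default_rng() {
    static std::mt19937 engine{std::random_device{}()};
    return engine;
}

// Mirror of sklearn's check_random_state: an explicit seed yields a
// reproducible engine (stored in the caller-provided `local`); no seed
// falls back to the shared default state.
inline std::mt19937& check_random_state(std::optional<std::uint32_t> seed,
                                        std::mt19937& local) {
    if (seed) {
        local.seed(*seed);
        return local;
    }
    return default_rng();
}
```

High-level functions would then take an optional seed argument and call this helper once at the top, exactly as sklearn estimators do.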

@wilsonmr (Contributor) commented Jan 21, 2019

Do we want to be able to produce the same sequence of numbers as numpy given the same seed? I guess this would be nice with regards to closure tests etc.

I was playing with the gsl_rng library, and if I use the gsl_rng_mt19937 generator then every other number matches the numpy one for the same seed up to the 8th decimal place. Would that be sufficient? I'm surprised they don't match up to the 16th place, since they're both doubles.

I think in theory the rng state which is already implemented will work in the same way as the numpy RandomState, although it's just a single long int. I tried using the get_state method on the numpy random state, but this returns something like 624 long ints, and none of them appear to correspond to the one output by gsl, which I find confusing.

EDIT: never mind, I wasn't outputting the state of the gsl rng.
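The 624 integers are expected: the Mersenne Twister state is 624 32-bit words, not a single long int (the seed is only used to initialize those words). A small C++ sketch showing the documented state size and a state round-trip through the standard engine's textual serialization; the helper name `roundtrip` is illustrative:

```cpp
#include <random>
#include <sstream>

// Serialize a Mersenne Twister engine to text and read it back into a fresh
// engine. The serialized state is the same ~624-word array that numpy's
// get_state() exposes; a bare seed value will not appear in it directly.
std::mt19937 roundtrip(const std::mt19937& engine) {
    std::stringstream ss;
    ss << engine;          // write the full internal state as text
    std::mt19937 restored;
    ss >> restored;        // read it back into a fresh engine
    return restored;
}
```

After the round-trip the two engines compare equal and produce identical output streams, which is the property one would want for checkpointing a fit.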

@wilsonmr (Contributor)

So when we run a fit, am I right in thinking there is just one instance of the RNG, which we just reseed at various points in the fit?

I was thinking initially that this wouldn't be so hard, because we could basically have a different instance of the RNG for each independent stream of random numbers. But then I was messing around in python, and it seems the rng instance is created once, which entangles all the different calls of getRNG()->SetSeed(), for example.

@Zaharid (Contributor) commented Jan 21, 2019

Recent versions of the C++ standard have the Mersenne Twister, so we should use that (and we do in some places of buildmaster).

There are some subtleties when you think about it, particularly regarding vp. See #77, where I didn't really come up with a good idea. I believe that how we organize the fit is easier, in that there are only a limited number of configurations we care about (essentially, correlated data streams with uncorrelated GA is the interesting one, and correlated everything for point-by-point reproducibility).
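The "correlated data streams, uncorrelated GA" configuration falls out naturally if each independent stream gets its own engine, so reseeding one never perturbs the other. A sketch under assumed names (`FitRandomState` and its members are illustrative, not existing code):

```cpp
#include <random>

// One engine per independent stream of randomness in a fit (illustrative).
// Two fits sharing data_seed but differing in ga_seed draw identical
// pseudodata fluctuations while their GA mutations stay uncorrelated.
struct FitRandomState {
    std::mt19937 data_rng;  // pseudodata fluctuations: shared seed across fits
    std::mt19937 ga_rng;    // genetic-algorithm mutations: per-replica seed
    FitRandomState(unsigned data_seed, unsigned ga_seed)
        : data_rng(data_seed), ga_rng(ga_seed) {}
};
```

"Correlated everything" for point-by-point reproducibility is then just the special case where both seeds are fixed.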

I suppose that numpy and friends have a global random instance that they use by default if nothing else is specified. The numpy random functions random.XX are really globalState.XX. This is useful to avoid seeding the state too frequently, or encountering unexpected correlations. I am not sure how much we care, given that the behaviour is going to be hardcoded.

@Zaharid Zaharid closed this as completed Mar 26, 2021