Random states (C++) #22
Comments
OK, so the point is essentially to work out each independent process that uses a random number, have a separate seed for each random generator, and pass the relevant random state to the relevant function?
We did a half-baked thing for the alpha_s studies. The idea is to make it not half-baked, i.e., to make sure that, given the same data ordering and cuts and the same seed, one gets the same fluctuations.
I think we essentially want the same `RandomState` interface that numpy has, and want it to interact in the same way it does for sklearn.
Do we want to be able to produce the same sequence of numbers as numpy, given the same seed? I guess this would be nice with regard to closure tests etc. I was playing with the gsl_rng library, and I think that, in theory, the rng state which is already implemented will work in the same way as the numpy one. EDIT: never mind, I wasn't outputting the state of the gsl rng.
So when we run a fit, am I right in thinking there is just one instance of the RNG, and we just reseed it at various points in the fit? I was thinking initially that this wouldn't be so hard, because we could basically have a different instance of the RNG for each independent stream of random numbers, but then I was messing around in Python and it seems the rng instance is created once, which then entangles all of the different calls.
Recent versions of the C++ standard have the Mersenne Twister, so we should use those (and we do in some places of buildmaster). There are some subtleties when you think about it, particularly regarding vp. See #77, where I didn't really come up with a good idea. I believe that seeing how we organize the fit is easier, in that there are only a limited number of configurations we care about (essentially, correlated data streams with uncorrelated GA is the interesting one, plus correlated everything for point-by-point reproducibility). I suppose that numpy and friends have a global random instance that they use by default if nothing else is specified: the numpy random functions `random.XX` are really `globalState.XX`. This is useful to avoid seeding the state too frequently, or encountering unexpected correlations. I am not sure how much we care, given that the behaviour is going to be hardcoded.
Friday Jun 23, 2017 at 09:49 GMT
Originally opened as https://github.com/NNPDF/libnnpdf/issues/9
We clearly need more localized random states. I think we should have high level functions (similar to the one in #7) that take a `random_state` as an argument. This is what sklearn does (e.g. http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC) and works well enough. These "high level functions" are things like `Minimize`, `GeneratePseudorreplicas` or `TrainValidSplit`. Ideas?