improve RNG #67
For distributed simulations, it seems to me that what you want is to be able to deterministically parameterize an entire run by a single seed value. So for parallel computing, we'd want to ensure that you can do that. I'm not sure what the best way to do that is, however.
Especially interesting in the case of elastic parallelism!
How would we do parallel number generation? Just use different seeds to start off, and thus use different parts of an RNG's sequence. The issue with some of these methods is that the series of numbers generated changes with a change in the number of processors, not to mention under elastic parallelism.
Based on the thread on the mailing list, I don't think it is a good idea to seed an RNG from /dev/urandom as the default. It should be a seeding option. Also, it would be nice to have an RNG that generates random numbers from /dev/urandom altogether. Random number quality test suite: http://www.iro.umontreal.ca/~simardr/testu01/tu01.html
There are two use-cases for random number generation: reproducible runs, where you explicitly choose the seed, and ad hoc generation, where you just want some random numbers.
In the former case, you need to explicitly seed your random number generator, so whatever the generator automatically seeds itself with is irrelevant. In the latter case, you want the default to be a good source of entropy, like /dev/urandom, rather than some crappy default like the current time. Do you have some better source of entropy in mind for automatically seeding the random number generator when an explicit seed isn't given? Or are you objecting to automatically seeding the generator at all? Personally, if I just want to generate a bunch of random numbers in an ad hoc manner, I really don't want to be bothered to generate a seed; I just want to use it.
In no system (C, Matlab, etc.) is a random number generator seeded with a random seed of any kind by default. -viral
The counter-argument to "I just want to use it" here is "I just don't want to be surprised". Seeding an RNG by default has the property that it surprises you. Even saving the random seed automatically is not good enough, because the same .j file will run differently on a different computer where the saved seed is not available. -viral
Maybe not in C or Matlab (I checked this), but consider:
All three of the most user-friendly mainstream languages around automatically seed your RNG for you. So the question here is: do we emulate C and Matlab, or Python, Perl, and Ruby? The choice seems pretty clear to me. Honestly, the only viable well-behaved options are:
The latter seems unnecessarily pedantic and annoying. How does seeding an RNG by default cause surprise?
Ok, I didn't know about Perl, Ruby, and Python. None of these are serious numerical languages, but at least there is precedent. Seeding an RNG differently by default on every run is surprising because it generates slightly different answers every time. So, to get predictable answers, I now have to set a seed. When developing, one wants to know whether variations are due to changes made in the code, and possibly a bug, or due to a different stream of random numbers. -viral
You're right that predictability is needed for debugging, but that can never be achieved automatically: to reproduce a run you still have to either re-seed manually or restart the system. If you're modifying your code and re-running it without re-seeding, you're going to get different answers every time in every system, unless you restart Matlab between runs, which people don't do. If you start N julia processes to do a simulation and they all start with the same seed, you'll be doing the exact same thing on each processor without realizing it. It might take a while to notice this. It seems this would hurt the naive user. The expert user will manage their own seeding no matter what we do.
Are you referring to N julia processes started on the command line, or through julia? If they are part of a "Julia parallel machine" or some such thing, then julia will do the right thing. But yes, if you are just running a bunch of disconnected julias, then one would have to seed. I am getting a little convinced. -viral
Any idea what R does? -viral
Copying R here is a good idea. Stats people swear by it. Then again, R doesn't have the parallelism issue.
Yes, that's what I thought too, but it's a useful data point. -viral
How portable is /dev/urandom? I believe all commonly used unix-y systems should have it today. Not sure what Android/iOS and embedded systems are like, but we have much more to worry about before targeting those platforms. -viral
Well, this page tells us all: http://en.wikipedia.org/wiki//dev/random. Apparently there is also the Entropy Gathering Daemon (EGD) for computers without /dev/urandom, which sounds like something that might end the universe. I really think we ought to build the TestU01 test suite into julia. -viral
Bam:
So R does the same thing as Python, Perl, and Ruby.
I'm not sure how useful TestU01 really would be: it's a test of algorithms, not of implementations as such. We could, however, run our implementation through it, since if there's a defect in the implementation, it would probably show up as a deviation from apparent randomness. If it can't detect the defect, one could argue that it's not really a defect, since the resulting RNG still appears random.
One interesting thing I thought about (based on that best-practices-in-bioinformatics paper I sent out) was combining one of the cheap, simple non-linear RNGs, like the JKISS one described there, with MT, to get something that has the same period as MT, is slightly more expensive, but is non-linear. But I'm a complete amateur in this area, so this might be an incredibly dumb idea.
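For reference, JKISS is small enough to sketch in full. The constants below are the ones given in David Jones's RNG good-practice note (presumably the paper referred to above); it chains a linear congruential step, an xorshift, and a multiply-with-carry, giving a non-linear combined output with a period around 2^127:

```c
/* Sketch of JKISS, the simple non-linear generator mentioned above.
 * Constants and default state are from David Jones's note on RNG
 * practice. The combination-with-MT idea floated in the thread would
 * e.g. add this output to MT's output; that part is speculation. */
#include <stdint.h>

typedef struct { uint32_t x, y, z, c; } jkiss_t;

void jkiss_init(jkiss_t *s) {
    s->x = 123456789; s->y = 987654321;
    s->z = 43219876;  s->c = 6543217;   /* y must not be 0; z,c not both 0 */
}

uint32_t jkiss_next(jkiss_t *s) {
    uint64_t t;
    s->x = 314527869 * s->x + 1234567;  /* linear congruential step */
    s->y ^= s->y << 5;                  /* xorshift step */
    s->y ^= s->y >> 7;
    s->y ^= s->y << 22;
    t = 4294584393ULL * s->z + s->c;    /* multiply-with-carry step */
    s->c = (uint32_t)(t >> 32);
    s->z = (uint32_t)t;
    return s->x + s->y + s->z;          /* non-linear combination */
}
```

Since each component stream is fully deterministic from the four state words, seeding would set x, y, z, c directly (subject to the constraints in the comment).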
Entropy Gathering Daemon does sound incredibly ominous. I feel like that should be the evil monster in a kids' book.
The device is universal, but the implementations differ a fair bit across OSes, as per the Wikipedia page. Also, people have shown the quality of /dev/random to be poor on some platforms. We could go with it anyway, knowing that it is still likely to be better than the other options. -viral
We can implement the best practice on each platform. AFAIK, /dev/urandom is fine on both Linux and OS X. On Windows, we'll use the system's random-bits API, and we can use EGD elsewhere, or whatever.
The RNG is also used by flisp. Is it OK to remove all the random stuff from flisp? I can then move it all out of support and into external. My proposal is to have external/random, which includes mt19937ar.c, mt19937-64.c, dsfmt-2.1, and our wrappers that are in random.c. All of these can then be compiled into librandom.so, and rand_double() and rand_float() will get called through ccall. We can then experiment with various ways to generate random numbers. If rand_double() and rand_float() are called through ccall, rather than as builtins, will this greatly affect performance? It seems that this library approach is much cleaner.
rand_double and rand_float are already called through ccall. This is the best approach, because the compiler can inline the ccall into callers of rand(), and then there's no overhead at all. I like your approach; let's do it. It's fine to remove the random stuff from flisp; I'll do that part.
commit 24fc7ed enables the use of DSFMT. Closing this one. A new issue can be opened later when we want to use /dev/urandom and save the seed. |
Now we no longer have the ability to seed from a source of entropy. Maybe using the 64-bit system time isn't great, but at least it's something. At the very least we need to expose dSFMT's functions for seeding with more than 32 bits.
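dSFMT's wider seeding path takes an array of 32-bit words rather than a single integer, so a 64-bit value like the system time has to be split first. A minimal sketch of that split (the `dsfmt_gv_init_by_array` call is left as a comment since this fragment doesn't link against dSFMT, and `seed64_to_key` is a hypothetical helper name):

```c
/* Sketch: turn a 64-bit seed into the uint32_t key array that dSFMT's
 * init-by-array entry point expects, so all 64 bits reach the state. */
#include <stdint.h>

void seed64_to_key(uint64_t seed, uint32_t key[2]) {
    key[0] = (uint32_t)(seed & 0xffffffffULL);  /* low 32 bits */
    key[1] = (uint32_t)(seed >> 32);            /* high 32 bits */
    /* with dSFMT linked in: dsfmt_gv_init_by_array(key, 2); */
}
```

The same pattern extends to longer keys if more entropy (e.g. from /dev/urandom) is available.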
All still in a transition phase. Will get rid of mt19937ar.c and mt19937-64 (the 64-bit version of mt19937) once everything works. Just need to expose a couple more dSFMT interfaces. Will try to implement /dev/urandom as well. -viral
All of this is done in commit c4aa6c7. -viral
We already use Mersenne Twister, but SFMT/dSFMT is supposed to be faster:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html
We also need better seeding.
The work items are: