improve RNG #67
For distributed simulations, it seems to me that what you want is to be able to deterministically parameterize an entire run by a single seed value. So for parallel computing, we'd want to ensure that you can do that. I'm not sure what the best way to do that is, however.
Especially interesting in the case of elastic parallelism!
How would we do parallel number generation? Just use different seeds to start off, and thus use different parts of an RNG's sequence. The issue with some of these methods is that the series of numbers generated changes with a change in the number of processors, not to mention under elastic parallelism.
Based on the thread on the mailing list, I don't think it is a good idea to seed an RNG from /dev/urandom as the default. It should be a seeding option. Also, it would be nice to have an RNG that generates random numbers from /dev/urandom altogether. Random number quality test suite: http://www.iro.umontreal.ca/~simardr/testu01/tu01.html
There are two use-cases for random number generation: reproducible runs, where you explicitly choose the seed, and ad hoc generation, where you just want some random numbers.
In the former case, you need to explicitly seed your random number generator, so whatever the generator automatically seeds itself with is irrelevant. In the latter case, you want the default to be a good source of entropy, like /dev/urandom, rather than some crappy default like the current time. Do you have some better source of entropy in mind for automatically seeding the random number generator when an explicit seed isn't given? Or are you objecting to automatically seeding the generator at all? Personally, if I just want to generate a bunch of random numbers in an ad hoc manner, I really don't want to be bothered to generate a seed; I just want to use it.
In no system (C, Matlab, etc.) is a random number generator seeded with a random seed of any kind by default. -viral
The counter-argument to "I just want to use it" here is "I just don't want to be surprised". Seeding an RNG by default has the property that it surprises you. Even saving the random seed automatically is not good enough, because the same .j file will run differently on a different computer where the saved seed is not available. -viral
Maybe not in C or Matlab (I checked this), but consider:
All three of the most user-friendly mainstream languages around automatically seed your RNG for you. So the question here is: do we emulate C and Matlab, or Python, Perl, and Ruby? The choice seems pretty clear to me. Honestly, the only viable well-behaved options are:
The latter seems unnecessarily pedantic and annoying. How does seeding an RNG by default cause surprise?
Ok, I didn't know about Perl, Ruby, and Python. None of these are serious numerical languages, but at least there is precedent. Seeding an RNG differently by default on every run is surprising because it generates slightly different answers every time. So, to get predictable answers, I now have to set a seed. When developing, one wants to know whether variations are due to changes made in the code, and possibly a bug, or due to a different stream of random numbers. -viral
You're right that predictability is needed for debugging, but that can never be achieved automatically: to reproduce a run you still have to either re-seed manually or restart the system. If you're modifying your code and re-running it without re-seeding, you're going to get different answers every time in every system, unless you restart Matlab between runs, which people don't do. If you start N julia processes to do a simulation and they all start with the same seed, you'll be doing the exact same thing on each processor without realizing it. It might take a while to notice this. It seems this would hurt the naive user. The expert user will manage their own seeding no matter what we do.
Are you referring to N julia processes started on the command line, or through julia? If they are part of a "Julia parallel machine" or some such thing, then julia will do the right thing. But yes, if you are just running a bunch of disconnected julias, then one would have to seed. I am getting a little convinced. -viral
Any idea what R does? -viral
Copying R here is a good idea. Stats people swear by it. Then again, R doesn't have the parallelism issue.
Yes, that's what I thought too, but it's a useful data point. -viral
How portable is /dev/urandom? I believe all commonly used unix-y systems should have it today. Not sure what Android/iOS and embedded systems are like, but we have much more to worry about before targeting those platforms. -viral
Well, this page tells us all: http://en.wikipedia.org/wiki//dev/random. Apparently there is also the Entropy Gathering Daemon (EGD) for computers without /dev/urandom, which sounds like something that might end the universe. I really think we ought to build the TestU01 test suite into julia. -viral
Bam:
So R does the same thing as Python, Perl, and Ruby.
I'm not sure how useful TestU01 really would be: it's a test of algorithms, not of implementations as such. We could, however, run our implementation through it, since if there's a defect in the implementation, it would probably show up as a deviation from apparent randomness. If it can't detect the defect, one could argue that it's not really a defect, since the resulting RNG still appears random.
One interesting thing I thought about (based on that best-practices-in-bioinformatics paper I sent out) was combining one of the cheap, simple non-linear RNGs, like the JKISS one described there, with MT, to get something that has the same period as MT, is slightly more expensive, but is non-linear. But I'm a complete amateur in this area, so this might be an incredibly dumb idea.
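For reference, JKISS is small enough to sketch in full. The constants below are the ones given in David Jones's RNG good-practice note (presumably the paper referred to above); it chains a linear congruential step, an xorshift, and a multiply-with-carry, giving a non-linear combined output with a period around 2^127:

```c
/* Sketch of JKISS, the simple non-linear generator mentioned above.
 * Constants and default state are from David Jones's note on RNG
 * practice. The combination-with-MT idea floated in the thread would
 * e.g. add this output to MT's output; that part is speculation. */
#include <stdint.h>

typedef struct { uint32_t x, y, z, c; } jkiss_t;

void jkiss_init(jkiss_t *s) {
    s->x = 123456789; s->y = 987654321;
    s->z = 43219876;  s->c = 6543217;   /* y must not be 0; z,c not both 0 */
}

uint32_t jkiss_next(jkiss_t *s) {
    uint64_t t;
    s->x = 314527869 * s->x + 1234567;  /* linear congruential step */
    s->y ^= s->y << 5;                  /* xorshift step */
    s->y ^= s->y >> 7;
    s->y ^= s->y << 22;
    t = 4294584393ULL * s->z + s->c;    /* multiply-with-carry step */
    s->c = (uint32_t)(t >> 32);
    s->z = (uint32_t)t;
    return s->x + s->y + s->z;          /* non-linear combination */
}
```

Since each component stream is fully deterministic from the four state words, seeding would set x, y, z, c directly (subject to the constraints in the comment).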
Entropy Gathering Daemon does sound incredibly ominous. I feel like that should be the evil monster in a kids' book.
The device is universal, but the implementations differ a fair bit across OSes, as per the Wikipedia page. Also, people have shown the quality of /dev/random to be poor on some platforms. We could go with it anyway, knowing that it is still likely to be better than the other options. -viral
We can implement the best practice on each platform. AFAIK, /dev/urandom is fine on both Linux and OS X. On Windows, we'll use the system's random-bits API, and we can use EGD elsewhere, or whatever.
The RNG is also used by flisp. Is it OK to remove all the random stuff from flisp? I can then move it all out of support and into external. My proposal is to have external/random, which includes mt19937ar.c, mt19937-64.c, dsfmt-2.1, and our wrappers that are in random.c. All of these can then be compiled into librandom.so, and rand_double() and rand_float() will get called through ccall. We can then experiment with various ways to generate random numbers. If rand_double() and rand_float() are called through ccall, rather than as builtins, will this greatly affect performance? It seems that this library approach is much cleaner.
rand_double and rand_float are already called through ccall. This is the best approach, because the compiler can inline the ccall into callers of rand(), and then there's no overhead at all. I like your approach; let's do it. It's fine to remove the random stuff from flisp; I'll do that part.
commit 24fc7ed enables the use of DSFMT. Closing this one. A new issue can be opened later when we want to use /dev/urandom and save the seed. |
Now we no longer have the ability to seed from a source of entropy. Maybe using the 64-bit system time isn't great, but at least it's something. At the very least we need to expose dSFMT's functions for seeding with more than 32 bits.
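dSFMT's wider seeding path takes an array of 32-bit words rather than a single integer, so a 64-bit value like the system time has to be split first. A minimal sketch of that split (the `dsfmt_gv_init_by_array` call is left as a comment since this fragment doesn't link against dSFMT, and `seed64_to_key` is a hypothetical helper name):

```c
/* Sketch: turn a 64-bit seed into the uint32_t key array that dSFMT's
 * init-by-array entry point expects, so all 64 bits reach the state. */
#include <stdint.h>

void seed64_to_key(uint64_t seed, uint32_t key[2]) {
    key[0] = (uint32_t)(seed & 0xffffffffULL);  /* low 32 bits */
    key[1] = (uint32_t)(seed >> 32);            /* high 32 bits */
    /* with dSFMT linked in: dsfmt_gv_init_by_array(key, 2); */
}
```

The same pattern extends to longer keys if more entropy (e.g. from /dev/urandom) is available.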
All still in a transition phase. Will get rid of mt19937ar.c and mt19937-64 (the 64-bit version of mt19937) once everything works. Just need to expose a couple more dSFMT interfaces. Will try to implement /dev/urandom as well. -viral
All of this is done in commit c4aa6c7. -viral
We already use Mersenne Twister, but SFMT/dSFMT is supposed to be faster:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html
We also need better seeding.
The work items are: