First draw from ecuyer1988 leads to bias in certain RNG distributions
#92
On Feb 24, 2023, at 18:30, Brian Ward ***@***.***> wrote:
This was first noted in the Stan project in stan-dev/stan#3167
When using ecuyer1988 as an engine and a "small" seed (less than about 10,000), the first draw of the poisson and uniform distributions is biased high.
Similar defects with obsolete ancient RNGs have been reported time and time again. In boost, the modern, industrial strength RNG is the MIXMAX.
Along with sequences indistinguishable from true random numbers, attention has been paid to seeding, such that the sequence
obtained from seed(1) starts out perfectly randomized and is completely independent from seed(2), seed(3) etc., even if you
test the quality by interleaving two or more streams.
Another defect reported for obsolete RNGs is the lack of fine resolution. Most or all of the old 32-bit RNGs will not produce numbers closer to each other
than about 10^-9, although afaik this issue is supposed to have been taken care of with the use of std::random distributions.
Cheers,
Kostas
=========================================
Institute of Nuclear and Particle Physics
NCSR Demokritos
http://inspirehep.net/author/profile/K.G.Savvidy.1
https://mixmax.hepforge.org
|
For our use case it is important to be able to discard a huge number of draws efficiently, so options like MIXMAX are not very suitable for us
On 2/24/23 12:09, Brian Ward wrote:
> For our use case it is important to be able to discard a huge number of draws efficiently, so options like MIXMAX are not very suitable for us
Theoretically, most generators support fast discard. It's rarely
implemented for anything that isn't an LCG or LFSR, though, as it
can be quite complex. I never really looked at the mathematical
structure of MIXMAX, so I have no idea whether a fast discard is
possible for it. I did implement optimized discard for mt19937.
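
As a rough illustration of why fast discard is easy for an LCG: for a purely multiplicative generator x_{k+1} = a * x_k mod m, the state after n steps is a^n * x_0 mod m, which square-and-multiply computes in O(log n) operations. The sketch below uses the parameters of std::minstd_rand only so it can be checked against a standard engine; the names kA, kM, pow_mod and state_after are made up for this example and are not part of Boost or the standard library.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Parameters of std::minstd_rand: x_{k+1} = kA * x_k mod kM (no additive term).
constexpr std::uint64_t kA = 48271;
constexpr std::uint64_t kM = 2147483647;

// Computes (base^exp) mod kM by square-and-multiply.
// All operands stay below 2^31, so every product fits in 64 bits.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp) {
    std::uint64_t result = 1;
    base %= kM;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % kM;
        base = (base * base) % kM;
        exp >>= 1;
    }
    return result;
}

// State of the generator n steps after seeding with `seed`, i.e. the value
// returned by the n-th call to operator() (n = 0 gives the seed itself).
std::uint64_t state_after(std::uint64_t seed, std::uint64_t n) {
    assert(seed >= 1 && seed < kM);
    return (pow_mod(kA, n) * seed) % kM;
}
```

Calling state_after(42, 100000) should give the same value as seeding std::minstd_rand with 42, calling discard(99999), and drawing once, but without stepping through the intermediate states.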
|
On 2/24/23 09:30, Brian Ward wrote:
> In the Poisson distribution, the behavior seems to go away if you have a rate greater than 10
That's not surprising: poisson_distribution has two independent
implementations with the cutoff being 10.
|
On 2/24/23 11:56, Kostas Savvidis wrote:
> On Feb 24, 2023, at 18:30, Brian Ward ***@***.***> wrote:
> When using ecuyer1988 as an engine and a "small" seed (less than about 10,000), the first draw of the poisson and uniform distributions is biased high.
>
Similar defects with obsolete ancient RNGs have been reported time and time again. In boost, the modern, industrial strength RNG is the MIXMAX.
I agree with Kostas here. ecuyer1988 is not a very high quality PRNG.
On top of that, most older PRNGs provide no distribution guarantees
whatsoever for the usage you're describing.
|
On Sat, 25 Feb 2023 at 1:27 AM, swatanabe ***@***.***> wrote:
> On 2/24/23 11:56, Kostas Savvidis wrote:
>>> On Feb 24, 2023, at 18:30, Brian Ward ***@***.***> wrote:
>>> When using ecuyer1988 as an engine and a "small" seed (less than about 10,000), the first draw of the poisson and uniform distributions is biased high.
>> Similar defects with obsolete ancient RNGs have been reported time and time again. In boost, the modern, industrial strength RNG is the MIXMAX.
> I agree with Kostas here. ecuyer1988 is not a very high quality PRNG.
> On top of that, most older PRNGs provide no distribution guarantees
> whatsoever for the usage you're describing.
I suspect that the authors of Stan implemented seeding several parallel
streams/threads (or “chains” in MCMC lingo) using discard. Some while ago
another Monte Carlo user asked me for arbitrary skip in MIXMAX, but this
is a misunderstanding. When a user needs parallel streams from MIXMAX they
should just seed using the default method from consecutive seeds. If you
need eight streams you can seed them with 1,2,3…8 or with
1001,1002,1003,…1008. No other generator currently gives you that kind of
safety.
Cheers,
Kostas
|
This was referenced Jan 18, 2024
This was first noted in the Stan project in stan-dev/stan#3167
When using ecuyer1988 as an engine and a "small" seed (less than about 10,000), the first draw of the poisson and uniform distributions is biased high. For example, see:
The behavior seems to improve as you increase the size of the seed, but is still noticeable even for very large seeds (e.g. those drawn from uniform(10_000, 1_000_000)). In the Poisson distribution, the behavior seems to go away if you have a rate greater than 10.
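
One possible mechanism for the small-seed case, sketched with standard-library components rather than Boost itself: ecuyer1988 combines two multiplicative LCGs with multipliers 40014 and 40692 and moduli 2147483563 and 2147483399. The sketch assumes that seeding with a single integer s starts both components from the same value s; that is an assumption about the seeding path, not something confirmed in this thread. Under that assumption, for s below roughly 50,000 neither component wraps on its first step, so the first combined output is m1 - 1 - 678 * s, which sits just under the maximum. The names Mlcg1, Mlcg2 and first_draw are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// The two multiplicative LCGs combined by L'Ecuyer's 1988 generator.
using Mlcg1 = std::linear_congruential_engine<std::uint32_t, 40014, 0, 2147483563>;
using Mlcg2 = std::linear_congruential_engine<std::uint32_t, 40692, 0, 2147483399>;

// First output of the combined generator, scaled to (0, 1].
// Assumption (not taken from the thread): both components are
// seeded with the same integer.
double first_draw(std::uint32_t seed) {
    assert(seed >= 1);
    Mlcg1 g1(seed);
    Mlcg2 g2(seed);
    const std::int64_t m1 = 2147483563;
    // Difference of the two component outputs, in signed 64-bit arithmetic.
    std::int64_t z = static_cast<std::int64_t>(g1()) - static_cast<std::int64_t>(g2());
    if (z <= 0) z += m1 - 1;  // wrap into [1, m1 - 1]
    return static_cast<double>(z) / static_cast<double>(m1 - 1);
}
```

With this sketch, first_draw(s) stays above 0.99 for every seed from 1 to 10,000, which is consistent with the uniform draws being biased high. It does not account for the weaker effect reported with larger seeds.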