
create_rng with chain=0 appears to return biased first draw #3167

Closed
cescalara opened this issue Feb 23, 2023 · 28 comments · Fixed by #3168

@cescalara

Summary:

When using poisson_rng inside the transformed data block, the results do not follow the expected Poisson distribution. The issue only seems to appear for rate values lambda < 10, e.g. poisson_rng(9.9). It only appears inside the transformed data block; using poisson_rng inside the generated quantities block is fine.

Description:

If I compile and run the Stan code shown below as a simulation (i.e. fixed_param=True) via cmdstanpy, I get very different output depending on the block in which the function is called. When calling poisson_rng in the transformed data block, the output doesn't make sense. I also include the Python code I use and a plot demonstrating this.

Stan code: poisson.stan

data {
  real Nex;
}
transformed data {
  int N_td;
  N_td = poisson_rng(Nex);
}
generated quantities {
  int N_out_td;
  int N_gq;
  N_out_td = N_td;
  N_gq = poisson_rng(Nex);
}


Python code:

import numpy as np
from scipy import stats
from matplotlib import pyplot as plt
from cmdstanpy import CmdStanModel

poisson_model = "poisson.stan"
model = CmdStanModel(stan_file=poisson_model)

data = {"Nex": 9.9}
output_N_td = []
output_N_gq = []

for _ in range(100):
    seed = np.random.randint(1, 10000)
    sim = model.sample(data=data, iter_sampling=1, chains=1, fixed_param=True, seed=seed)
    N_td = sim.stan_variable("N_out_td")[0]
    N_gq = sim.stan_variable("N_gq")[0]
    output_N_td.append(N_td)
    output_N_gq.append(N_gq)

fig, ax = plt.subplots()
max_x = 30
x = np.arange(max_x)
bins = np.linspace(0, max_x, max_x+1)-0.5
ax.hist(output_N_td, bins=bins, density=True, label="Output from transformed data", alpha=0.5)
ax.hist(output_N_gq, bins=bins, density=True, label="Output from generated quantities", alpha=0.5)
ax.scatter(x, stats.poisson(data["Nex"]).pmf(x), color="k", label="Poisson PMF")
ax.set_title("Histograms of 100 samples, Nex=9.9")
ax.legend()
fig.savefig("figures/stan_poisson_db.png")

Figure:
[Figure: histograms of 100 samples with Nex = 9.9 — output from transformed data vs. output from generated quantities vs. the Poisson PMF]

Additional Information:

I often use poisson_rng in the transformed data block to implement a simulation of mock data from a Poisson process.

Current Version:

INSTALLED VERSIONS
---------------------
python: 3.9.12 (main, Jun  1 2022, 11:38:51) 
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-135-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
cmdstan_folder: /home/iwsatlas1/capel/.cmdstan/cmdstan-2.31.0
cmdstan: (2, 31)
cmdstanpy: 1.1.0
pandas: 1.4.3
xarray: 2022.6.0
tqdm: 4.64.0
numpy: 1.21.5

I'm curious whether others are able to reproduce this, or whether I am missing something. In case this is the wrong place to post the issue, I can also move it elsewhere. Thanks a lot!

@WardBrian
Member

Very odd. I can confirm I can reproduce the behavior you see.

Investigating now

@WardBrian
Member

Hm, inserting an extra rng call in the transformed data block changes the behavior:

data {
  real Nex;
}
transformed data {
  real ignored = uniform_rng(0,1);

  int N_td;

  N_td = poisson_rng(Nex);
}
generated quantities {
  int N_out_td;
  int N_gq;

  N_out_td = N_td;
  N_gq = poisson_rng(Nex);
}

I suspect the bug you've found is not in cmdstanpy, but somewhere deeper in Stan. I will transfer this issue when I figure out where it should go!

@WardBrian
Member

WardBrian commented Feb 23, 2023

Looking at the code now, it seems we

  1. use a different RNG (with the same seed) inside the model constructor as everywhere else
  2. do not supply the chain ID to advance the RNG in the constructor, so all chains would have the same RNG in the transformed data block (we just pass chain id 0).

Both of those seem like potential issues, but neither really explain whatever is going on here. (Edit: 2 is intentional: https://discourse.mc-stan.org/t/transformed-data-rng-varies-by-chain/569)

I've also confirmed that setting the RNG to any other chain (by editing poisson.hpp) fixes the problem, and setting id=0 breaks the RNG for generated quantities as well.

Very odd!

@WardBrian
Member

Oh also, it has nothing to do with poisson - here's the same observation using uniform_rng(0,1):

[Figure: the same comparison as above, using uniform_rng(0,1) instead of poisson_rng]

@cescalara
Author

cescalara commented Feb 23, 2023

Thanks for looking into this! Happy to help out if I can. I thought it was particularly weird that setting lambda>10 seemed to fix things...

@WardBrian
Member

It seems like this issue may be even below Stan, but it's possible we can fix it ourselves.

Here's the equivalent of what we're doing in pure C++ with Boost: https://godbolt.org/z/h98Gbdf9j

If you change out seeds, you'll see that the first value printed is always very high, but the following ones look normal.

Very odd! We're still looking into this but we will definitely make sure something is changed for the next version of Stan! For now, inserting a dummy real = uniform_rng(0,1) call seems to fix subsequent things.

@WardBrian WardBrian transferred this issue from stan-dev/cmdstanpy Feb 23, 2023
@WardBrian
Member

I've transferred this to the Stan repository because this is most likely where it will be fixed, by editing https://github.com/stan-dev/stan/blob/develop/src/stan/services/util/create_rng.hpp
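The scheme create_rng.hpp implements, as described in this thread, can be sketched like this (a toy Python LCG, not Stan's actual C++ code; ToyLCG and the stride value are illustrative): every chain gets the same seed, then discards chain_id * stride draws so the chains use disjoint parts of a single long stream.

```python
class ToyLCG:
    """Minimal linear congruential generator (minstd parameters)."""
    M = 2**31 - 1
    A = 48271

    def __init__(self, seed):
        self.state = seed % self.M or 1

    def next(self):
        self.state = (self.A * self.state) % self.M
        return self.state

    def discard(self, n):
        for _ in range(n):
            self.next()

def create_rng(seed, chain_id, stride=1000):
    # toy stride; the thread says Stan's real stride is on the order of 2^50
    rng = ToyLCG(seed)
    rng.discard(chain_id * stride)  # note: chain_id == 0 discards NOTHING
    return rng
```

Because chain 0 discards nothing, it is the only chain that consumes the raw, freshly seeded state — which is exactly where the biased first draw shows up.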

@WardBrian WardBrian changed the title Unexpected behaviour when using poisson_rng in transformed data create_rng with chain=0 appears to return biased first draw Feb 23, 2023
@ahartikainen

If you change out seeds, you'll see that the first value printed is always very high, but the following ones look normal.

In Stan or in C++?

@WardBrian
Member

Just in C++. The godbolt link above shows what I'm describing using only the boost parts we use in Stan.

@ahartikainen

Just in C++. The godbolt link above shows what I'm describing using only the boost parts we use in Stan.

Are you sure? I did not see anything peculiar?

@WardBrian
Member

WardBrian commented Feb 23, 2023

If you vary the seed, you'll see things like:

Seed: 233

Output:

12
2
2

The first number printed is usually >10 regardless of seed, which is quite unlikely for poisson(3). Furthermore, if I change the code so that there is always an rng.discard(1) call, this behavior vanishes.

@WardBrian
Member

Hmm, the behavior is even weirder: it also seems to depend on the seed being small.

The original python code used a random number between 1 and 10,000 as a seed. If you make this much bigger (I used seed = np.random.randint(10000, 10000000)) then the graphs start looking correct.
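One plausible mechanism for the small-seed sensitivity, sketched with a plain minstd-style LCG (boost::ecuyer1988 combines two such generators and its bias goes in the other direction, so this is an analogy, not the exact Boost behavior): with no warm-up, the first output is an almost-linear function of the seed, so small seeds can only reach a narrow slice of the output range.

```python
# With no discard, the LCG's first output is A * seed mod M. For small seeds
# the product never wraps past M, so the first normalized draw is confined
# to a narrow band near zero.
M, A = 2**31 - 1, 48271  # minstd parameters, used as a stand-in

def first_uniform(seed):
    """First normalized draw, no warm-up."""
    return (A * seed % M) / M

def first_uniform_after_discard(seed):
    """Same, but with one draw discarded first (the workaround discussed)."""
    state = A * seed % M
    return (A * state % M) / M

# every seed in 1..9999 gives a first draw below ~0.23:
no_warmup = [first_uniform(s) for s in range(1, 10000)]
# a single discarded draw restores full coverage of [0, 1):
warmup = [first_uniform_after_discard(s) for s in range(1, 10000)]
```

This matches the observation that a single rng.discard(1), or a much larger seed, makes the symptom vanish.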

@ahartikainen

Ok, now I see it

@WardBrian
Member

Here's another godbolt which shows the issue more precisely:
https://godbolt.org/z/e9veEn5T5

@bob-carpenter
Contributor

Could you also file an issue with Boost? I know we'll need to work around this in the short term, but it's probably something they want to know, too, as giving things small seeds by hand is a very common practice. My guess is that it's a bad interaction between the linear congruential pRNG we use and the implementation of the Poisson RNG.

@WardBrian
Member

It doesn't seem to be unique to Poisson, here is the same code but with a uniform(0,1) rng: https://godbolt.org/z/zfT579dox
The first draw is always close to 1, the second draw appears to be from the correct distribution.

That said, it also isn't universal - beta and normal both look correct in both situations.

@bob-carpenter
Contributor

That makes sense. For our pRNG, there's an underlying 256-bit representation of the state of the RNG. I don't know how state is initialized from a 32-bit uint seed, but my guess is that a small number will somehow fail to get a good 256-bit initialization for the way the pRNGs are implemented for Poisson, uniform, etc.

Also, I had no idea C++11 had a lot of this implemented. For example,

https://en.cppreference.com/w/cpp/numeric/random/poisson_distribution

@WardBrian
Member

Boost issue opened.

The easiest work-around for us seems to be just always discarding at least 1 draw when we create the RNG. If we want to maximize backwards compatibility of seeds, we can add 1 to the discard amount if and only if the chain ID is 0, otherwise we can just unconditionally add 1 and state in the release notes that random behavior will have changed this version for seeded runs.

@bob-carpenter
Contributor

@mitzimorris said this issue may have been introduced relatively recently when we changed the default ID of the first chain from 0 to 1 (to avoid having to advance when not required, but it looks like it was required!).

@WardBrian
Member

I think changing the default ID to 1 actually made this bug less common. Now it is only apparent in two scenarios:

  1. The user manually sets the ID to 0 and then runs something like standalone generated quantities with a poisson or uniform variable
  2. The constructor of the model class hard-codes the chain ID to 0, so the transformed data block will see this for those distributions

@bob-carpenter
Contributor

Sorry, I got that backwards. It changed from 1 to 0.

@bob-carpenter
Contributor

Also, thanks much for reporting, @cescalara. We're all still marveling at how you managed to discover this bug given how subtle and seed-dependent it is.

@WardBrian
Member

Looking through the history, it used to discard DISCARD_STRIDE * (chain - 1), until this PR in 2017: #2313, when it just discarded chain as it does now. The multi-chain PR #3033 edited this file while in progress but ultimately kept this behavior.

I believe the model class has used chain id 0 since the era of that 2017 PR, so the behavior would have been noticeable as long ago as that.

@rok-cesnovar
Member

Until we get a boost bugfix, my vote would be to discard std::max(1, DISCARD_STRIDE * chain) and announce it in release notes and everywhere else. Once/if boost does a bugfix, we then switch back to what we currently have.

The only drawback (at least that I see) is backward compatibility for anyone using chain ID = 0. But anyone using that would have been using a biased rng, and there is no reason to leave a bug in for backward compatibility.
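The suggestion above amounts to a one-line change to the discard count; a sketch in Python pseudocode mirroring the proposed C++ (names are illustrative):

```python
DISCARD_STRIDE = 2**50  # per-chain stride mentioned earlier in the thread

def discard_count_current(chain_id):
    # current behavior: chain 0 discards nothing, so it sees the raw seed state
    return DISCARD_STRIDE * chain_id

def discard_count_proposed(chain_id):
    # std::max(1, DISCARD_STRIDE * chain): chain 0 now skips the poorly
    # mixed first draw; all other chains are unchanged
    return max(1, DISCARD_STRIDE * chain_id)
```

Only seeded runs with chain ID 0 would observe different draws after this change, which is why it can be announced in the release notes rather than treated as a breaking change.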

@kotika

kotika commented Feb 24, 2023

No one should be using the ecuyer1988 RNG, as it is obsolete. The replacement
boost::ecuyer1988 -> boost::random::mixmax
should solve your issues.
MIXMAX is a modern, industrial strength RNG. It has been made the default in major physics Monte-Carlo packages, GEANT4 and CMS-SW software.
https://github.com/Geant4/geant4/blob/master/source/externals/clhep/src/MixMaxRng.cc
cms-sw/cmssw@e79b467

Along with sequences statistically indistinguishable from true random numbers, attention has been paid to seeding, such that the sequence obtained from seed(1) starts out perfectly randomized and is completely independent from seed(2), seed(3), etc., including the case where you want to test the quality by interleaving two or more streams.

@WardBrian
Member

It appears MIXMAX defines discard as

void discard(boost::uint64_t nsteps) { for(boost::uint64_t j = 0; j < nsteps; ++j)  (*this)(); } ///< discard n steps, required in boost::random

We regularly call discard with an argument on the order of 2^50, so this wouldn't exactly be a drop-in replacement for our use case and would require reconsidering how we use the random number generators.

@kotika

kotika commented Feb 24, 2023

@WardBrian , @bob-carpenter
Most RNG work on a single trajectory, where seeding can place the state of the RNG at a "random" point.
The RNG called ecuyer1988 has a period of about 10^18, so advancing it by 2^50 steps per chain would exhaust the period and create the danger that random numbers simply repeat between different draws. Moreover, it is certain that a generator like ecuyer1988 will produce identical streams of random numbers from pairs of completely different seeds (offset by a lag of, e.g., 1).

In MIXMAX, the trajectory is astronomically long, and the state of seed(k+1) is the state of seed(k) advanced by exactly 2^512 steps. Perhaps if I understood what task you accomplish with discard, I could find some solution for you.

@WardBrian
Member

The RNG called ecuyer1988 has the period of about 10^18

Table 32.6 in the Boost documentation claims ecuyer1988 has a period of approximately 2^61

Your comment boostorg/random#92 (comment) is correct about why we discard, it is for parallel MCMC chains. We currently create each chain with the same seed but discard (2^50 * chain_id) elements.

The behavior you're describing for MIXMAX sounds like something we could reasonably use. If I'm understanding correctly, we could then seed each chain with (user_supplied_seed + chain_id).
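If the generator guarantees independent streams for consecutive seeds (as claimed for MIXMAX above — and exactly what ecuyer1988 does not guarantee), per-chain seeding could replace the discard stride entirely. A sketch using Python's random.Random purely as a stand-in generator:

```python
import random

def create_chain_rng(user_seed, chain_id):
    # stand-in for a generator whose streams for seeds s and s+1 are
    # independent; MIXMAX is claimed above to advance 2^512 steps per seed
    return random.Random(user_seed + chain_id)

chains = [create_chain_rng(1234, i) for i in range(4)]
first_draws = [rng.random() for rng in chains]
```

The appeal is that seeding is O(1) per chain, versus the current scheme where each chain pays for discarding chain_id * 2^50 draws unless the engine supports O(log n) stream jumps.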
