Unexpected determinacy from rng functions across model runs #568

Closed
cmgoold opened this issue Jun 14, 2022 · 3 comments · Fixed by #569

cmgoold commented Jun 14, 2022

Summary:

I'm generating random numbers as part of a simulation-based calibration. I'm finding that consecutive runs often return the same number from the rng calls, with the value changing only every 3-5 runs or so. This doesn't happen in CmdStan, so I think it's localised to cmdstanpy.

Description:

Here's an example Stan model called normal-rng.stan:

/* normal-rng.stan */
transformed data{
  real x_;
  x_ = std_normal_rng();
}

generated quantities{
  real x = x_;
  // to check if the same behaviour occurs in GQs
  real x_gq = std_normal_rng();
}

And some sample code to run the model:

from cmdstanpy import CmdStanModel

mod = CmdStanModel(stan_file="normal-rng.stan")

fits = [mod.sample(data={}) for i in range(10)]

[fit.stan_variables()["x"][0] for fit in fits]

# Example output
[
    0.627496, -0.21658899999999995, -0.21658899999999995, -0.21658899999999995,
    -0.0290533, -0.0290533, -0.0290533, -0.0290533,
    -0.022001700000000006, -0.022001700000000006
]

Current Version:

cmdstanpy version 1.0.0 and cmdstan version 2.29.2

WardBrian (Member) commented Jun 14, 2022

Thanks for reporting this @cmgoold! This is really odd: it looks like a different random seed is being used for each run:

python test.py 2>&1 | grep "seed ="
  seed = 33865
  seed = 19553
  seed = 82406
  seed = 30985
  seed = 19331
  seed = 29939
  seed = 21069
  seed = 91468
  seed = 14912
  seed = 67544

Okay, odd. If we set the output folder to the current directory (output_dir='.') and look at the CSVs generated, the answer starts to pop up:

ls -l1 | grep csv
-rw-rw-r--  1 brian brian   23586 Jun 14 09:38 normal-rng-20220614093838_1.csv
-rw-rw-r--  1 brian brian   23570 Jun 14 09:38 normal-rng-20220614093838_2.csv
-rw-rw-r--  1 brian brian   23576 Jun 14 09:38 normal-rng-20220614093838_3.csv
-rw-rw-r--  1 brian brian   23559 Jun 14 09:38 normal-rng-20220614093838_4.csv
-rw-rw-r--  1 brian brian   22581 Jun 14 09:38 normal-rng-20220614093839_1.csv
-rw-rw-r--  1 brian brian   22628 Jun 14 09:38 normal-rng-20220614093839_2.csv
-rw-rw-r--  1 brian brian   22604 Jun 14 09:38 normal-rng-20220614093839_3.csv
-rw-rw-r--  1 brian brian   22580 Jun 14 09:38 normal-rng-20220614093839_4.csv

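There are only 8 CSV files, not the 40 we would expect from 10 runs of 4 chains each. The timestamps give away why: each filename is built from the model name, a timestamp with one-second resolution, and the chain ID, so runs that start within the same second overwrite each other's files!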
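
The collision can be reproduced without running Stan at all. The sketch below assumes the naming pattern visible in the listing above (model name, second-resolution timestamp, chain ID). `csv_name` is a hypothetical helper written for illustration, not a cmdstanpy function, and the 200 ms spacing between runs is an assumed timing.

```python
from datetime import datetime, timedelta


def csv_name(model: str, started: datetime, chain_id: int) -> str:
    # Hypothetical helper mimicking the pattern seen in the listing:
    # <model>-<YYYYmmddHHMMSS>_<chain id>.csv
    return f"{model}-{started:%Y%m%d%H%M%S}_{chain_id}.csv"


# Assume 10 runs of 4 chains each, launched 200 ms apart (illustrative timing).
first_start = datetime(2022, 6, 14, 9, 38, 38)
names = [
    csv_name("normal-rng", first_start + timedelta(milliseconds=200 * run), chain)
    for run in range(10)
    for chain in range(1, 5)
]

# Runs that start within the same second map to the same four filenames.
unique_names = set(names)
print(len(names), len(unique_names))
```

Under these assumptions, 40 filenames are requested but only 8 are distinct, which matches the directory listing.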

If we change your script to

fits = [mod.sample(data={}, output_dir='.', chain_ids=i*4 + 1) for i in range(10)]

to ensure each run has a unique filename, it works:

[0.226037, 1.11019, -0.362359, -0.512991, -0.624099, -1.81625, -1.64503, 0.568551, 1.17152, 1.43194]
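
To see why the offset works, here is a minimal sketch of the chain IDs each run ends up with. It assumes that an integer `chain_ids` is treated as the ID of the first chain, with later chains numbered consecutively, and that each run uses 4 chains as in the listing above. `chain_ids_for_run` is a hypothetical helper, not part of cmdstanpy.

```python
CHAINS_PER_RUN = 4  # matches the four CSV files per run in the listing


def chain_ids_for_run(run: int) -> list[int]:
    # Hypothetical helper: the first ID is i*4 + 1, as in the modified
    # script; the remaining chains are assumed to count up from there.
    start = run * CHAINS_PER_RUN + 1
    return list(range(start, start + CHAINS_PER_RUN))


all_ids = [cid for run in range(10) for cid in chain_ids_for_run(run)]
print(all_ids[:8])
```

No two runs share a chain ID, so the `_<chain id>` suffix keeps the filenames distinct even when the timestamps coincide.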

We often get issues reported that turn out to be file-clobbering problems like this one. I wonder if we could fix that on the cmdstanpy side. Thoughts @mitzimorris? We could generate new temp directories per run.

cmgoold (Author) commented Jun 14, 2022

Thanks for the quick reply @WardBrian! That makes sense -- thanks for tracking down the issue.

mitzimorris (Member) commented

We could certainly generate new temp directories per run; temp is temp.
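
Until a fix lands, the same idea can be applied from user code. The following is a rough sketch of a user-side workaround, not necessarily how the library change in #569 is implemented. `fresh_output_dir` and the `stan-run-` prefix are hypothetical names chosen for illustration.

```python
import tempfile
from pathlib import Path


def fresh_output_dir(prefix: str = "stan-run-") -> Path:
    # Hypothetical helper: mkdtemp creates a new, uniquely named directory
    # on every call, so no two runs can share output files.
    return Path(tempfile.mkdtemp(prefix=prefix))


# One directory per planned run.
run_dirs = [fresh_output_dir() for _ in range(10)]
print(len(set(run_dirs)))
```

Each run would then pass its own directory, for example `mod.sample(data={}, output_dir=str(run_dirs[i]))`, using the same `output_dir` argument as in the modified script above. Directories created with `mkdtemp` are not removed automatically, so they need to be cleaned up once the results have been read.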
