`generate_quantities` handles `save_warmup=True` "wrong". #1011

funko-unko · 2021-05-11T15:00:56Z

Summary:

If save_warmup=True, generate_quantities takes the first num_samples warmup and samples to generate the gqs.

https://github.com/stan-dev/cmdstan/blob/develop/src/cmdstan/command.hpp

    size_t num_cols = param_names.size();
    size_t num_rows = fitted_params.metadata.num_samples;
    // check that all parameter names are in sample, in order
    if (num_cols + hmc_fixed_cols > fitted_params.header.size()) {
      msg << "Mismatch between model and fitted_parameters csv file \"" << fname
          << "\"" << std::endl;
      throw std::invalid_argument(msg.str());
    }
    for (size_t i = 0; i < num_cols; ++i) {
      if (param_names[i].compare(fitted_params.header[i + hmc_fixed_cols])
          != 0) {
        msg << "Mismatch between model and fitted_parameters csv file \""
            << fname << "\"" << std::endl;
        throw std::invalid_argument(msg.str());
      }
    }
    return_code = stan::services::standalone_generate(
        model,
        fitted_params.samples.block(0, hmc_fixed_cols, num_rows, num_cols),
        random_seed, interrupt, logger, sample_writer);

Has to be changed into something that handles this differently, also incorporating the thinning parameter.

Current Version:

v2.26.1

The text was updated successfully, but these errors were encountered:

funko-unko · 2021-05-11T15:05:39Z

Also, here is an MWE:

Stan

parameters {
  real x;
}
model {
  x ~ normal(0,1);
}
generated quantities{
  real y = x;
}

Python

import cmdstanpy

model = cmdstanpy.CmdStanModel(stan_file='gq.stan')
fit = model.sample(save_warmup=True, iter_sampling=1, chains=1)
gq = model.generate_quantities(dict(), fit)
print(gq.sample_plus_quantities[['x', 'y']])

Output

          x         y
0 -0.330773  0.362201

mitzimorris · 2021-05-11T17:42:22Z

is this a problem with CmdStanPy or CmdStan?
in CmdStan, for method generate_quantities, warmup doesn't make sense.
as far a generate quantities is concerned, you want to add additional quantities of interest to your sample, based on an existing sample and data. if that existing sample includes warmup draws, they will be used. CmdStan doesn't scrape information out of the input CSV files to determine what's in the sample; nor should it.

funko-unko · 2021-05-11T18:11:27Z

Hm, whether it's a problem with CmdStanPy or CmdStan is a matter of perspective or of what's expected.

CmdStan just takes the first num_samples draws from the output csv amd runs the gq block. If the fit was generated with save_warmup=True, these may turn out to only be warmup samples.

CmdStanPy joins the first num_samples samples (excluding warmup) with the generated quantities that CmdStan provides. If save_warmup=False, then it should join the correct draws with each other. If save_warmup=True, then it joins sample-params with warmup-gq.

There are three options to fix this:

Don't change CmdStan's behavior. Instead, let CmdStanPy join the first num_samples draws potentially including warmup with the generated gqs.
Change CmdStan's behavior and CmdStanPy's behavior. Let CmdStan generate gqs for all draws, potentially including warmup draws, and let CmdStanPy join all draws and gqs.

Finally, and I think this should be the preferred way to fix this:

Don't change CmdStanPy's behavior. Instead, let CmdStan always generate gqs for the samples (excluding warmup), regardless of the setting for save_warmup

I believe there are also issues if thin!=1. But that's another issue.

Edit:

Or actually, provide parameters to CmdStan(Py) whether or not to generate gqs for the warmup draws.

mitzimorris · 2021-05-11T18:31:51Z

CmdStan just takes the first num_samples draws from the output csv amd runs the gq block. If the fit was generated with save_warmup=True, these may turn out to only be warmup samples.

not quite. CmdStan takes the fitted_params csv file and hands that off to the generate_quantities service method. however many draws there are in that csv file, that's the new set of outputs. (I implemented this feature and added it to CmdStan - I just checked the code, and nothing seems to have been changed).

my bad - read CmdStan code more carefully - CmdStan only uses the post-warmup draws.

cmdstan/src/cmdstan/command.hpp

Lines 269 to 290 in c83ae03

    
           // read sample from cmdstan csv output file 
        
           std::string fname(fitted_params_file->value()); 
        
           std::ifstream stream(fname.c_str()); 
        
           if (fname != "" && (stream.rdstate() & std::ifstream::failbit)) { 
        
             msg << "Can't open specified file, \"" << fname << "\"" << std::endl; 
        
             throw std::invalid_argument(msg.str()); 
        
           } 
        
           stan::io::stan_csv fitted_params; 
        
           stan::io::stan_csv_reader::read_metadata(stream, fitted_params.metadata, 
        
                                                    &msg); 
        
           if (!stan::io::stan_csv_reader::read_header(stream, fitted_params.header, 
        
                                                       &msg, false)) { 
        
             msg << "Error reading fitted param names from sample csv file \"" << fname 
        
                 << "\"" << std::endl; 
        
             throw std::invalid_argument(msg.str()); 
        
           } 
        
           stan::io::stan_csv_reader::read_adaptation(stream, fitted_params.adaptation, 
        
                                                      &msg); 
        
           fitted_params.timing.warmup = 0; 
        
           fitted_params.timing.sampling = 0; 
        
           stan::io::stan_csv_reader::read_samples(stream, fitted_params.samples, 
        
                                                   fitted_params.timing, &msg);

I'll check on what CmdStanPy is doing - it sounds like that's where add'l config is needed.

funko-unko · 2021-05-11T18:37:07Z

CmdStan only uses the post-warmup draws.

No, I checked. It uses the first num_samples draws it encounters, which may be warmup draws.

mitzimorris · 2021-05-11T18:38:23Z

if you look at the code - line 285 is going to advance the stream reader to the adaptation message - past warmup samples. then line 289 reads the post-warmup draws.

funko-unko · 2021-05-11T18:40:03Z

Slightly extended MWE from above, same Stan file

import cmdstanpy

model = cmdstanpy.CmdStanModel(stan_file='gq.stan')
fit = model.sample(save_warmup=True, iter_sampling=1, chains=1)
print(fit.draws_pd(inc_warmup=True)['x'])
gq = model.generate_quantities(dict(), fit)
print(gq.sample_plus_quantities[['x', 'y']])

Output:

0      -0.543328
1      -0.177736
2      -0.177736
3      -1.121910
4      -1.329880
          ...   
996    -0.850899
997    -0.799468
998    -0.616761
999    -0.483609
1000   -1.144480
Name: x, Length: 1001, dtype: float64
         x         y
0 -1.14448 -0.543328

Maybe some version is out of date, let me check.

Edit: Wait, how do I check and what do I have to check? CmdStan's version I guess?

funko-unko · 2021-05-11T18:51:02Z

Ah. I had a mini heart attack. I changed the cmdstanpath to use the version that I downloaded from github, and suddenly it worked. So I thought I was wrong and wasted your time, but actually I already fixed the bug in my CmdStan code.

This is the header of my csv file, does this confirm that everything is up to date?

# stan_version_major = 2
# stan_version_minor = 26
# stan_version_patch = 1
# model = gq_model
# method = generate_quantities
#   generate_quantities
#     fitted_params = /tmp/tmp9ocuyis7/gq-202105112050-1-dkj8s3x5.csv
# id = 1
# data
#   file = /tmp/tmp9ocuyis7/oinkjs25.json
# init = 2 (Default)
# random
#   seed = 66815
# output
#   file = /tmp/tmp9ocuyis7/gq-202105112050-1-kybspozc.csv
#   diagnostic_file =  (Default)
#   refresh = 100 (Default)
#   sig_figs = -1 (Default)
#   profile_file = profile.csv (Default)
# stanc_version = stanc3 7de91659
# stancflags =

Can you run the MWE and confirm that x!=y?

mitzimorris · 2021-05-11T19:01:06Z

this is fubar and you are right. eating words.

in CmdStan 2.26.1:

> ./examples/bernoulli/bernoulli sample algorithm=hmc save_warmup=1 output file=fp.csv data file=examples/bernoulli/bernoulli.data.json

model b2.stan:

data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);  // uniform prior on interval 0,1
  y ~ bernoulli(theta);
}
generated quantities {
  real theta_copy = theta;
}

and run this with file fp.csv as the fitted_params arg:

./examples/bernoulli/b2 generate_quantities fitted_params=fp.csv output file=gq.csv data file=examples/bernoulli/bernoulli.data.json

simple sanity check:

wc -l *.csv
    2052 fp.csv
    1022 gq.csv

so far so good, but - first draw of fp.csv, which is warmup:

lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta
-13.2719,0.0211121,1,1,3,0,15.5608,0.747871

first draw of gq.csv:

theta_copy
0.747871

looks like the Stan csv reader isn't doing what it should be doing. will investigate.

funko-unko · 2021-05-11T19:03:11Z

Okay, good, my worst fears are vanquished.

I did think that the reader was supposed to read all of the samples, but I only looked so far into it.

(I think there might be an additional issue in handling the thinning parameter, but I haven't tried yet)

mitzimorris · 2021-05-11T19:04:38Z

many thanks. probably never properly unit tested. will fix. (but have to write grant proposal first) - you're welcome to try - the question is - why doesn't read_adaptation advance the stream reader properly? in other words, wtf?

mitzimorris · 2021-05-12T04:13:13Z

answer to above question: stan::io::stan_csv_reader::read_samples isn't doing what I though it was.
it reads from the beginning of the file and assembles all draws into the Eigen Xd matrix.
the metadata has fields num_warmup, save_warmup, and num_samples.

therefore, two options:

a) use all draws
b) add argument to flag whether or not to use warmup draws.

a) is a whole lot easier. given that by default, warmup draws are not saved, if warmup draws are present, then that's a deliberate choice, and standalone GQ should use all draws.

also, this would avoid problems with thinning as well.

funko-unko · 2021-05-12T05:48:04Z

Yes, that sounds reasonable.

BTW, I have only looked at the adaptation-reading code shortly, but it looks like it stops reading as soon it encounters a non-comment line.

Ie reading of the adaptation does not seem to work when warmup draws have been saved. But I have made no tests and my understanding is very shallow.

mitzimorris · 2021-05-12T13:38:23Z

I have only looked at the adaptation-reading code shortly, but it looks like it stops reading as soon it encounters a non-comment line.

right, and that's what gave me the impression that the get_samples method would then only pick up the post-warmup draws. but I was wrong.

Ie reading of the adaptation does not seem to work when warmup draws have been saved. But I have made no tests and my understanding is very shallow.

it's not your understanding that's shallow - it's the code. no comments. haven't checked out the unit tests - probably not enough of them, either.

mitzimorris · 2021-05-12T13:44:47Z

going with option a) then, draws in the fitted_params csv file will be used, all of them.
CmdStanPy will have to document this, and could also print info message to user that warmup draws are present.

(off topic, but noted here: if CmdStanPy wants to get fancier, it could - I suggest we don't and push back on Python users to do their own data munging)

funko-unko · 2021-05-12T15:46:24Z

Yes, I would not want to deal with data wrangling in C...

mitzimorris mentioned this issue May 16, 2021

Release 2.27 checklist #1005

Closed

20 tasks

rok-cesnovar closed this as completed May 25, 2021

mitzimorris mentioned this issue Jul 21, 2021

code cleanup - generate_quantities stan-dev/cmdstanpy#409

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`generate_quantities` handles `save_warmup=True` "wrong". #1011

`generate_quantities` handles `save_warmup=True` "wrong". #1011

funko-unko commented May 11, 2021

funko-unko commented May 11, 2021

mitzimorris commented May 11, 2021

funko-unko commented May 11, 2021 •

edited

Loading

mitzimorris commented May 11, 2021 •

edited

Loading

funko-unko commented May 11, 2021

mitzimorris commented May 11, 2021

funko-unko commented May 11, 2021 •

edited

Loading

funko-unko commented May 11, 2021 •

edited

Loading

mitzimorris commented May 11, 2021 •

edited

Loading

funko-unko commented May 11, 2021

mitzimorris commented May 11, 2021 •

edited

Loading

mitzimorris commented May 12, 2021 •

edited

Loading

funko-unko commented May 12, 2021 •

edited

Loading

mitzimorris commented May 12, 2021 •

edited

Loading

mitzimorris commented May 12, 2021 •

edited

Loading

funko-unko commented May 12, 2021

generate_quantities handles save_warmup=True "wrong". #1011

generate_quantities handles save_warmup=True "wrong". #1011

Comments

funko-unko commented May 11, 2021

Summary:

Current Version:

funko-unko commented May 11, 2021

mitzimorris commented May 11, 2021

funko-unko commented May 11, 2021 • edited Loading

mitzimorris commented May 11, 2021 • edited Loading

funko-unko commented May 11, 2021

mitzimorris commented May 11, 2021

funko-unko commented May 11, 2021 • edited Loading

funko-unko commented May 11, 2021 • edited Loading

mitzimorris commented May 11, 2021 • edited Loading

funko-unko commented May 11, 2021

mitzimorris commented May 11, 2021 • edited Loading

mitzimorris commented May 12, 2021 • edited Loading

funko-unko commented May 12, 2021 • edited Loading

mitzimorris commented May 12, 2021 • edited Loading

mitzimorris commented May 12, 2021 • edited Loading

funko-unko commented May 12, 2021

`generate_quantities` handles `save_warmup=True` "wrong". #1011

`generate_quantities` handles `save_warmup=True` "wrong". #1011

funko-unko commented May 11, 2021 •

edited

Loading

mitzimorris commented May 11, 2021 •

edited

Loading

funko-unko commented May 11, 2021 •

edited

Loading

funko-unko commented May 11, 2021 •

edited

Loading

mitzimorris commented May 11, 2021 •

edited

Loading

mitzimorris commented May 11, 2021 •

edited

Loading

mitzimorris commented May 12, 2021 •

edited

Loading

funko-unko commented May 12, 2021 •

edited

Loading

mitzimorris commented May 12, 2021 •

edited

Loading

mitzimorris commented May 12, 2021 •

edited

Loading