Loading PPC data into arviz in pymc3 #1282

kyleabeauchamp · 2020-07-06T16:44:26Z

Currently, the easiest way to get data from pymc3 into arviz is via pm.sample(return_inferencedata=True), which gives back an arviz InferenceData object. However, pm.sample_posterior_predictive() only supports dictionary returns.

It's unclear what the right approach is, as at this point the InferenceData object is already created. I guess we either need an easy, dimension-aware way to add the PPC data later or create a new object with it.

kyleabeauchamp · 2020-07-06T19:35:33Z

I think one more thing to add here: in my use case, I am using pm.set_data() and pm.sample_posterior_predictive() to run PPC calculations on new data that was not present during the original parameter sampling, so it's important that I be able to do this work post-hoc.

OriolAbril · 2020-07-07T10:42:12Z

To add posterior predictive samples to an inference data object with pymc3, I think the best approach is:

...
    idata = pm.sample(..., return_inferencedata=True)
...
    ppc = pm.sample_posterior_predictive(..., keep_size=True)
az.concat(idata, az.from_dict(posterior_predictive=ppc), inplace=True)

There is also issue #1239 to make this last line less convoluted.
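For readers who want to try the az.concat pattern without running a sampler, here is a minimal sketch. It assumes an ArviZ 0.x release where az.concat and az.from_dict are available. The random arrays are hypothetical stand-ins for the outputs of pm.sample(..., return_inferencedata=True) and pm.sample_posterior_predictive(..., keep_size=True), and the variable names mu and y are made up for illustration.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
n_chains, n_draws, n_obs = 2, 50, 10  # illustrative sizes, not from the thread

# Stand-in for the InferenceData returned by pm.sample(..., return_inferencedata=True)
idata = az.from_dict(posterior={"mu": rng.normal(size=(n_chains, n_draws))})

# Stand-in for the dict returned by pm.sample_posterior_predictive(..., keep_size=True):
# with keep_size=True the leading dimensions are (chain, draw)
ppc = {"y": rng.normal(size=(n_chains, n_draws, n_obs))}

# Add the posterior_predictive group to the existing object in place
az.concat(idata, az.from_dict(posterior_predictive=ppc), inplace=True)

groups = idata.groups()
ppc_shape = idata.posterior_predictive["y"].shape
```

After the call, idata carries both the posterior and posterior_predictive groups, so the usual ArviZ plotting functions can find the PPC samples on the same object.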

As Robert pointed out, to add predictions (out-of-sample posterior predictive samples), the way to go should be from_pymc3_predictions, as this will add the samples returned by sample_posterior_predictive to the predictions group, as well as the data used to generate them to predictions_constant_data (assuming you are using a pm.Data container).

I commented in the issue you linked, as there seems to be a bug with keep_size if a dataset is passed, so most of this won't work for now 😕

kyleabeauchamp · 2020-07-11T17:11:32Z

Thanks, the az.concat idea is working, and I'm testing out the PR branch of pymc3 that fixes the keep_size issue. So I think I'm now on track again :).

canyon289 · 2020-07-19T22:00:59Z

Do we still need to keep this issue open?

rpgoldman · 2020-07-19T22:25:15Z

I wonder if we should make automating this an option in the PyMC3 predictive samplings. E.g., add an add_to_inference_data argument as a kwarg to those functions (which would require that they be invoked with an inference data argument).

@OriolAbril's method is definitely the right one, but it seems inconvenient to make the user know to use concat and from_dict. Also, TBH, this seems like a mismatch with what concat does in other contexts. For example, what pandas concat does seems quite different from this, and xarray's concat likewise, because those data structures (frames and datasets) are homogeneous in a way that an InferenceData is not (because of the groups).

To me, what az.concat does seems a lot more like "insertion" than "concatenation."

OriolAbril · 2020-07-19T22:43:42Z

It would probably be helpful to have PyMC3 add the results to idata directly.

Regarding az.concat, it does several tasks, some more aligned with concat than others. Its main use (to me) is combining two different inference data objects with strictly different groups, hence the concatenation: the groups from one and from the other are concatenated and nothing else. It can also be used to concat InferenceData objects with the same variables and groups along the chain or draw dimension in order to combine different runs of the same model. I think it is not too different from the xarray concat.
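The second use of az.concat, joining runs along a dimension, can be sketched as follows. This assumes an ArviZ 0.x release in which az.concat accepts a dim argument. The two objects are hypothetical stand-ins for separate runs of the same model, built from random arrays instead of real samples.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(1)

# Two stand-in "runs" with identical groups and variables (2 chains, 50 draws each)
run_a = az.from_dict(posterior={"mu": rng.normal(size=(2, 50))})
run_b = az.from_dict(posterior={"mu": rng.normal(size=(2, 50))})

# Concatenate along the chain dimension; a new InferenceData is returned
combined = az.concat(run_a, run_b, dim="chain")

combined_shape = combined.posterior["mu"].shape
```

Here the groups match and only the chain dimension grows, which is closer to what xarray's concat does than the group-insertion use.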

kyleabeauchamp · 2020-07-19T23:18:28Z

I'm fine to close this ticket for now. I was able to get a script running to do PPC analysis in arviz with the currently available functions. I'll open new tickets if I see any new gaps.

rpgoldman · 2020-07-19T23:46:44Z

I’m likely to open a merge request on PyMC3 for this.

rpgoldman · 2020-07-20T00:43:30Z

Here's a start: pymc-devs/pymc#4021 -- has only been applied to fast_sample_posterior_predictive so far.

kwarg used is add_to_inference_data rather than return_inference_data. Perhaps the latter would be better. Please comment on that MR.

kyleabeauchamp mentioned this issue Jul 6, 2020

sample_posterior_predictive flattens chains and draws pymc-devs/pymc#4004

Closed

kyleabeauchamp closed this as completed Jul 19, 2020
