Loading PPC data into arviz in pymc3 #1282

Closed
kyleabeauchamp opened this issue Jul 6, 2020 · 9 comments

@kyleabeauchamp
Contributor

Currently, the easiest way to get data from pymc3 into arviz is via pm.sample(return_inferencedata=True), which returns an arviz InferenceData object. However, pm.sample_posterior_predictive() only returns a dictionary.

It's unclear what the right approach is, since by this point the InferenceData object has already been created. I guess we need either an easy, dimension-aware way to add the PPC data later, or a way to create a new object that includes it.

@kyleabeauchamp
Contributor Author

I think one more thing to add here: in my use case, I am using pm.set_data() and pm.sample_posterior_predictive() to run PPC calculations on new data that was not present during the original parameter sampling, so it's important that I be able to do this work post-hoc.

@OriolAbril
Member

OriolAbril commented Jul 7, 2020

To add posterior predictive samples to an inference data object with pymc3, I think the best approach is:

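import arviz as az
import pymc3 as pm

...
    idata = pm.sample(..., return_inferencedata=True)
...
    # keep_size=True returns arrays shaped (chain, draw, ...), the layout az.from_dict expects
    ppc = pm.sample_posterior_predictive(..., keep_size=True)
# Wrap the PPC dict in its own InferenceData and merge its group into idata in place
az.concat(idata, az.from_dict(posterior_predictive=ppc), inplace=True)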

There is also issue #1239 to make this last line less convoluted.
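A self-contained sketch of the same steps, assuming only NumPy and ArviZ are installed: the synthetic arrays stand in for the outputs of pm.sample and pm.sample_posterior_predictive, and the variable names (mu, y), the shapes and the obs dimension name are illustrative assumptions. Without keep_size=True, the posterior predictive arrays come back with chains and draws flattened into one leading axis, so the reshape below mimics what keep_size does; passing coords and dims to az.from_dict is one way to keep the result dimension-aware.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
n_chains, n_draws, n_obs = 2, 100, 5

# Stand-in for pm.sample(..., return_inferencedata=True): a posterior group only.
idata = az.from_dict(posterior={"mu": rng.normal(size=(n_chains, n_draws))})

# Stand-in for pm.sample_posterior_predictive(...) WITHOUT keep_size=True:
# chains and draws are flattened into one leading axis of length n_chains * n_draws.
ppc_flat = {"y": rng.normal(size=(n_chains * n_draws, n_obs))}

# Reshape to the (chain, draw, ...) layout, which is what keep_size=True produces.
ppc = {k: v.reshape((n_chains, n_draws) + v.shape[1:]) for k, v in ppc_flat.items()}

# Name the observation dimension so it is not auto-named "y_dim_0".
ppc_idata = az.from_dict(
    posterior_predictive=ppc,
    coords={"obs": np.arange(n_obs)},
    dims={"y": ["obs"]},
)

# posterior and posterior_predictive do not overlap, so no dim argument is needed.
az.concat(idata, ppc_idata, inplace=True)
print(idata.posterior_predictive["y"].dims)
```

With inplace=True, az.concat modifies idata directly and returns None, so its result should not be assigned back to idata.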

As Robert pointed out, to add predictions (out-of-sample posterior predictive samples), the way to go should be az.from_pymc3_predictions: it adds the samples returned by sample_posterior_predictive to the predictions group, and the data used to generate them to the predictions_constant_data group (assuming you are using pm.Data containers).
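When PyMC3 is not in the loop, the same two groups can be built directly with az.from_dict. This sketch assumes an ArviZ release whose from_dict accepts the predictions and predictions_constant_data arguments; the arrays are synthetic stand-ins for out-of-sample draws and for the new inputs swapped in with pm.set_data, and the names beta, x_new and y are hypothetical. Inside a PyMC3 model, az.from_pymc3_predictions remains the route suggested above.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(1)
n_chains, n_draws, n_new = 2, 50, 3

# Stand-in for the posterior from the original fit (hypothetical slope "beta").
idata = az.from_dict(posterior={"beta": rng.normal(size=(n_chains, n_draws))})

# Hypothetical new inputs, e.g. what pm.set_data would have swapped in.
x_new = np.linspace(0.0, 1.0, n_new)

# Stand-in for out-of-sample draws in (chain, draw, n_new) layout.
y_pred = idata.posterior["beta"].values[:, :, None] * x_new

# predictions holds the draws; predictions_constant_data holds the inputs (no chain/draw dims).
pred_idata = az.from_dict(
    predictions={"y": y_pred},
    predictions_constant_data={"x_new": x_new},
)

# Neither new group overlaps with posterior, so they can be merged without a dim.
az.concat(idata, pred_idata, inplace=True)
```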

I commented in the issue you linked, as there seems to be a bug with keep_size when a dataset is passed, so most of this won't work for now 😕

@kyleabeauchamp
Contributor Author

Thanks, the az.concat idea is working, and I'm testing out the PR branch of pymc3 that fixes the keep_size issue. So I think I'm now on track again :).

@canyon289
Member

Do we still need to keep this issue open?

@rpgoldman
Contributor

I wonder if we should make this automatic, as an option in the PyMC3 predictive sampling functions. E.g., add an add_to_inference_data kwarg to those functions (which would require that they be invoked with an inference data argument).

@OriolAbril's method is definitely the right one, but it seems inconvenient to require the user to know to use concat and from_dict. Also, TBH, this seems like a mismatch with what concat does in other contexts. For example, pandas concat does something quite different from this, and so does xarray's concat, because those data structures (frames and datasets) are homogeneous in a way that an InferenceData is not (because of the groups).

To me, what az.concat does seems a lot more like "insertion" than "concatenation."

@OriolAbril
Member

It would probably be helpful to have PyMC3 add the results to idata directly.

Regarding az.concat, it does several tasks, some more aligned with concatenation than others. Its main use (to me) is combining two different InferenceData objects with strictly different groups, hence the name: the groups from one and from the other are concatenated and nothing else. It can also be used to concatenate InferenceData objects with the same variables and groups along the chain or draw dimension, in order to combine different runs of the same model. I think it is not too different from the xarray concat.
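Both uses can be sketched with synthetic data (the variable name theta and the shapes are illustrative assumptions): with the default dim=None, az.concat only merges objects whose groups do not overlap, while dim="chain" stacks two runs of the same model along the chain dimension.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(2)

# Two runs of the same model: same groups and variables, 2 chains x 100 draws each.
run_a = az.from_dict(posterior={"theta": rng.normal(size=(2, 100))})
run_b = az.from_dict(posterior={"theta": rng.normal(size=(2, 100))})

# Combine the runs along the chain dimension; reset_dim=True (the default)
# renumbers the chain coordinate 0..3 instead of keeping 0, 1, 0, 1.
combined = az.concat(run_a, run_b, dim="chain")

# Without a dim, overlapping groups (posterior in both) are rejected, not merged.
try:
    az.concat(run_a, run_b)
    overlap_rejected = False
except TypeError:
    overlap_rejected = True
```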

@kyleabeauchamp
Contributor Author

I'm fine to close this ticket for now. I was able to get a script running to do PPC analysis in arviz with the currently available functions. I'll open new tickets if I see any new gaps.

@rpgoldman
Copy link
Contributor

I’m likely to open a merge request on PyMC3 for this.

@rpgoldman
Copy link
Contributor

Here's a start: pymc-devs/pymc#4021. So far it has only been applied to fast_sample_posterior_predictive.

The kwarg used is add_to_inference_data rather than return_inference_data; perhaps the latter would be better. Please comment on that MR.


4 participants