
New example for auto-imputation aka handle missing values with a simple dataset and full workflow #721

Closed
jonsedar opened this issue Nov 9, 2024 · 0 comments
Labels
proposal New notebook proposal still up for discussion

Comments

@jonsedar
Contributor

jonsedar commented Nov 9, 2024

Notebook proposal

Title: GLM-missing-numeric-values

A new example notebook for auto-imputation (handling missing values), using a simple dataset and a full workflow.

I propose this as a present-day solution that should be generally applicable. If and when newer functionality
arrives, we can extend this notebook with a new Section 2 to demonstrate and compare it. Related further
discussion is in pymc-devs/pymc#6626 and pymc-devs/pymc#7204.

Why should this notebook be added to pymc-examples?

Our problem statement is that when faced with data with missing values, we want to:

  1. Infer the missing values for the in-sample dataset and sample full posterior parameters
  2. Predict the endogenous feature and the missing values for an out-of-sample dataset
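As a rough illustration of the setup these two goals assume, the sketch below builds a small synthetic dataset, removes some feature values, and splits it into an in-sample and an out-of-sample part. Everything here is hypothetical and chosen only for illustration: the names (`x_full`, `x_obs`, `x_train`, `x_holdout`), the sizes, the coefficients, and the missing-completely-at-random mechanism are assumptions, not the dataset the proposed notebook uses.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical synthetic data: n observations, k numeric features.
n, k = 40, 3
x_full = rng.normal(size=(n, k))
beta = np.array([1.0, -0.5, 2.0])  # illustrative coefficients only
y = x_full @ beta + rng.normal(scale=0.1, size=n)

# Assumption: values go missing completely at random, about 20% per cell.
is_missing = rng.random(size=(n, k)) < 0.2
x_obs = np.where(is_missing, np.nan, x_full)

# In-sample data: features (with gaps) plus the observed target.
n_train = 30
x_train, y_train = x_obs[:n_train], y[:n_train]

# Out-of-sample data: features (with gaps) only; the target is to be predicted.
x_holdout = x_obs[n_train:]
```

Goal 1 then concerns the NaN cells in `x_train` together with the model parameters, and goal 2 concerns the target and the NaN cells for `x_holdout`.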

This notebook takes the opportunity to:

  • Demonstrate a general method using a numpy.ma.masked_array, often mentioned in pymc folklore but rarely demonstrated
  • Demonstrate a reasonably complete Bayesian workflow {cite:p}`gelman2020bayesian`, including data creation
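For readers unfamiliar with masked arrays, here is a minimal NumPy-only sketch of how one can be built from data that marks missing entries with NaN. The toy matrix and variable names are made up for illustration, and this covers only the array construction, not the modelling step the notebook would demonstrate.

```python
import numpy as np

# Hypothetical toy feature matrix: 5 observations, 2 numeric features,
# with NaN marking the missing entries.
x = np.array(
    [
        [1.0, 10.0],
        [2.0, np.nan],
        [np.nan, 30.0],
        [4.0, 40.0],
        [5.0, np.nan],
    ]
)

# masked_invalid masks NaN (and inf) entries; the mask is True where a value is missing.
x_masked = np.ma.masked_invalid(x)

n_missing = int(x_masked.mask.sum())  # number of masked cells
col_means = x_masked.mean(axis=0)     # column means over observed entries only
x_filled = x_masked.filled(-999.0)    # explicit placeholder, for inspection only
```

In PyMC, a masked array such as `x_masked` can be passed as the `observed` argument of a distribution, and the masked entries are then treated as unknowns to be sampled. The names PyMC gives the resulting variables depend on the version, so they are not shown here.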

This notebook is a partner to another pymc-examples notebook, Missing_Data_Imputation.ipynb,
which goes into more detail on missing-data taxonomies and works through a tutorial-style example on a much more complicated dataset.

Suggested categories:

  • Level: Intermediate
  • Diataxis type: Reference

Related notebooks

  • Missing_Data_Imputation.ipynb: the partner notebook, covering missing-data taxonomies and a tutorial-style worked example on a much more complicated dataset.

References

Related further discussion in pymc-devs/pymc#6626, and pymc-devs/pymc#7204

Also already in references.bib

@book{enders2022,
title = {Applied Missing Data Analysis},
author = {Enders, Craig K.},
year = {2022},
publisher = {The Guilford Press}
}

@Article{gelman2020bayesian,
title = {Bayesian workflow},
author = {Gelman, Andrew and Vehtari, Aki and Simpson, Daniel and Margossian, Charles C and Carpenter, Bob and Yao, Yuling and Kennedy, Lauren and Gabry, Jonah and B{\"u}rkner, Paul-Christian and Modr{\'a}k, Martin},
journal = {arXiv preprint arXiv:2011.01808},
year = {2020},
url = {https://arxiv.org/abs/2011.01808}
}

@jonsedar jonsedar added the proposal New notebook proposal still up for discussion label Nov 9, 2024
jonsedar added a commit to jonsedar/pymc-examples that referenced this issue Nov 9, 2024
jonsedar added a commit to jonsedar/pymc-examples that referenced this issue Dec 16, 2024