New example for auto-imputation aka handle missing values with a simple dataset and full workflow #721

jonsedar · 2024-11-09T07:01:24Z

Notebook proposal

Title: GLM-missing-numeric-values

A new example for auto-imputation (i.e. handling missing values), using a simple dataset and a full workflow.

I propose this as a present-day solution that should be generally applicable. If and when newer functionality
arrives, let's extend this notebook with a new Section 2 to demonstrate and compare it. Related further
discussion in pymc-devs/pymc#6626 and pymc-devs/pymc#7204.

Why should this notebook be added to pymc-examples?

Our problem statement is that when faced with data with missing values, we want to:

Infer the missing values for the in-sample dataset and sample full posterior parameters
Predict the endogenous feature and the missing values for an out-of-sample dataset

This notebook takes the opportunity to:

Demonstrate a general method using a numpy.masked_array, often mentioned in pymc folklore but rarely demonstrated
Demonstrate a reasonably complete Bayesian workflow {cite:p}`gelman2020bayesian`, including data creation
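
As a minimal sketch of the masked-array idea mentioned above (not the notebook's actual code), the snippet below builds a `numpy` masked array from a toy covariate matrix in which `NaN` marks the missing entries. The data values and variable names are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical toy covariate matrix: 5 observations, 2 numeric features,
# with NaN marking the missing entries.
x = np.array(
    [
        [0.5, 1.2],
        [np.nan, 0.7],
        [1.1, np.nan],
        [0.3, 0.9],
        [np.nan, np.nan],
    ]
)

# masked_invalid masks every NaN/inf entry; the mask is True where a value is missing.
x_masked = np.ma.masked_invalid(x)

n_missing = int(x_masked.mask.sum())            # total number of missing cells
missing_per_column = x_masked.mask.sum(axis=0)  # missing cells per feature
observed_means = x_masked.mean(axis=0)          # means over observed cells only

print(n_missing)
print(missing_per_column)
print(observed_means)
```

In PyMC, a masked array like `x_masked` passed as the `observed` argument of a distribution is what is generally understood to trigger automatic imputation of the masked entries. The details of how the notebook wires this into a GLM are not shown here.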

This notebook is a partner to another pymc-examples notebook, Missing_Data_Imputation.ipynb,
which goes into more detail on taxonomies and uses a much more complicated dataset in a tutorial-style worked example.

Suggested categories:

Level: Intermediate
Diataxis type: Reference

Related notebooks

This notebook is a partner to another pymc-examples notebook, Missing_Data_Imputation.ipynb,
which goes into more detail on taxonomies and uses a much more complicated dataset in a tutorial-style worked example.

References

Related further discussion in pymc-devs/pymc#6626 and pymc-devs/pymc#7204

Also already in references.bib

@book{enders2022,
title = {Applied Missing Data Analysis},
author = {Enders, Craig K.},
year = {2022},
publisher = {The Guilford Press}
}

@Article{gelman2020bayesian,
title = {Bayesian workflow},
author = {Gelman, Andrew and Vehtari, Aki and Simpson, Daniel and Margossian, Charles C and Carpenter, Bob and Yao, Yuling and Kennedy, Lauren and Gabry, Jonah and B{\"u}rkner, Paul-Christian and Modr{\'a}k, Martin},
journal = {arXiv preprint arXiv:2011.01808},
year = {2020},
url = {https://arxiv.org/abs/2011.01808}
}

jonsedar added the proposal New notebook proposal still up for discussion label Nov 9, 2024

jonsedar added a commit to jonsedar/pymc-examples that referenced this issue Nov 9, 2024

+ added new NB to demonstrate missing values, per issue pymc-devs#721

eebfadb

+ ran pre-commit checks etc

jonsedar mentioned this issue Nov 9, 2024

New example notebook for auto-imputation aka handle missing values with a simple dataset and full workflow #722

Closed

3 tasks

jonsedar added a commit to jonsedar/pymc-examples that referenced this issue Dec 16, 2024

+ added new NB to demonstrate missing values, per issue pymc-devs#721

13cea3c

+ ran pre-commit checks etc

jonsedar mentioned this issue Dec 16, 2024

New example: Auto-imputation aka handling missing numeric covariates #753

Merged

3 tasks

fonnesbeck closed this as completed Dec 17, 2024
