
Pooling rules for creating synthetic data with mice #436

Merged: 7 commits merged on Oct 8, 2021


thomvolker (Member) commented Oct 7, 2021

As discussed with @gerkovink, the pool.syn() and pool.scalar.syn() pooling functions apply the rules developed by Reiter (2003) to combine analyses of multiply imputed synthetic datasets. Note that these rules apply only to synthetic versions of completely observed datasets. If the data to be synthesized contain missing values, different pooling rules apply, and these require a two-step approach to imputation: first impute the missing values, then synthesize each of the m imputed datasets. A one-step approach is left for future research.
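To make the combining rules concrete, here is a minimal Python sketch for a single scalar estimand, assuming Reiter's (2003) rules for partially synthetic data. With m synthetic datasets, point estimates q_i and within-dataset variances u_i, the pooled estimate is the mean of the q_i, b is the sample variance of the q_i, ū is the mean of the u_i, the total variance is T = ū + b/m, and the degrees of freedom are (m − 1)(1 + ū/(b/m))². The function name `pool_reiter2003` is hypothetical and this is not the mice (R) implementation.

```python
import math
from statistics import mean, variance


def pool_reiter2003(estimates, variances):
    """Combine one scalar estimate across m synthetic datasets (Reiter, 2003).

    Hypothetical helper for illustration only; not part of mice.
    estimates: point estimates q_i, one per synthetic dataset
    variances: within-dataset variance estimates u_i (squared standard errors)
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two synthetic datasets and matching variances")
    qbar = mean(estimates)    # pooled point estimate
    b = variance(estimates)   # between-synthesis variance (denominator m - 1)
    ubar = mean(variances)    # average within-dataset variance
    t = ubar + b / m          # total variance: no (1 + 1/m) inflation, unlike Rubin (1987)
    # Degrees of freedom for the t reference distribution; infinite when b == 0
    df = math.inf if b == 0 else (m - 1) * (1 + ubar / (b / m)) ** 2
    return {"qbar": qbar, "b": b, "ubar": ubar, "t": t, "df": df}


result = pool_reiter2003([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(result["t"], result["df"])
```

The key difference from the missing-data rules is the smaller total variance: b enters as b/m rather than (1 + 1/m)b, which is why applying the wrong rule to synthetic data gives incorrect standard errors.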

gerkovink (Member)

@stefvanbuuren Can we prioritise this PR?

stefvanbuuren (Member)

Thanks for the PR.

There's a lot of duplicated code. I will look into integrating this functionality as an extra argument to the regular pool() function.

thomvolker (Member, Author)

I completely agree that this PR is mostly duplicated code. I still wrote a separate function to protect uninformed users against applying the wrong pooling rules. That said, an additional argument is probably more elegant.

stefvanbuuren merged commit c8ebb9f into amices:master on Oct 8, 2021
stefvanbuuren (Member)

mice 3.13.15 adds a new rule argument to pool() and pool.scalar() and redefines pool.syn() and pool.scalar.syn() as wrappers. This removes almost all duplication and is extensible as other pooling rules come along.

Use pool.syn() and pool.scalar.syn() in code for synthetic data, and reserve pool() and pool.scalar() for missing data uses.
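The rule-argument design can be sketched in Python as one pooling function that shares all the averaging code and branches only on the total-variance formula, plus a thin wrapper that fixes the synthetic-data rule. The names `pool_scalar` and `pool_scalar_syn` and the rule labels are illustrative assumptions modelled on this thread, not the actual R signatures in mice.

```python
from functools import partial
from statistics import mean, variance


def pool_scalar(estimates, variances, rule="rubin1987"):
    """Hypothetical analogue of a pooling function with a `rule` switch.

    Returns (pooled estimate, total variance) for a scalar estimand.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two datasets and matching variances")
    qbar = mean(estimates)    # shared by every rule
    b = variance(estimates)   # between-dataset variance
    ubar = mean(variances)    # average within-dataset variance
    if rule == "rubin1987":       # multiply imputed missing data
        t = ubar + (1 + 1 / m) * b
    elif rule == "reiter2003":    # synthetic versions of completely observed data
        t = ubar + b / m
    else:
        raise ValueError(f"unknown pooling rule: {rule!r}")
    return qbar, t


# Wrapper that fixes the synthetic-data rule, so callers cannot pick the wrong one
pool_scalar_syn = partial(pool_scalar, rule="reiter2003")

print(pool_scalar([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
print(pool_scalar_syn([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

Adding a further rule means adding one branch rather than another copy of the function, while the wrapper keeps the separate entry point for synthetic-data workflows.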

gerkovink (Member)

Nice indeed to separate the workflows of pool() and pool.syn().
