md.pattern #101

Open
vankesteren opened this issue Mar 25, 2020 · 7 comments

Comments

@vankesteren

vankesteren commented Mar 25, 2020

Would it be possible to include a plot for patterns of missingness similar to the md.pattern functionality in the mice package in R?

Here's an example from that package:
[image: example md.pattern plot from the mice package]

This plot tells us the following:

  • 13 observations have 0 missing values
  • 3 observations have missing values on chl only
  • 10 observations in total have missing values on chl
  • etc.

The patterns are easily visible and compact: the plot scales with the number of missingness patterns, not with the number of rows in the dataframe!
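To make the request concrete, here is a minimal sketch of the underlying table, assuming a pandas DataFrame as input. The function name `missing_patterns`, the `count` column, and the toy data are made up for illustration and are not part of any existing API.

```python
import numpy as np
import pandas as pd


def missing_patterns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per distinct missingness pattern (True = missing).

    Hypothetical helper: the output has as many rows as there are
    patterns, regardless of how many rows the input has.
    """
    mask = df.isna()
    patterns = (
        mask.groupby(list(mask.columns))  # group rows by their full pattern
        .size()                           # number of rows per pattern
        .reset_index(name="count")
    )
    return patterns.sort_values("count", ascending=False, ignore_index=True)


# Toy data (not the dataset from the plot above).
toy = pd.DataFrame(
    {
        "age": [1, 2, 3, 4, 5, 6],
        "bmi": [np.nan, 22.0, np.nan, 30.1, 25.0, 27.0],
        "chl": [np.nan, 187.0, np.nan, np.nan, 200.0, 210.0],
    }
)
patterns = missing_patterns(toy)
print(patterns)
```

Six input rows collapse to three pattern rows here, which is the compactness property described above.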

@ResidentMario
Owner

This is a neat idea! Will see what I can do.

@SultanOrazbayev

I am interested both in this feature and in contributing to it. It would be especially handy for data that exceeds memory, so it would be great to make this dask-compatible.
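One reason the larger-than-memory case looks feasible: pattern counts are additive, so they can be accumulated piece by piece. The sketch below uses plain pandas chunks rather than dask, and `chunked_pattern_counts` is a hypothetical name; it only illustrates the idea and is not the proposed implementation.

```python
import io
from collections import Counter

import pandas as pd


def chunked_pattern_counts(chunks):
    """Sum missingness-pattern counts over an iterable of DataFrame chunks."""
    totals = Counter()
    columns = None
    for chunk in chunks:
        if columns is None:
            columns = list(chunk.columns)
        mask = chunk[columns].isna()
        # Each row's pattern becomes a hashable tuple of booleans.
        totals.update(map(tuple, mask.to_numpy().tolist()))
    return columns, totals


# Toy CSV read two rows at a time; only one chunk is in memory at once.
csv_text = "age,bmi,chl\n1,,\n2,22.0,187\n3,,\n4,30.1,\n5,25.0,200\n6,27.0,210\n"
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)
columns, totals = chunked_pattern_counts(reader)
print(columns, dict(totals))
```

Memory use then depends on the chunk size and the number of distinct patterns, not on the total number of rows.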

@SultanOrazbayev

@vankesteren: while the PR is being reviewed, it would be great if you could do an independent test-drive of the new pattern function.

@vankesteren
Author

I'll see what I can do!

@vankesteren
Author

Looks great! Here is the pattern function applied to the same dataset:

[image: output of the pattern function on the same dataset]

I do have the following suggestions:

  • some information is missing:
    • the number of missing values in each pattern (right margin of the md.pattern R plot above)
    • the column counts (bottom row of the md.pattern plot)
  • add a method for visualisation, or a suggestion in the documentation on how to visualise this
  • call the mvcount column simply count (or, to avoid overlap with the column names, maybe something like _count_?). In my head, mvcount immediately reads as "multivariate count"
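As a rough illustration of the two missing margins, the sketch below adds a per-pattern count of missing values and computes per-column totals with pandas. The function `pattern_table` and the column name `n_missing` are assumptions for the example; `_count_` is the name floated above. None of this reflects the code in the PR.

```python
import numpy as np
import pandas as pd


def pattern_table(df, count_col="_count_", n_missing_col="n_missing"):
    """Hypothetical helper: pattern table plus both md.pattern-style margins."""
    mask = df.isna()
    cols = list(mask.columns)
    table = mask.groupby(cols).size().reset_index(name=count_col)
    # Right margin: number of variables missing within each pattern.
    table[n_missing_col] = table[cols].sum(axis=1)
    table = table.sort_values(n_missing_col, ignore_index=True)
    # Bottom margin: total number of missing values per column.
    column_totals = mask.sum()
    return table, column_totals


# Toy data, not the dataset shown in the plots.
toy = pd.DataFrame(
    {
        "age": [1, 2, 3, 4, 5, 6],
        "bmi": [np.nan, 22.0, np.nan, 30.1, 25.0, 27.0],
        "chl": [np.nan, 187.0, np.nan, np.nan, 200.0, 210.0],
    }
)
table, column_totals = pattern_table(toy)
print(table)
print(column_totals)
```

The column totals answer questions such as how many observations are missing chl overall, which the pattern rows alone do not show directly.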

@SultanOrazbayev

Thanks for the suggestions!

Re: adding the number of missing values: do you have a suggestion for the name of this column? Maybe values_missing?

@vankesteren
Author

Yeah, that works! Or maybe n_missing?


3 participants