How to impute a categorical variable with MICE but prevent it from taking some values? #196

Closed
alexiasampri opened this issue Oct 7, 2019 · 3 comments

Comments

@alexiasampri

I have a categorical variable, var1, that can take the values W, B, A, M, N or P. Some values are NA, and I want to impute them using the mice package in R. However, I know that the missing values cannot be "W" or "B", because those respondents said that they do not belong to either category. I want to impute var1 while forcing mice to choose only from the categories other than B and W.

Here is sample code for you to use:

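library(mice)

# stringsAsFactors = TRUE so that var1 is a factor (the default changed to FALSE in R 4.0)
df <- data.frame(age = c(24, 37, 58, 65, 70, 84, 56, 36, 48, 23, 15),
    var1 = c("B", "W", NA, "A", NA, "P", "N", NA, "M", NA, "B"),
    var1categ = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0),
    ht = c(156, 169, 180, 175, 168, 165, 171, 158, 160, 175, 160),
    stringsAsFactors = TRUE)

imp <- mice(df, remove_collinear = FALSE)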

Thank you for your help and please let me know if you need more information.

@stefvanbuuren
Member

An easy way is perhaps to impute the subset of df without categories "B" and "W".
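
A minimal sketch of that subsetting step in base R, using the sample data from the question. The names `keep` and `sub` are only illustrative, and `stringsAsFactors = TRUE` is an assumption made here so that `var1` is a factor whose unused levels can be dropped.

```r
# Sample data from the question, with var1 stored as a factor.
df <- data.frame(
  age = c(24, 37, 58, 65, 70, 84, 56, 36, 48, 23, 15),
  var1 = c("B", "W", NA, "A", NA, "P", "N", NA, "M", NA, "B"),
  var1categ = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0),
  ht = c(156, 169, 180, 175, 168, 165, 171, 158, 160, 175, 160),
  stringsAsFactors = TRUE
)

# Keep rows where var1 is missing or observed in a permitted category.
keep <- is.na(df$var1) | !(df$var1 %in% c("B", "W"))

# droplevels() removes the now-empty "B" and "W" levels, so an
# imputation model fitted on `sub` has no such categories to draw from.
sub <- droplevels(df[keep, ])
```

`sub` can then be passed to `mice()` in place of `df`. The trade-off is that the B and W rows no longer contribute to the imputation model.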

@alexiasampri
Author

Thanks for getting back to me. I would rather not follow that approach because I lose power. The actual dataset is much bigger; the df above is just an example. Ideally I would do this either with post-processing or by writing a custom function for mice, but I don't really know how to do it. I know that for numeric variables I can use squeeze. Is there anything similar for categorical variables?

@stefvanbuuren
Member

Yes, I understand.

Another way is to start with the mice.impute.polyreg() function. At some point, you will see the line post <- predict(fit, xy[wy, , drop = FALSE], type = "probs"), which contains the probabilities per category for each case to be imputed. You can then set the probabilities of the categories you want to exclude to zero, and you will probably need to restandardise the remaining ones so that they add up to 1. If all is well, the method will then draw only from the permitted categories.

Sorry, I don't have an example that implements this approach, but in principle it should work.
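
The nullify-and-restandardise step described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `post` here is a made-up probability matrix standing in for the one returned by `predict(..., type = "probs")`, and `cats`, `exclude` and `draws` are hypothetical names. It does not show how the modified matrix would be wired into a copy of mice.impute.polyreg().

```r
# Toy stand-in for the probability matrix: one row per missing case,
# one column per category. The values are random, not model output.
set.seed(1)
cats <- c("A", "B", "M", "N", "P", "W")
post <- matrix(runif(4 * length(cats)), nrow = 4,
               dimnames = list(NULL, cats))
post <- post / rowSums(post)

# Categories that must never be imputed.
exclude <- c("B", "W")

# 1. Nullify the probabilities of the excluded categories.
post[, colnames(post) %in% exclude] <- 0

# 2. Restandardise so that each row adds up to 1 again.
post <- post / rowSums(post)

# 3. Draw one permitted category per row from the adjusted probabilities.
draws <- apply(post, 1, function(p) {
  sample(colnames(post), size = 1, prob = p)
})
```

One edge case to watch: if a row had all of its probability on the excluded categories, step 2 would divide by zero, so a real implementation would need a fallback for such rows.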
