About EM problem #98
Comments
I am not very clear about the details of SPSS's EM implementation, but I read the source code of impyute's `em`, and the implementation is very simple. It repeatedly resamples from the Gaussian defined by the current column's mean and variance until the new fill value differs very little from the previous one. This method may not work well on data with more complex characteristics.
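To make the concern concrete, here is a minimal sketch of the procedure as the comment describes it. It is not a copy of impyute's source; the function name `resample_impute` and the parameters `eps`, `min_iter`, `max_iter`, and `seed` are hypothetical. Each missing cell is filled with draws from N(mean, std) of its own column, and the other columns are never consulted.

```python
import numpy as np

def resample_impute(X, eps=0.1, min_iter=5, max_iter=1000, seed=0):
    # Hypothetical sketch of the procedure described above, NOT impyute's code.
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)  # work on a copy of the input
    for i, j in zip(*np.where(np.isnan(X))):
        col = X[:, j]  # a view: writing col[i] also writes X[i, j]
        prev = None
        for it in range(max_iter):
            # Known values include earlier fills, even this cell's last draw.
            known = col[~np.isnan(col)]
            mu, sd = known.mean(), known.std()
            # Only this column's mean/std are used; other features are ignored.
            col[i] = rng.normal(mu, sd)
            # Stop once two successive draws differ by less than eps (relative).
            if prev is not None and it >= min_iter and abs(col[i] - prev) <= eps * abs(prev):
                break
            prev = col[i]
    return X

X = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [np.nan, 8.0]])
print(resample_impute(X))
```

In this sketch the stopping rule compares two random draws from an almost fixed distribution, so stopping says little about convergence, and the final fill value is essentially one random sample from the column's marginal.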
Perhaps the data are not normally distributed, or the values are not missing at random. In that case the normal distribution defined by the mean and standard deviation of the observed values in each column (feature) cannot represent the data's true distribution, and bias is introduced already in the first iteration.
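A small illustration of both failure modes on hypothetical synthetic data: a skewed, strictly positive feature (log-normal), once with values missing completely at random (MCAR) and once with its largest values missing (MNAR). The 30% missing rate and the distribution are assumptions chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed, strictly positive feature (log-normal).
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# MCAR: every value has the same 30% chance of going missing.
mcar_obs = x[rng.random(x.size) >= 0.3]
# MNAR: the largest 30% of values are the ones that go missing.
mnar_obs = x[x < np.quantile(x, 0.7)]

print(f"true mean:          {x.mean():.3f}")
print(f"MCAR observed mean: {mcar_obs.mean():.3f}")  # close to the true mean
print(f"MNAR observed mean: {mnar_obs.mean():.3f}")  # biased low before any iteration

# A normal fitted to the observed column puts mass on impossible negative values.
draws = rng.normal(mcar_obs.mean(), mcar_obs.std(), size=10_000)
print(f"share of negative draws: {(draws < 0).mean():.1%}")
```

Under MNAR the observed mean is already wrong before any imputation runs, and for a skewed positive feature a fitted normal produces values the feature can never take.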
I also do not think the implementation in impyute is correct, as it does not use any covariance structure and only uses the mean and standard deviation of the current column. Murphy's "Machine Learning: A Probabilistic Perspective", section 11.6, shows how to use the EM algorithm to derive the expected sufficient statistics in the normal case. Is the algorithm actually converging for any …
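For contrast, a minimal sketch of EM for a multivariate normal with missing entries, in the spirit of the Murphy section cited above. It assumes the data are roughly multivariate normal and missing at random; `em_mvn_impute` is a hypothetical name, not an impyute function. For each row with observed block o and missing block m, the E-step fills x_m with its conditional mean mu_m + S_mo S_oo^-1 (x_o - mu_o) and accumulates the conditional covariance S_mm - S_mo S_oo^-1 S_om; the M-step re-estimates mu and S from these expected sufficient statistics.

```python
import numpy as np

def em_mvn_impute(X, max_iter=200, tol=1e-6):
    # Sketch of EM for a multivariate normal with missing entries.
    # Assumes the data are roughly Gaussian and missing at random (MAR).
    X = np.array(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)

    # Initialise: column means, and the covariance of the mean-filled data.
    mu = np.nanmean(X, axis=0)
    X_hat = np.where(miss, mu, X)
    sigma = np.cov(X_hat, rowvar=False, bias=True)

    for _ in range(max_iter):
        C = np.zeros((d, d))  # sum of conditional covariances of missing blocks
        # E-step: conditional expectation of each row's missing block.
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():  # nothing observed: fall back to the marginal
                X_hat[i] = mu
                C += sigma
                continue
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            K = np.linalg.solve(S_oo, S_mo.T).T  # S_mo @ inv(S_oo)
            X_hat[i, m] = mu[m] + K @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - K @ S_mo.T
        # M-step: mean and covariance from the expected sufficient statistics.
        mu_new = X_hat.mean(axis=0)
        diff = X_hat - mu_new
        sigma_new = (diff.T @ diff + C) / n
        done = np.allclose(mu, mu_new, atol=tol) and np.allclose(sigma, sigma_new, atol=tol)
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return X_hat, mu, sigma

# Usage on hypothetical correlated data with ~20% of cells hidden at random.
rng = np.random.default_rng(1)
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=500)
mask = rng.random(Z.shape) < 0.2
X_hat, mu, sigma = em_mvn_impute(np.where(mask, np.nan, Z))
```

With strongly correlated columns, the conditional mean draws on the observed neighbour in the same row, which is exactly the information a per-column resampler discards.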
I randomly introduced NaNs into my data set, and I want to compare the performance of the EM methods in SPSS and impyute.
I got the following results:
SPSS EM: MSE = 22.177916455492653, r = 0.721709731654166
impyute EM: MSE = 289.1830722478248, r = 0.002467765572835078
impyute's EM does not seem to work very well, and I do not know why.
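One way to make sure both tools are scored identically: hide a random subset of known cells, impute, then compute MSE and Pearson r only over the hidden cells. This is a sketch that assumes the r reported above is the Pearson correlation between true and imputed values; `evaluate_imputation` is a hypothetical helper, random normal data stands in for the real data set, and column-mean imputation serves only as a simple baseline. The SPSS and impyute outputs would take the place of `X_mean`.

```python
import numpy as np

def evaluate_imputation(X_true, X_imputed, mask):
    # Score only the cells that were hidden (mask == True).
    t = X_true[mask]
    p = X_imputed[mask]
    mse = np.mean((t - p) ** 2)
    r = np.corrcoef(t, p)[0, 1]  # Pearson correlation
    return mse, r

rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 4))          # stand-in for the real data set
mask = rng.random(X_true.shape) < 0.1       # hide ~10% of cells completely at random
X_missing = np.where(mask, np.nan, X_true)  # what the imputer gets to see

# Baseline: fill each hidden cell with its column's observed mean.
X_mean = np.where(mask, np.nanmean(X_missing, axis=0), X_missing)
mse, r = evaluate_imputation(X_true, X_mean, mask)
print(f"column-mean baseline: MSE={mse:.3f}, r={r:.3f}")
```

Scoring only the hidden cells keeps the observed entries, which every imputer returns unchanged, from inflating r or deflating the MSE.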