About em problem #98

ROOKLO · 2020-09-20T13:01:19Z

I randomly introduced NaN values into my data set to compare the performance of the EM imputation methods in SPSS and impyute. These are the results I got:

spss_em
MSE_spss: 22.177916455492653
r_spss: 0.721709731654166

impyute_em
MSE_impyute: 289.1830722478248
r_impyute: 0.002467765572835078

The EM from impyute does not seem to work very well, and I do not know why.
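For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the evaluation loop. It is not the original script: the helper names `mask_randomly` and `score_imputation` are hypothetical, the data is synthetic, and it assumes MSE and r are computed only over the entries that were masked. Column-mean filling stands in for whichever imputer is being tested.

```python
import numpy as np

def mask_randomly(X, frac, rng):
    """Return a copy of X with roughly `frac` of its entries set to NaN, plus the mask."""
    mask = rng.random(X.shape) < frac
    X_missing = X.astype(float).copy()
    X_missing[mask] = np.nan
    return X_missing, mask

def score_imputation(X_true, X_imputed, mask):
    """MSE and Pearson r between true and imputed values, over masked entries only."""
    truth, filled = X_true[mask], X_imputed[mask]
    mse = float(np.mean((truth - filled) ** 2))
    r = float(np.corrcoef(truth, filled)[0, 1])
    return mse, r

# Synthetic, correlated two-column data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
X_missing, mask = mask_randomly(X, 0.2, rng)

# Stand-in imputer: fill each missing entry with its column mean.
col_means = np.nanmean(X_missing, axis=0)
X_mean = np.where(mask, col_means, X_missing)

mse, r = score_imputation(X, X_mean, mask)
print(f"MSE: {mse:.3f}  r: {r:.3f}")
```

Swapping the stand-in for the output of another imputer gives numbers that can be compared on the same masked entries.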

BaoxueLi · 2021-01-22T02:56:03Z

I am not familiar with the details of the SPSS EM implementation, but I read the source code of the em function in impyute and found that the implementation is very simple. For each missing value, it repeatedly resamples from the Gaussian distribution defined by the mean and variance of the current column until the difference from the previously filled value is very small. This method may not be effective on data with more complex characteristics.
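To make that description concrete, here is a rough sketch of the per-column resampling idea. It paraphrases the description above and is not impyute's actual source: the function name `column_resample_impute`, the parameters `eps` and `max_loops`, and the choice to compute the statistics from the originally observed entries only are all assumptions. It also assumes every column has at least two observed values.

```python
import numpy as np

def column_resample_impute(X, eps=0.1, max_loops=50, rng=None):
    """Fill each NaN by redrawing from N(column mean, column std) until two
    consecutive draws differ by at most a relative amount `eps`."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X)
    for j in range(X.shape[1]):
        observed = X[~missing[:, j], j]
        mu, sigma = observed.mean(), observed.std()
        for i in np.flatnonzero(missing[:, j]):
            previous = rng.normal(mu, sigma)
            for _ in range(max_loops):
                current = rng.normal(mu, sigma)
                # Stop once the new draw is close to the previous one.
                if abs(current - previous) <= eps * abs(previous):
                    break
                previous = current
            # Note: nothing from the other columns of row i is ever used.
            X[i, j] = current
    return X

rng = np.random.default_rng(0)
X_true = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=200)
mask = rng.random(X_true.shape) < 0.2
X_missing = np.where(mask, np.nan, X_true)
X_filled = column_resample_impute(X_missing, rng=rng)
```

Because each filled value is an independent draw from the column's marginal distribution, it carries no information about the row it belongs to, which would be consistent with a correlation near zero between imputed and true values.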

ROOKLO · 2021-01-22T08:12:18Z

Maybe the data is not normally distributed, or the values are not missing at random. In that case, the normal distribution defined by the mean and standard deviation of the observed data in each column (feature) cannot represent the data's true distribution, so bias is introduced in the first iteration.

mkrtl · 2021-02-25T09:17:27Z

I also do not think the implementation here at impyute is correct, as it does not use any covariance structure and relies only on the mean and standard deviation of the current column. Murphy's "Machine Learning: A Probabilistic Perspective", chapter 11.6, shows how to use the EM algorithm to derive the sufficient statistics in the normal case. Does the algorithm actually converge for any delta?
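For comparison, here is a minimal sketch of EM for a multivariate normal with missing entries, which does use the covariance structure. It is an illustration under stated assumptions, not a proposed patch for impyute: the function name `em_mvn_impute` and its parameters are hypothetical, and the data is assumed to be jointly Gaussian with values missing at random. The E-step fills each missing block with its conditional mean given the observed block and accumulates the conditional covariance; the M-step re-estimates the mean and covariance from those expected sufficient statistics.

```python
import numpy as np

def em_mvn_impute(X, max_iter=100, tol=1e-6):
    """EM for a multivariate normal with NaN entries.
    Returns the conditional-mean imputed data, the mean, and the covariance."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    missing = np.isnan(X)

    # Initialise from per-column statistics of the observed entries.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    X_hat = np.where(missing, mu, X)

    for _ in range(max_iter):
        cond_cov_sum = np.zeros((d, d))
        # E-step: conditional mean and covariance of the missing block per row.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed: fall back to the current marginal.
                X_hat[i] = mu
                cond_cov_sum += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            coef = np.linalg.solve(s_oo, s_mo.T).T  # equals s_mo @ inv(s_oo)
            X_hat[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            cond_cov_sum[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T

        # M-step: update parameters from the expected sufficient statistics.
        mu_new = X_hat.mean(axis=0)
        centered = X_hat - mu_new
        sigma_new = (centered.T @ centered + cond_cov_sum) / n

        change = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if change < tol:
            break
    return X_hat, mu, sigma

rng = np.random.default_rng(0)
X_true = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
mask = rng.random(X_true.shape) < 0.2
X_missing = np.where(mask, np.nan, X_true)
X_filled, mu_hat, sigma_hat = em_mvn_impute(X_missing)
```

Here the stopping rule is the change in the parameter estimates between iterations, rather than the gap between two successive random draws for a single cell.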
