Hey @solegalli, it's been a while since you opened this issue, but I just replied to the other issue you opened. I think it's a bug in the sense that the median should be divided by 2**(1/2) instead of 2. The division is there because the features are one-hot encoded: two observations with different values in a categorical feature differ in two one-hot columns, so when computing the Euclidean distance between them, the sum of the squared differences over those columns is supposed to be Med**2. Dividing by 2 gives 2 * (Med/2)**2 = Med**2 / 2 instead, so as implemented, the importance of the categorical features is halved compared to the SMOTENC implementation proposed by Chawla et al. Honestly, though, I'm not even sure I'm correct about this possible bug. It seems so simple that I'm afraid I might be saying something stupid...
Link to my reply on the other issue, where I believe I described this problem in a bit more detail: #860 (comment)
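To make the arithmetic concrete, here is a minimal sketch in plain NumPy. It does not call imbalanced-learn at all; the value of `med`, the three-category feature, and the helper `squared_distance` are all made up for illustration. It only compares the squared distance contributed by one differing categorical feature under the two scalings.

```python
import numpy as np

# Hypothetical value standing in for Med, the median of the standard
# deviations of the continuous features.
med = 3.0

# Two observations that differ in a single categorical feature with three
# categories. After one-hot encoding they differ in exactly two columns.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])


def squared_distance(scale):
    """Squared Euclidean distance after replacing the one-hot 1s with `scale`."""
    return float(np.sum((a * scale - b * scale) ** 2))


d_half = squared_distance(med / 2)            # 2 * (Med/2)**2 = Med**2 / 2
d_sqrt = squared_distance(med / np.sqrt(2))   # 2 * (Med/sqrt(2))**2 = Med**2

print("Med / 2       ->", d_half)
print("Med / sqrt(2) ->", d_sqrt)
print("Med**2        ->", med ** 2)
```

If this reading is right, scaling by Med / 2 yields half of the Med**2 contribution used in the Chawla et al. description, while Med / sqrt(2) recovers it up to floating-point rounding.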
In this line, when adding the median(std) to the OHE matrix to estimate the distance of categorical features, the median is divided by 2.
Is this a bug, or is it intentional? If it is intentional, why?
Thanks a lot!