DisSim implementation correct? #109
Hi David, thanks for your interest in the package. I am not very familiar with the einsum notation, so I cannot comment on those parts. What differences do you see in how the norms are calculated? Note, by the way, that the three terms in (8) are squared Euclidean distances, so I assume there should be no
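Since einsum came up, here is a minimal sketch of how row-wise squared norms are typically written with np.einsum, next to the equivalent plain NumPy expression. The function names and the random data are hypothetical; this is not the package's code.

```python
import numpy as np

def squared_norms_einsum(diff: np.ndarray) -> np.ndarray:
    """Row-wise squared Euclidean norms via einsum."""
    # "ij,ij->i": multiply the two operands elementwise and sum over j,
    # leaving one value per row i.
    return np.einsum("ij,ij->i", diff, diff)

def squared_norms_plain(diff: np.ndarray) -> np.ndarray:
    """The same quantity written without einsum."""
    return np.sum(diff ** 2, axis=1)

rng = np.random.default_rng(0)
diff = rng.normal(size=(5, 3))  # hypothetical row-wise differences, e.g. x - c(x)
print(np.allclose(squared_norms_einsum(diff), squared_norms_plain(diff)))  # True
```

Both forms compute the same values; einsum just makes the summed index explicit.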
Sorry, I posted the Euclidean distance version because that is the one I am interested in, but you are right: the default should be squared (and it is!). I am mainly concerned about the
I wonder whether there really is such a thing as a "Euclidean distance version". If there is, I assume it should be something like np.sqrt(squared_dist - sq_norm_dist_x - sq_norm_dist_y) instead of dist - np.sqrt(norm_dist_x) - np.sqrt(norm_dist_y). In the end, in general… Could you point me to where to find this? PRs are always welcome :)
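To see how far apart the two expressions above can be, here is a minimal sketch that evaluates both on the same made-up squared quantities. It assumes dist in the second expression is the unsquared Euclidean distance, i.e. the square root of squared_dist; the function names and values are hypothetical.

```python
import numpy as np

def sqrt_of_squared_form(sq_dist, sq_norm_x, sq_norm_y):
    """np.sqrt(squared_dist - sq_norm_dist_x - sq_norm_dist_y)."""
    # The bracket is a difference of squared terms and can be negative;
    # np.sqrt then yields nan (the warning is silenced here on purpose).
    with np.errstate(invalid="ignore"):
        return np.sqrt(sq_dist - sq_norm_x - sq_norm_y)

def difference_of_roots(sq_dist, sq_norm_x, sq_norm_y):
    """Euclidean distance minus the two unsquared norms."""
    return np.sqrt(sq_dist) - np.sqrt(sq_norm_x) - np.sqrt(sq_norm_y)

# Hypothetical values: squared distance between x and y, and the squared
# local norms of x and y.
sq_dist = np.array([25.0, 1.0])
sq_norm_x = np.array([9.0, 4.0])
sq_norm_y = np.array([4.0, 4.0])

print(sqrt_of_squared_form(sq_dist, sq_norm_x, sq_norm_y))  # approx. 3.464 and nan
print(difference_of_roots(sq_dist, sq_norm_x, sq_norm_y))   # 0.0 and -3.0
```

The two forms disagree even when both are defined, and the first one is undefined whenever the squared expression is negative.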
The authors likely use squared Euclidean distances because they are slightly cheaper and easier to compute; otherwise, I don't see an argument for them. In my case, the input distance matrix is Euclidean, so it makes more sense to treat the centroid distances as Euclidean as well. Your assumption is also what your library implements, but I believe that assumption might be wrong.
should be right, if I meant the
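The two readings can be put side by side in a small sketch. It assumes eq. (8) is the local DisSim of the paper (squared distance minus the squared distances of x and y to the centroids of their k nearest neighbors). The "euclidean" mode is a hypothetical variant matching the argument above (all three terms unsquared); it is not taken from the paper or the package. The function name, the mode parameter and the brute-force neighbor search are illustrative only.

```python
import numpy as np

def dissim_local(X: np.ndarray, k: int, mode: str = "squared") -> np.ndarray:
    """Pairwise DisSimLocal-style dissimilarities between the rows of X (sketch).

    mode="squared":   ||x-y||^2 - ||x-c(x)||^2 - ||y-c(y)||^2   (assumed eq. 8)
    mode="euclidean": ||x-y||   - ||x-c(x)||   - ||y-c(y)||     (hypothetical variant)
    c(x) is the centroid of the k nearest neighbors of x, excluding x itself.
    """
    # Squared Euclidean distances between all pairs of rows.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)

    # k nearest neighbors of each row; the diagonal is masked so a point
    # is never its own neighbor.
    masked = sq_dist.copy()
    np.fill_diagonal(masked, np.inf)
    knn = np.argsort(masked, axis=1)[:, :k]

    # Local centroids c(x) and squared distances ||x - c(x)||^2.
    centroids = X[knn].mean(axis=1)
    offset = X - centroids
    sq_local = np.einsum("ij,ij->i", offset, offset)

    if mode == "squared":
        return sq_dist - sq_local[:, None] - sq_local[None, :]
    if mode == "euclidean":
        dist, local = np.sqrt(sq_dist), np.sqrt(sq_local)
        return dist - local[:, None] - local[None, :]
    raise ValueError(f"unknown mode: {mode!r}")

X = np.array([[0.0], [1.0], [3.0]])
print(dissim_local(X, k=1))
print(dissim_local(X, k=1, mode="euclidean"))
```

Brute-force pairwise distances keep the sketch short; for real data, an approximate or tree-based neighbor search would replace the argsort step.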
Interesting. It's been a while since I last read the paper and, unfortunately, I don't really have time to look into it in much detail. If the squared Euclidean distances are indeed just a matter of a tiny performance boost, I fully follow your argument. Also, we're on the same page as to whether the implemented formula can be interpreted as a "Euclidean" distance: not really. In this link the default also is… In any case, have you tried to evaluate your implementation? For example, does it reduce hubness more, or lead to higher accuracy in some task?
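One common way to quantify hubness is the skewness of the k-occurrence distribution N_k (how often each point appears among the k nearest neighbors of the others). Below is a minimal sketch computing it from a square dissimilarity matrix; the function name is hypothetical and this is not the package's own evaluation code.

```python
import numpy as np

def k_occurrence_skewness(D: np.ndarray, k: int) -> float:
    """Skewness of N_k for a square dissimilarity matrix D (smaller = closer).

    N_k(x) counts how often x appears among the k nearest neighbors of the
    other points; a strongly right-skewed N_k indicates hubs.
    """
    D = np.array(D, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(D, np.inf)   # a point is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :k]
    n_k = np.bincount(knn.ravel(), minlength=D.shape[0]).astype(float)
    centered = n_k - n_k.mean()
    std = centered.std()
    if std == 0:
        return 0.0  # every point occurs equally often: no skew
    return float(np.mean(centered ** 3) / std ** 3)

# Hypothetical matrix in which point 0 is everyone's nearest neighbor (a hub).
D_hub = np.array([[0, 1, 2, 3],
                  [1, 0, 5, 6],
                  [2, 5, 0, 7],
                  [3, 6, 7, 0]], dtype=float)
print(k_occurrence_skewness(D_hub, k=1))  # positive: N_1 is right-skewed
```

Comparing this value for the plain Euclidean matrix and for a DisSim output on the same data would be one way to check whether a variant reduces hubness. A DisSim matrix containing nan would need handling first, since np.argsort sorts nan last.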
Hi! Thanks for your work on the package. I'm currently trying to understand the implementation of DisSim, which seems to differ from the paper. If idx and dist contain the indices and distances to the n-1 neighbors, equation 8 from the paper would be something like:

The implemented version returns something like np.sqrt(dist - norm_dist_x - norm_dist_y), where the norms are computed slightly differently. Is this intended, and only a documentation problem?
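For reference, assuming equation (8) is the local DisSim of the paper, its three squared Euclidean terms would read as follows. Here c_k(x) denotes the centroid of the k nearest neighbors of x; this notation is chosen for illustration and may differ from the paper's.

```latex
\mathrm{DisSim}_{\mathrm{local}}(x, y)
  = \lVert x - y \rVert^2
  - \lVert x - c_k(x) \rVert^2
  - \lVert y - c_k(y) \rVert^2,
\qquad
c_k(x) = \frac{1}{k} \sum_{z \in \mathrm{kNN}(x)} z
```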