WMD Earth Mover distance problems #981
Comments
Mmm... I've been looking at the code in word2vec.py.
If I replace it with:
The return value is 0.75. Do we need to normalize the distance matrix?
Argh. I just noticed that the blog is deviating from the paper, and this implementation returns the value from the paper. Though, the code would be simpler if used
@josepablog Thanks for your suggestion. We don't have a dependency on sklearn so unfortunately can't use
Computing Euclidean distances is trivial (a one-liner) and should not need any external dependencies or inefficiencies. @tmylk Also, since the distance is symmetric (and zero along the diagonal), isn't this doing double work? The whole nested loop looks inefficient and should be using numpy broadcasting and array operations.
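For illustration, here is a minimal sketch of the broadcasting approach suggested above. It is not the code in word2vec.py; the function name `pairwise_euclidean` and the toy input are made up for this example, and it assumes the word vectors are stacked as rows of a single array.

```python
import numpy as np


def pairwise_euclidean(vectors):
    """Return the (n, n) matrix of Euclidean distances between rows of `vectors`.

    Hypothetical helper for illustration only, not part of gensim.
    """
    vectors = np.asarray(vectors, dtype=np.float64)
    # Broadcasting (n, 1, d) against (1, n, d) yields all (n, n, d) differences
    # at once, replacing the nested Python loop.
    diff = vectors[:, np.newaxis, :] - vectors[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy vectors, chosen so the distances are easy to check by hand.
toy_vectors = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
distances = pairwise_euclidean(toy_vectors)
print(distances)
```

The result is symmetric with zeros on the diagonal, as noted in the comment above. This version trades memory for simplicity: the intermediate array has shape (n, n, d), so for large vocabularies a Gram-matrix formulation or computing only the needed pairs may be preferable.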
Thanks for this awesome implementation! @josepablog, @tmylk, @piskvorky - I see that this issue is closed, but wmdistance is still doing the inefficient loops. Could you please share any updates since your last post? Have you or anyone else been able to make wmdistance work more accurately and faster (say, using numpy vectorized operations)?
@narvind2003 not that I know of. CC @menshikh-iv Such optimizations look trivial though, do you want to give it a try?
@narvind2003 yes, we can help.
* Allow supplying a string-key as the negative arg. to most_similar()
* Allow a single vector as a positive or negative arg. to most_similar()
* Update comments
* Accept single arguments when positive and negative are both supplied
* Update most_similar_cosmul to match most_similar

I'm not sure if this fully addresses the `# TODO: Update to better match & share code with most_similar()` at line #981 or not, so I've left it in.

* minor code cleanup
* add unit tests
* Update CHANGELOG.md
* remove redundant variable declaration
* enforce consistency
* respond to review feedback
* Update keyedvectors.py

Co-authored-by: Michael Penkov <misha.penkov@gmail.com>
Co-authored-by: Michael Penkov <m@penkov.dev>
I'm not able to successfully run the sample WMD code:
This returns 1.25. This is troublesome because it is more than 1, and because the correct answer should be ~0.74.
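As a side note on the "more than 1" concern, here is a small sketch of the simplest possible case: two documents containing one word each. All the mass then has to move from the one word to the other, so the WMD reduces to the Euclidean distance between the two word vectors. The helper `single_word_wmd` and the vectors are hypothetical and only illustrate that the distance is not bounded by 1 in general; they say nothing about which value is correct for the sample sentences.

```python
import numpy as np


def single_word_wmd(vec_a, vec_b):
    """WMD between two one-word documents (hypothetical helper).

    With a single word on each side, the only feasible flow moves all mass
    from one word to the other, so the cost is just their Euclidean distance.
    """
    vec_a = np.asarray(vec_a, dtype=np.float64)
    vec_b = np.asarray(vec_b, dtype=np.float64)
    return float(np.linalg.norm(vec_a - vec_b))


# Made-up raw vectors: the distance depends on their scale.
raw_a = np.array([3.0, 0.0])
raw_b = np.array([0.0, 4.0])
raw_distance = single_word_wmd(raw_a, raw_b)

# The same vectors scaled to unit length: the distance is now at most 2.
unit_a = raw_a / np.linalg.norm(raw_a)
unit_b = raw_b / np.linalg.norm(raw_b)
unit_distance = single_word_wmd(unit_a, unit_b)

print(raw_distance, unit_distance)
```

With unnormalized vectors the distance grows with vector length; with unit-length vectors it is capped at 2, but can still exceed 1 (here the two unit vectors are orthogonal, giving the square root of 2).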