Feature suggestion: relative cosine similarity for word2vec #2175

Closed
viplexke opened this issue Sep 7, 2018 · 14 comments
Labels
difficulty easy (Easy issue: required small fix), feature (Issue described a new feature)

Comments

@viplexke

viplexke commented Sep 7, 2018

Hi all,
Based on this paper, do you think it is worth the effort to implement the relative cosine similarity measure?
https://ufal.mff.cuni.cz/pbml/105/art-leeuwenberg-et-al.pdf

Note that I'm not suggesting this as a potential contributor but as a grateful user.
Thank you,
Viktor

menshikh-iv added the "feature" label (Issue described a new feature) on Sep 10, 2018
@menshikh-iv
Contributor

Thanks for the request @viplexke. I quickly looked at the article: IMO it doesn't look very useful to include in gensim. Also, the formula for relative cosine similarity looks pretty simple (i.e. anyone who needs this can implement it themselves).

CC: @gojomo @piskvorky

@gojomo
Collaborator

gojomo commented Sep 17, 2018

It's interesting that it seems to help highlight synonyms, as opposed to other kinds of related words. If it could be done as a single short method in KeyedVectors, I think it'd be a good contribution: even though it's not hard for others to implement, sometimes people only discover new techniques by browsing APIs. (Any implementation should cite the original paper.)

menshikh-iv added the "difficulty easy" label (Easy issue: required small fix) on Sep 18, 2018
@ailsamm

ailsamm commented Oct 31, 2018

@gojomo @menshikh-iv How exactly would I go about implementing this "as a single short method in KeyedVectors" (as you say, @gojomo)? Excuse the question - I'm very new to Gensim. Thanks!

@gojomo
Collaborator

gojomo commented Oct 31, 2018

Section 3.5 of the referenced paper introduces the "relative cosine similarity" measure, which is essentially a measure of how much more similar word B is to word A, compared with the top-N words most similar to word A. It seems the authors observed that when one word was much "more similar" to a target word than the next N, it was especially likely to be a true synonym. (This "better than the others" signal was more reliable than any absolute cutoff of cosine similarity; see the paper for the details and full reasoning.)
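
Written out, the measure described above looks like the following. This is my transcription of the definition, so check it against Section 3.5 of the paper before relying on it; the symbol names (w_i for the target word, w_j for the candidate, TOP_n for the n words most similar to w_i) are notation chosen here for illustration.

```latex
\mathrm{rcs}_n(w_i, w_j) =
  \frac{\cos(w_i, w_j)}
       {\sum_{w_c \in \mathrm{TOP}_n} \cos(w_i, w_c)}
```

If all n neighbours were equally similar to w_i, each would score exactly 1/n, so a score above 1/n (0.10 for n = 10) means the candidate is more similar than the average top-n neighbour.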

So given two words and a top-n value, and a set of word-vectors, this new measure can be calculated. That naturally suggests a single new method on KeyedVectors with a signature like:

def relative_cosine_similarity(self, word_a, word_b, topn=10):
    ...

A pull-request that implements this method, matching the definition in the paper, with an explanatory doc-comment (with link to the paper) and some tests (which manage to somewhat confirm expected behavior along the lines of that described in the paper) would be a useful contribution.
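
To make the calculation concrete, here is a minimal sketch of the measure in plain Python. It is not gensim code and not a proposed patch: it is a standalone function over a hypothetical `vectors` dict (word to list of floats), and the toy vectors are invented for illustration. It assumes the definition is cos(word_a, word_b) divided by the sum of the cosine similarities between word_a and its topn nearest neighbours, with word_a itself excluded.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relative_cosine_similarity(vectors, word_a, word_b, topn=10):
    """cos(word_a, word_b) divided by the summed cosine similarity of
    word_a to its `topn` nearest neighbours (word_a itself excluded).

    `vectors` is a hypothetical dict mapping each word to its vector.
    """
    target = vectors[word_a]
    # Similarities of word_a to every other word, largest first, cut to topn.
    neighbour_sims = sorted(
        (
            cosine_similarity(target, vec)
            for word, vec in vectors.items()
            if word != word_a
        ),
        reverse=True,
    )[:topn]
    if not neighbour_sims:
        raise ValueError("need at least one word other than word_a")
    return cosine_similarity(target, vectors[word_b]) / sum(neighbour_sims)


# Toy 2-d vectors, invented purely for illustration.
toy_vectors = {
    "good": [1.0, 0.0],
    "great": [0.9, 0.1],
    "bad": [0.5, 0.5],
    "table": [0.1, 0.9],
}

rcs_great = relative_cosine_similarity(toy_vectors, "good", "great", topn=3)
rcs_table = relative_cosine_similarity(toy_vectors, "good", "table", topn=3)
print(rcs_great, rcs_table)
```

With topn=3 the near-duplicate "great" lands above the 1/3 baseline and the unrelated "table" lands well below it. On a real KeyedVectors instance, the same two ingredients should be available from the existing `similarity` and `similar_by_word` methods, so the method body could stay about this short.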

@ailsamm

ailsamm commented Nov 1, 2018

@gojomo Thanks so much! I got it working now.

@rawannasser

Hi @ailsamm, I need to implement the same measure. Could you please share the code, since it works for you?
I really need it :(

@ailsamm

ailsamm commented Nov 13, 2018

@rawannasser Sure! How should I send it?

@rawannasser

@ailsamm
Thanks so much!
I don't know if I can write my email here, but maybe you can upload it to your GitHub?

@gojomo
Collaborator

gojomo commented Nov 15, 2018

@ailsamm Can you submit your implementation as a pull-request for potential integration to the project?

@jenishah
Contributor

jenishah commented Dec 5, 2018

Hi,
Is anyone working on this?
If not, I would like to take this up.

@piskvorky
Owner

@jenishah go ahead please :)

@rsdel2007
Contributor

If no one is working on this, can I contribute?

@menshikh-iv
Contributor

feel free to contribute @rsdel2007

@rawannasser

@rsdel2007 yes, please
