streamlining most_similar_cosmul and evaluate_word_analogies #2656

Merged
merged 5 commits into from
Mar 22, 2022

Conversation

n3hrox
Contributor

@n3hrox n3hrox commented Oct 28, 2019

Closes: #2535

This is my first PR for gensim, so all comments are welcome.
To be honest, I have no idea how to test restrict_vocab for most_similar_cosmul, or most_similar for evaluate_word_analogies. I wanted to write something similar to the existing tests for these keywords but did not find any (neither tests for the restrict_vocab keyword of the most_similar function, nor tests for the most_similar keyword of the accuracy function).

Summary:

  • Added new restrict_vocab parameter to most_similar_cosmul
  • Improved most_similar_cosmul shorthand to handle both positive and negative cases
  • Parameterized similarity function in evaluate_word_analogies
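
The three changes in the summary can be illustrated with a small, self-contained sketch. This is not gensim's implementation: `ToyVectors`, `analogy_is_correct`, the `similarity_function` argument name, and the toy vectors are all assumptions made for illustration. It shows a `restrict_vocab` cutoff applied to a multiplicative (cosmul-style) ranking, and an analogy check that selects its similarity method by name.

```python
import math


class ToyVectors:
    """Hypothetical stand-in for a keyed-vector store (not gensim's code)."""

    def __init__(self, words, vectors):
        self.words = list(words)  # index order, assumed most frequent first
        self.vectors = dict(zip(self.words, vectors))

    def _cos(self, w1, w2):
        u, v = self.vectors[w1], self.vectors[w2]
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.hypot(*u) * math.hypot(*v))

    def _rank(self, score, positive, negative, topn, restrict_vocab):
        # restrict_vocab=N searches only the first N words of the index
        limited = self.words if restrict_vocab is None else self.words[:restrict_vocab]
        query = set(positive) | set(negative)
        scored = [(w, score(w)) for w in limited if w not in query]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

    def most_similar(self, positive, negative=(), topn=1, restrict_vocab=None):
        # additive scoring: cosines to positives minus cosines to negatives
        def score(w):
            return (sum(self._cos(w, p) for p in positive)
                    - sum(self._cos(w, n) for n in negative))
        return self._rank(score, positive, negative, topn, restrict_vocab)

    def most_similar_cosmul(self, positive, negative=(), topn=1, restrict_vocab=None):
        # multiplicative scoring: cosines shifted into [0, 1], positives / negatives
        def score(w):
            pos = math.prod((1 + self._cos(w, p)) / 2 for p in positive)
            neg = math.prod((1 + self._cos(w, n)) / 2 for n in negative)
            return pos / (neg + 1e-6)  # small constant avoids division by zero
        return self._rank(score, positive, negative, topn, restrict_vocab)

    def analogy_is_correct(self, a, b, c, expected, similarity_function="most_similar"):
        # pick the scoring method by name, so callers can swap it
        method = getattr(self, similarity_function)
        best = method(positive=[b, c], negative=[a], topn=1)
        return bool(best) and best[0][0] == expected


# Toy 3-d vectors (made up): axes roughly mean "royal", "male", "female".
kv = ToyVectors(
    ["king", "man", "woman", "apple", "queen"],
    [(1, 1, 0), (0, 1, 0), (0, 0, 1), (0.1, 0.1, 0.1), (1, 0, 1)],
)
full = kv.most_similar_cosmul(["king", "woman"], ["man"])
# "queen" sits outside the first 4 words, so it cannot be returned here
limited = kv.most_similar_cosmul(["king", "woman"], ["man"], restrict_vocab=4)
```

A test for `restrict_vocab` could follow the same pattern: place the expected answer beyond the cutoff and check that it is no longer returned.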

@gojomo
Collaborator

gojomo commented Nov 26, 2019

+1 (having looked over code, but not tested functionality)

@mpenkov mpenkov changed the title from "streamlining most_similar_cosmul" to "streamlining most_similar_cosmul and evaluate_word_analogies" Dec 2, 2019
Collaborator

@mpenkov mpenkov left a comment

Thank you for your contribution. I left you some minor comments. Please have a look.

# allow calls like most_similar_cosmul('dog'), as a shorthand for most_similar_cosmul(['dog'])
positive = [positive]

if isinstance(negative, string_types):
Collaborator

If I understand correctly, this enables behavior like:

most_similar_cosmul('dog', 'cat')

where dog is positive and cat is negative. That's helpful shorthand, but without documentation, people won't find out about it.

Can you please add a paragraph to the docstring explaining the above shorthand?

Collaborator

Note that this just makes the special type-testing treatment of negative match that of positive (in both most_similar() and most_similar_cosmul()) – but that special treatment, while used extensively in examples, isn't currently documented even in the most_similar() case! I'd suggest that treating negative symmetrically with positive is a good idea, and should also be done in most_similar() for consistency, and both of their doc-comments should be improved/harmonized to explain this behavior.
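
The symmetric treatment discussed above could be sketched roughly as follows. This is only an illustration: `normalize_query` and `_as_list` are hypothetical helper names, and the sketch checks against `str` where the diff uses `string_types`.

```python
def _as_list(value):
    """Wrap a lone string in a list; treat None as an empty list."""
    if value is None:
        return []
    if isinstance(value, str):
        # allow 'dog' as a shorthand for ['dog']
        return [value]
    return list(value)


def normalize_query(positive=None, negative=None):
    """Apply the same string shorthand to both positive and negative."""
    positive = _as_list(positive)
    negative = _as_list(negative)
    if not positive and not negative:
        raise ValueError("at least one positive or negative word is required")
    return positive, negative


# 'dog' is the positive term and 'cat' the negative term
pos, neg = normalize_query("dog", "cat")
```

Routing both arguments through one helper keeps the two methods consistent, so a single docstring paragraph can describe the shorthand for both.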

Contributor Author

@mpenkov I added the shorthand to the docstring for clarity. Is that okay, or is there some other place in the docs where this should be added as well?

gensim/models/keyedvectors.py (resolved)
@mpenkov
Collaborator

mpenkov commented Jan 23, 2020

@n3hrox Ping! Are you able to finish this PR?

@mpenkov mpenkov added the "stale" label (Waiting for author to complete contribution, no recent effort) Jan 23, 2020
@n3hrox
Contributor Author

n3hrox commented Jan 23, 2020

@mpenkov I will try to come back to this over the weekend. I waited a really long time for this to be reviewed, then started a new job and had no time at all during Dec/Jan.

@n3hrox n3hrox requested a review from mpenkov January 27, 2020 19:27
@n3hrox
Contributor Author

n3hrox commented Jan 27, 2020

@mpenkov I adjusted the PR accordingly; please re-review.

@n3hrox n3hrox closed this Jan 28, 2022
@piskvorky
Owner

piskvorky commented Jan 28, 2022

@mpenkov what happened here? PR was marked Stale, but it looks like @n3hrox did respond (2 years ago…). Was this good to merge, should I reopen?

@mpenkov
Collaborator

mpenkov commented Jan 29, 2022

Yeah, looks like they responded right after we marked it as stale, and then we didn't follow up.

@mpenkov
Collaborator

mpenkov commented Jan 29, 2022

I think the correct action is to reopen and push this over the line ourselves.

@n3hrox Sorry for the delay. This fell off our radar.

@mpenkov mpenkov reopened this Jan 29, 2022
@mpenkov mpenkov removed the "stale" label (Waiting for author to complete contribution, no recent effort) Jan 29, 2022
@mpenkov mpenkov self-assigned this Jan 29, 2022
@piskvorky piskvorky added this to the Next release milestone Feb 19, 2022
@codecov

codecov bot commented Mar 20, 2022

Codecov Report

Merging #2656 (719bd0e) into develop (a936521) will decrease coverage by 0.01%.
The diff coverage is 71.42%.

@@             Coverage Diff             @@
##           develop    #2656      +/-   ##
===========================================
- Coverage    79.53%   79.52%   -0.02%     
===========================================
  Files           68       68              
  Lines        11781    11785       +4     
===========================================
+ Hits          9370     9372       +2     
- Misses        2411     2413       +2     

Impacted Files                   Coverage Δ
gensim/models/keyedvectors.py    82.73% <71.42%> (+0.09%) ⬆️
gensim/utils.py                  71.54% <0.00%> (-0.33%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a936521...719bd0e.

@mpenkov
Collaborator

mpenkov commented Mar 22, 2022

Merging. Thank you for your contribution and your patience @n3hrox !

@mpenkov mpenkov merged commit ac3bbcd into piskvorky:develop Mar 22, 2022
Development

Successfully merging this pull request may close these issues.

streamlining most_similar_cosmul
4 participants