streamlining most_similar_cosmul and evaluate_word_analogies #2656

n3hrox · 2019-10-28T19:36:22Z

Closes: #2535

This is my first PR for gensim, so all comments are welcome.
To be honest, I have no idea how to test restrict_vocab for most_similar_cosmul or most_similar for evaluate_word_analogies. I wanted to write something similar to the existing tests for these keywords but did not find any (neither tests for the restrict_vocab keyword of the most_similar function nor tests for the most_similar keyword of the accuracy function).
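
One possible shape for such a test, sketched against a small stand-in rather than gensim's real KeyedVectors. Everything here is hypothetical: `toy_cosmul`, `VOCAB`, `VECTORS`, and `check_restrict_vocab` are made-up names, and the scoring only follows the multiplicative (3CosMul-style) idea in spirit. The property being checked is that `restrict_vocab=N` limits candidates to the first N vocabulary entries.

```python
import math

# Hypothetical toy vocabulary, ordered the way a frequency-sorted vocab would be.
VOCAB = ["dog", "cat", "puppy", "kitten", "car"]
VECTORS = {
    "dog": (1.0, 0.1),
    "cat": (0.1, 1.0),
    "puppy": (0.9, 0.2),
    "kitten": (0.2, 0.9),
    "car": (-1.0, -1.0),
}


def cosine(a, b):
    """Cosine similarity of two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def toy_cosmul(positive, negative=(), topn=10, restrict_vocab=None):
    """Rank words by a 3CosMul-style score over the first `restrict_vocab` words."""
    candidates = VOCAB if restrict_vocab is None else VOCAB[:restrict_vocab]
    inputs = set(positive) | set(negative)
    scores = {}
    for word in candidates:
        if word in inputs:
            continue  # never return the query words themselves
        # Shift cosines from [-1, 1] to [0, 1] so the products stay non-negative.
        pos = math.prod((1 + cosine(VECTORS[word], VECTORS[p])) / 2 for p in positive)
        neg = math.prod((1 + cosine(VECTORS[word], VECTORS[n])) / 2 for n in negative)
        scores[word] = pos / (neg + 1e-6)  # small constant avoids division by zero
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:topn]


def check_restrict_vocab():
    """Assert that restricted results only come from the first three entries."""
    allowed = set(VOCAB[:3])
    restricted = toy_cosmul(["dog"], ["cat"], restrict_vocab=3)
    assert all(word in allowed for word, _ in restricted)
    # Without the restriction, later entries such as "kitten" are candidates too.
    unrestricted = toy_cosmul(["dog"], ["cat"])
    assert {word for word, _ in unrestricted} == {"puppy", "kitten", "car"}
    return restricted
```

Against the real class, the same assertion would presumably be made on the keys returned by most_similar_cosmul, using whatever small model the test suite already loads.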

Summary:

Added new restrict_vocab parameter to most_similar_cosmul
Improved most_similar_cosmul shorthand to handle both positive and negative cases
Parameterized similarity function in evaluate_word_analogies
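
The third item can be pictured as dispatch by method name. The sketch below is a hypothetical stand-in, not the code in this PR: `ToyVectors`, `evaluate_analogy`, and the canned rankings are invented for illustration, and it assumes the parameter simply names a method on the same object.

```python
class ToyVectors:
    """Hypothetical stand-in with two interchangeable ranking methods."""

    def __init__(self, additive_ranking, multiplicative_ranking):
        # Canned rankings keep the sketch free of any vector arithmetic.
        self._additive = additive_ranking
        self._multiplicative = multiplicative_ranking

    def most_similar(self, positive, negative, topn=5):
        return self._additive[:topn]

    def most_similar_cosmul(self, positive, negative, topn=5):
        return self._multiplicative[:topn]

    def evaluate_analogy(self, a, b, c, expected, similarity_function="most_similar"):
        """Return True if `a : b :: c : expected` is solved by the chosen method."""
        rank = getattr(self, similarity_function)  # look the method up by name
        predictions = rank(positive=[b, c], negative=[a], topn=5)
        # Query words are skipped; the first remaining candidate is the answer.
        for word, _score in predictions:
            if word not in (a, b, c):
                return word == expected
        return False


# Made-up rankings in which the two methods disagree on the top candidate.
toy = ToyVectors(
    additive_ranking=[("prince", 0.9), ("queen", 0.8)],
    multiplicative_ranking=[("queen", 0.95), ("prince", 0.7)],
)
```

With these made-up rankings, `toy.evaluate_analogy("man", "king", "woman", "queen")` returns False, while passing `similarity_function="most_similar_cosmul"` returns True, so the same evaluation loop can score either method.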

gojomo · 2019-11-26T23:16:41Z

+1 (having looked over code, but not tested functionality)

mpenkov

Thank you for your contribution. I left you some minor comments. Please have a look.

mpenkov · 2019-12-02T20:48:35Z

gensim/models/keyedvectors.py

            # allow calls like most_similar_cosmul('dog'), as a shorthand for most_similar_cosmul(['dog'])
            positive = [positive]

+        if isinstance(negative, string_types):


If I understand correctly, this enables behavior like:

most_similar_cosmul('dog', 'cat')

where dog is positive and cat is negative. That's helpful shorthand, but without documentation, people won't find out about it.

Can you please add a paragraph to the docstring explaining the above shorthand?

gojomo · 2019-12-02

Note that this just makes the special type-testing treatment of negative match that of positive (in both most_similar() and most_similar_cosmul()). That special treatment, while used extensively in examples, isn't currently documented even in the most_similar() case! I'd suggest that treating negative symmetrically with positive is a good idea and should also be done in most_similar() for consistency, and that both of their doc-comments should be improved and harmonized to explain this behavior.
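
A minimal sketch of that symmetric treatment, assuming a hypothetical helper `as_key_list` (not a gensim function) that both methods would apply to each argument. It tests against `str` rather than the `string_types` alias seen in the diff, so it assumes Python 3 only.

```python
def as_key_list(keys):
    """Wrap a bare string in a list; treat None as empty; copy other sequences."""
    if keys is None:
        return []
    if isinstance(keys, str):
        # 'dog' is shorthand for ['dog']; without this check the string
        # would be iterated character by character.
        return [keys]
    return list(keys)


def normalize_query(positive=None, negative=None):
    """Apply the same shorthand rule to both arguments."""
    return as_key_list(positive), as_key_list(negative)
```

Here `normalize_query('dog', 'cat')` returns `(['dog'], ['cat'])`, and list arguments pass through unchanged.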

n3hrox · 2020-01-26

@mpenkov I added the shorthand to the docstring for clarity. Is that okay, or is there some other place in the docs where this should be added as well?

mpenkov · 2020-01-23T08:07:07Z

@n3hrox Ping! Are you able to finish this PR?

n3hrox · 2020-01-23T10:32:06Z

@mpenkov I will try to come back to this during the weekend. I waited a really long time for this to be reviewed, then started a new job and had no time at all during Dec/Jan.

n3hrox · 2020-01-27T19:27:56Z

@mpenkov I adjusted the PR accordingly; please re-review.

piskvorky · 2022-01-28T17:00:35Z

@mpenkov what happened here? PR was marked Stale, but it looks like @n3hrox did respond (2 years ago…). Was this good to merge, should I reopen?

mpenkov · 2022-01-29T02:17:31Z

Yeah, looks like they responded right after we marked it as stale, and then we didn't follow up.

mpenkov · 2022-01-29T02:18:48Z

I think the correct action is to reopen and push this over the line ourselves.

@n3hrox Sorry for the delay. This fell off our radar.

codecov · 2022-03-20T03:28:06Z

Codecov Report

Merging #2656 (719bd0e) into develop (a936521) will decrease coverage by 0.01%.
The diff coverage is 71.42%.

@@             Coverage Diff             @@
##           develop    #2656      +/-   ##
===========================================
- Coverage    79.53%   79.52%   -0.02%     
===========================================
  Files           68       68              
  Lines        11781    11785       +4     
===========================================
+ Hits          9370     9372       +2     
- Misses        2411     2413       +2

Impacted Files	Coverage Δ
gensim/models/keyedvectors.py	`82.73% <71.42%> (+0.09%)`	⬆️
gensim/utils.py	`71.54% <0.00%> (-0.33%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a936521...719bd0e.

mpenkov · 2022-03-22T01:05:24Z

Merging. Thank you for your contribution and your patience @n3hrox !

streamlining most_similar_cosmul

3648a07

mpenkov changed the title ~~streamlining most_similar_cosmul~~ streamlining most_similar_cosmul and evaluate_word_analogies Dec 2, 2019

mpenkov requested changes Dec 2, 2019

mpenkov added the "stale" label (Waiting for author to complete contribution, no recent effort) Jan 23, 2020

Fix PR requested changes and add unit test

a36fae4

n3hrox requested a review from mpenkov January 27, 2020 19:27

n3hrox closed this Jan 28, 2022

mpenkov reopened this Jan 29, 2022

mpenkov removed the "stale" label (Waiting for author to complete contribution, no recent effort) Jan 29, 2022

mpenkov self-assigned this Jan 29, 2022

piskvorky added this to the Next release milestone Feb 19, 2022

mpenkov added 3 commits February 26, 2022 15:23

Merge remote-tracking branch 'upstream/develop' into streamline-most-similar-cosmul

df97d95

fix merge artifacts

5ebebaf

Merge remote-tracking branch 'upstream/develop' into streamline-most-similar-cosmul

719bd0e

mpenkov approved these changes Mar 22, 2022

mpenkov merged commit ac3bbcd into piskvorky:develop Mar 22, 2022


