-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
streamlining most_similar_cosmul and evaluate_word_analogies #2656
streamlining most_similar_cosmul and evaluate_word_analogies #2656
Conversation
+1 (having looked over code, but not tested functionality) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution. I left you some minor comments. Please have a look.
gensim/models/keyedvectors.py
Outdated
# allow calls like most_similar_cosmul('dog'), as a shorthand for most_similar_cosmul(['dog']) | ||
positive = [positive] | ||
|
||
if isinstance(negative, string_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this enables behavior like:
most_similar_cosmul('dog', 'cat')
where dog
is positive and cat
is negative. That's helpful shorthand, but without documentation, people won't find out about it.
Can you please add a paragraph to the docstring explaining the above shorthand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this just makes the special type-testing treatment of negative
match that of positive
(in both most_similar()
and most_similar_cosmul()
) – but that special treatment, while used extensively in examples, isn't currently documented even in the most_similar()
case! I'd suggest that treating negative
symmetrically with positive
is a good idea, and should also be done in most_similar()
for consistency, and both of their doc-comments should be improved/harmonized to explain this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mpenkov added shorthand in docstring for clarity, is it okay or is there some other place for docs as well that this should be added?
@n3hrox Ping! Are you able to finish this PR? |
@mpenkov I will try to come back to this during weekend. I waited really long for this to be reviewed, started new job and had completely no time during Dec/Jan |
@mpenkov I adjusted PR accordingly, please re-review |
Yeah, looks like they responded right after we marked it as stale, and then we didn't follow up. |
I think the correct action is to reopen and push this over the line ourselves. @n3hrox Sorry for the delay. This fell off our radar. |
Codecov Report
@@ Coverage Diff @@
## develop #2656 +/- ##
===========================================
- Coverage 79.53% 79.52% -0.02%
===========================================
Files 68 68
Lines 11781 11785 +4
===========================================
+ Hits 9370 9372 +2
- Misses 2411 2413 +2
Continue to review full report at Codecov.
|
Merging. Thank you for your contribution and your patience @n3hrox ! |
Closes: #2535
This is my first PR for gensim so all comments are welcome.
To be honest I have no idea how to test
restrict_vocab
formost_similar_cosmul
ormost_similar
forevaluate_word_analogies
. I wanted to write something similar to already existing tests for these keywords but did not find any (nor tests forrestrict_vocab
keyword in case ofmost_similar
function and normost_similar
keyword in case ofaccuracy
function)Summary:
restrict_vocab
parameter tomost_similar_cosmul
most_similar_cosmul
shorthand to handle both positive and negative casesevaluate_word_analogies