-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix str()
method in WmdSimilarity
#3282
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #3282 +/- ##
===========================================
- Coverage 81.39% 79.53% -1.86%
===========================================
Files 122 68 -54
Lines 21107 11783 -9324
===========================================
- Hits 17179 9372 -7807
+ Misses 3928 2411 -1517
Continue to review full report at Codecov.
|
Hi @piskvorky , I have checked the only failed build wheel (3.6, ubuntu-latest, x64), but it seems to be unrelated to this PR. The error is in test_word2vec.py, about word similarity ranks. The correct rank is 2 but it returns 3. Thank you so much! |
@austereantelope seems related to your recent PR change – please check. |
@piskvorky yes I noticed this issue, this new PR #3281 addresses it |
@austereantelope Hi! I am wondering why would tests on Ubuntu with different python version have different check results for test_word2vec.py (some pass, some do not). Do you find out the underlying reason? Thank you so much! |
@DingQK this test will have different results not neccessarily because of the different python version, the underlying reason is the test is non-deterministic meaning if I ran this test multiple times I might get different results (regardless of the environment). The source of the non-determinisim might be randomness in the algorithm and in this case I believe the test is using multiple workers which also might introduce some non-determinisim |
@austereantelope Thank you so much for your explanation! Maybe the tests need to consider this non-deterministic characteristic (in test_word2vec.py). |
Hi @piskvorky , According to the kind explanation of @austereantelope, if the test is indeterministic, perhaps rerunning the failed build wheel (3.6, ubuntu-latest, x64) might help? The fix on wmdsimilarity's str() method seems irrelevant to this error. Thank you so much! |
I don't see a way to re-run just one job in Github Actions (@mpenkov is this possible?). So I re-ran the entire CI workflow. |
@DingQK also, can you add a unit test? Tests are under https://github.com/RaRe-Technologies/gensim/tree/develop/gensim/test |
@piskvorky You mean a unit test solely for this str() method in wmdsimilarity (for example test_wmdsimilarity.py)? I can do that! |
Yes. Ideally add the new test among existing WmdSimilarity tests (if they already exist). Thanks! |
Hi @piskvorky , While adding unit test, I found out an inconsistency between WmdSimilarity usage in documentation and in unit tests. In document, model = Word2Vec(common_texts, vector_size=20, min_count=1)
index = WmdSimilarity(common_texts, model) whereas in the definition of function and unit test, self.cls = similarities.WmdSimilarity
self.w2v_model = Word2Vec(TEXTS, min_count=1).wv
index = self.cls(TEXTS, self.w2v_model) Maybe the documentation needs an update? And which usage should I follow in adding the unit test? Thank you so much! |
What documentation? And what inconsistency? I don't follow. |
@piskvorky Sorry for my unclear description. The documentation I am referring to is https://radimrehurek.com/gensim/similarities/docsim.html#gensim.similarities.docsim.WmdSimilarity. The documentation's directory in this repo is In the documentation, the second input of the If this is an inconsistency, which usage should I follow and may I fix this issue? Thank you so much for your time! |
I see what you mean. I'm not sure which one is correct – can you check the code please? |
Hi @piskvorky , I check the code closely and I believe the documentation needs an update as the second input is class KeyedVectors. I updated the documentation and the annotation code. Also, I add a unit test for this str method in test_similarities.py. Thank you so much! |
Looks good, thanks! I left one minor request in the code review; otherwise ready for review & merge under @mpenkov's watchful eye. |
I have changed the code as requested in my latest commit. It is my pleasure to contribute! And thank you so much for your patient and kind help! @piskvorky |
Merged. Thank you for your patience @DingQK !! |
This PR fixes #3269 , by updating str() function through the using of latest names. The test result is as follows: