-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes 'not enough arguments for format string' error #3286
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good - reflecting original unrealized intent!
Great. @gilbertfrancois please fix the flake tests (trailing whitespace not allowed) and we're good to merge. |
Co-authored-by: Radim Řehůřek <me@radimrehurek.com>
Co-authored-by: Radim Řehůřek <me@radimrehurek.com>
Done. Thanks @piskvorky for the suggestions. |
Are some tests failing because of this patch? I can't discover in the logs what the problem is. |
I see segmentation faults in the logs. Although that's weird, for an added log parameter… @gilbertfrancois could you investigate further? Was it my @mpenkov what's the way to run tests locally? I cannot find it anymore on our wiki / https://github.com/RaRe-Technologies/gensim/wiki/Developer-page :( |
I could setup the environment and I am running the tests now. I'll keep you posted about the progress. |
As far as I can see is that the failing tests are not related to the change in this PR. I ran the following experiments using gensim-4.1.2, py37-linux, passes all tests With the current master branch, the tests with ldamulticore takes a lot of time and for py37 it never finishes. However, when running the test directly with unittest instead of pytest and tox, it passes and does the tests very fast. So I had the impression that it was not related to the code, but to the testing method. Moreover, the code of ldamodel and ldamulticore has hardly changed since release 4.1.2. I saw that the testing has changed since the last release. When I remove coverage testing from
I think it is safe to merge this PR and open a new issue to solve the testing + coverage for ensemble_lda and ldamulticore. For reference, I've attached the logs (with verbose option) for the different runs. Fail: Success: |
Thanks so much for your investigation @gilbertfrancois! CC @mpenkov – let's whip the CI testing back into shape. Are the coverage tests good for anything? Do we need them? |
I do think coverage serves a useful purpose, by giving a metric on how much of the written code is tested. But it seems that measuring coverage in combination with multiprocessing is creating problems. I would suggest to temporarily stop measuring coverage or exclude measuring coverage for tests that make use of multiprocessing (e.g. ensemble_lda, ldamulticore), until a solution is found. I found this stackoverflow page that is mentioning problems with coverage + multiprocessing. https://stackoverflow.com/questions/28297497/python-code-coverage-and-multiprocessing Most important is to have good working unittests. |
I personally don't find that particularly useful. But let's wait for @mpenkov 's judgement – he might remember why we added these coverage tests in the first place :) |
@mpenkov I see that you include/exclude platforms and/or python versions for coverage. I'm not sure if that is the right way to fix this problem. For multiprocessing code, coverage needs to be run with the argument |
At the moment, I'm just trying to get CI to pass with some consistency. The coverage stuff appears to be what's causing the build fails (mysteriously...). It sounds like getting coverage to work properly will take more effort, and I'm not sure that this PR is the place to do it. |
Codecov Report
@@ Coverage Diff @@
## develop #3286 +/- ##
===========================================
+ Coverage 79.03% 79.53% +0.50%
===========================================
Files 68 68
Lines 11781 11781
===========================================
+ Hits 9311 9370 +59
+ Misses 2470 2411 -59
Continue to review full report at Codecov.
|
For Doc2Vec, when building the vocabulary and the
raw_int
andcorpus_count
are not the same, the log gives a warning. However, the log string expects 4 arguments, but only 3 are given, so the log produces an error, shown below.The fix was to give the difference of
raw_int
andcorpus_count
as the 3rd argument.Part of the stack trace: