Use '--psm' instead of '-psm' as the option was deprecated. #100
This recently changed in the official tesseract engine [0]. '-psm' is not allowed as an option anymore and '--psm' has to be used instead.

[0] tesseract-ocr/tesseract@ee201e1
Not so simple, unfortunately. With Tesseract 3.03:

$ tesseract --psm 9
Tesseract Open Source OCR Engine v3.03 with Leptonica
Cannot open input file: --psm

AFAIK, Tesseract 4 is still not stable (and not available in Debian stable anyway). Therefore, support for all versions of Tesseract > 3 must be maintained.
Ah yes, that makes sense. I shall have a look and update the pull request. Thanks for checking!
It turns out that for versions before the current 4 beta only '-psm' is allowed, and the latest build only allows '--psm'.
def psm_parameter():
    """Return the psm option string depending on the Tesseract version."""
    version = get_version()
    if version[0] <= 3:
You could have used Python's ternary operator: return "--psm" if version[0] > 3 else "-psm"
Fixed this in the updated pull request.
Thanks :)
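For readers following the thread, the version check discussed above can be sketched roughly as follows. This is only an illustration, not the code that was merged: the version is passed in as a tuple instead of being read through get_version(), and build_command is a hypothetical helper added here to show where the flag ends up on the command line.

```python
def psm_parameter(version):
    """Return the page-segmentation flag for a Tesseract version tuple.

    Assumption: `version` is a tuple such as (3, 3, 0) or (4, 0, 0).
    The PR itself obtains it from get_version() instead.
    """
    # Per the discussion: releases before the 4 beta only accept '-psm',
    # while the latest build only accepts '--psm'.
    return "--psm" if version[0] > 3 else "-psm"


def build_command(version, input_file, output_base, psm=3):
    """Build a tesseract argument list (hypothetical helper, for illustration)."""
    return [
        "tesseract",
        input_file,
        output_base,
        psm_parameter(version),
        str(psm),
    ]
```

With these assumptions, build_command((3, 3, 0), "scan.png", "out") would pass '-psm 3', and the same call with (4, 0, 0) would pass '--psm 3'.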
Also, the Python 2 tests are broken because of a dependency loop in the modules (
(I should have run the tests before merging ...)
You can run the tests and the checks with
Actually, Python 3 tests are broken too.
Oh no! Can you revert the merge on master and I'll update the pull request again? I had not actually tried out the latest commit yet, as I'm developing on a different machine from the one my document scans are running on.
Reverted: 7189c69
By the way, the PyOCR tests are unfortunately unreliable: OCR output differs too much from one system to another. You can (and must) run the tests to make sure that PyOCR seems to work, but in any case you will always have failed tests. You will have to look at the error messages: if PyOCR works, they will show that the tests failed only because of the exact content returned by the OCR. That's something I'll have to fix later. :/
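One possible way to make such comparisons less brittle, offered purely as a sketch and not as anything PyOCR does or plans to do, is to compare the OCR output against the expected text with a similarity threshold instead of exact equality. The function name ocr_output_matches and the 0.9 threshold are assumptions chosen for illustration; difflib is part of the Python standard library.

```python
import difflib


def ocr_output_matches(expected, actual, threshold=0.9):
    """Return True if two OCR outputs are similar enough.

    Hypothetical helper: the name and the default threshold are
    illustrative assumptions, not part of PyOCR.
    """
    # Collapse runs of whitespace so line-wrapping differences are ignored.
    expected_norm = " ".join(expected.split())
    actual_norm = " ".join(actual.split())
    # ratio() returns a float in [0, 1]; 1.0 means the strings are identical.
    ratio = difflib.SequenceMatcher(None, expected_norm, actual_norm).ratio()
    return ratio >= threshold
```

A test could then assert ocr_output_matches(expected_text, ocr_text) and still fail when the output is clearly wrong, while tolerating small system-dependent differences.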
Actually, the code style errors returned by
This recently changed in the official tesseract engine [0]. -psm is not allowed as an option anymore and --psm has to be used instead.

This fixes #99.
Thanks!
[0] tesseract-ocr/tesseract@ee201e1