This repository has been archived by the owner on Jun 14, 2018. It is now read-only.

Use '--psm' instead of '-psm' as the option was deprecated. #100

ddddavidmartin
Contributor

This recently changed in the official tesseract engine [0]. -psm is not allowed as an option anymore and --psm has to be used instead.

This fixes #99.

Thanks!

[0] tesseract-ocr/tesseract@ee201e1

@jflesch
Member

jflesch commented Jun 7, 2018

Not so simple, unfortunately.

With Tesseract 3.03:

$ tesseract --psm 9
Tesseract Open Source OCR Engine v3.03 with Leptonica
Cannot open input file: --psm

AFAIK, Tesseract 4 is still not stable (and not available in Debian stable anyway). Therefore support for all versions of Tesseract >= 3 must be maintained.
The best way to work around that problem would be to call get_version() and decide what to do based on the version of Tesseract. Similar fixes already exist in can_detect_orientation() and detect_orientation().

@ddddavidmartin
Contributor Author

Ah yes, that makes sense. I shall have a look and update the pull request. Thanks for checking!

It turns out that for versions before the current 4 beta only '-psm' is
allowed, and the latest build only allows '--psm'.
def psm_parameter():
    """Return the psm option string depending on the Tesseract version."""
    version = get_version()
    if version[0] <= 3:
Member

You could have used Python's ternary operator (conditional expression): return "--psm" if version[0] > 3 else "-psm"
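
For illustration, the suggestion above could look roughly like the following sketch. It is not the code that was merged: parse_version() and the explicit version argument are hypothetical additions that keep the example self-contained (the function under review calls get_version() instead), and the banner strings are assumed examples of what Tesseract prints.

```python
import re


def parse_version(banner):
    """Extract (major, minor, update) from a banner such as 'tesseract 3.03'.

    Hypothetical helper for this sketch; it stands in for get_version().
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        raise ValueError("no version number found in %r" % banner)
    # A missing third component (as in '3.03') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def psm_parameter(version):
    """Return the psm option string depending on the Tesseract version."""
    # Tesseract 3.x only understands '-psm'; the 4 beta only accepts '--psm'.
    return "--psm" if version[0] > 3 else "-psm"


if __name__ == "__main__":
    for banner in ("tesseract 3.03", "tesseract 4.0.0-beta.1"):
        print(banner, "->", psm_parameter(parse_version(banner)))
```

Taking the version as an argument is only a convenience for trying the function without Tesseract installed.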

Contributor Author

Fixed this in the updated pull request.

@jflesch
Member

jflesch commented Jun 11, 2018

Thanks :)

@jflesch jflesch merged commit 2c41670 into openpaperwork:master Jun 11, 2018
@jflesch
Member

jflesch commented Jun 11, 2018

Also, the Python 2 tests are broken because of a circular import between the modules (tesseract tries to import builder, which tries to import tesseract.psm_parameter) ...

@jflesch
Member

jflesch commented Jun 11, 2018

(I should have run the tests before merging ...)

@jflesch
Member

jflesch commented Jun 11, 2018

You can run the tests and the checks with make test (requires tox) and make check (requires pyflake8).

@jflesch
Member

jflesch commented Jun 11, 2018

Actually, Python 3 tests are broken too.

@ddddavidmartin
Contributor Author

Oh no! Can you revert the merge on master and I'll update the pull request again?

I had not actually tried out the latest commit yet, as I'm developing on a different machine from the one my document scans run on.

jflesch added a commit that referenced this pull request Jun 11, 2018
…_psm_option_string":

- Breaks tests (module import loop)
- Breaks style checks

This reverts commit 2c41670, reversing
changes made to 31fb5f1.
@jflesch
Member

jflesch commented Jun 11, 2018

Reverted: 7189c69

@jflesch
Member

jflesch commented Jun 11, 2018

By the way, the PyOCR tests are unfortunately unreliable: OCR output differs too much from one system to another. You can (and must) run the tests to make sure that PyOCR seems to work, but in any case you will always have some failed tests. You will have to look at the error messages: if PyOCR works, they will show that the tests failed only because of the exact content returned by the OCR.

That's something I'll have to fix later. :/
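
One possible way to make such comparisons less brittle, sketched here purely as an assumption and not as anything PyOCR does, is to compare OCR output against the expected text by similarity rather than exact equality. The function names and the 0.9 threshold below are hypothetical; only difflib.SequenceMatcher from the standard library is used.

```python
import difflib


def normalize(text):
    """Collapse runs of whitespace so layout differences do not count."""
    return " ".join(text.split())


def ocr_similarity(expected, actual):
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    matcher = difflib.SequenceMatcher(
        None, normalize(expected), normalize(actual), autojunk=False
    )
    return matcher.ratio()


def ocr_matches(expected, actual, threshold=0.9):
    """Hypothetical tolerant check: accept output that is 'close enough'."""
    return ocr_similarity(expected, actual) >= threshold


if __name__ == "__main__":
    # One misread character and extra whitespace still pass the check.
    print(ocr_matches("The quick brown fox", "The  quick brown f0x\n"))
```

The threshold would have to be tuned per test image; a value that is too low would hide real regressions.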

@jflesch
Member

jflesch commented Jun 11, 2018

Actually, the code style errors returned by make check are my mistake. I'm fixing them right now.
