Check whether tesseract supports jpeg2000 or not #419
Comments
This error is related to tesseract itself. Which version is that?
Oh right: tesseract 5.1.0. The image used by the test: https://github.com/madmaze/pytesseract/blob/v0.3.10/tests/data/test.jpeg2000
Well, hmmm. CI on master passes, so I am not sure what is going on there. At this point, I would check what changed in 5.1.0 that would break jpeg2000 support, because 4.x clearly works with jpeg2000. Have you tried using tesseract directly with the jpeg2000 image?
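A minimal sketch of that direct check, bypassing pytesseract entirely. It assumes a `tesseract` binary on `PATH` and relies on the CLI form `tesseract <image> stdout`, which writes the recognized text to standard output. The helper names are illustrative only and not part of pytesseract.

```python
import subprocess


def direct_ocr_command(image_path, binary="tesseract"):
    """Build the CLI call; the output base `stdout` prints the text to stdout."""
    return [binary, str(image_path), "stdout"]


def run_tesseract_directly(image_path, binary="tesseract"):
    """Run tesseract without pytesseract and return (exit code, stdout, stderr)."""
    proc = subprocess.run(
        direct_ocr_command(image_path, binary),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is the interesting case here
    )
    return proc.returncode, proc.stdout, proc.stderr


# Example call, using the test image from the pytesseract repository:
# code, out, err = run_tesseract_directly("tests/data/test.jpeg2000")
# print(code, err)
```

If the same leptonica error shows up in `stderr` here, the problem sits in tesseract or leptonica rather than in pytesseract.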
I haven't used tesseract yet; I only build pytesseract to provide it as an optional dependency for urlwatch in the Arch repos.
At the moment, I don't have an Arch instance with tesseract 5.1.0 around to test whether this is a pytesseract-related or a tesseract-specific issue. When I have time, I will try to boot up a container with that setup in order to check.
Same issue here. I debugged it, and in my case the root cause was determined as follows:
The remedy for me was to recompile leptonica with OpenJPEG 2.4.0 support. However, py-pytesseract should skip the test if there are indications that tesseract does not support JPEG2000.
Thank you for investigating that @mandree - I am not sure if there is a nice way to ask tesseract if that is the case or not.
You can query First two examples from FreeBSD 13.0 amd64, third and last example on Fedora 35 x86_64. With JPEG2000 support:
And without:
Fedora Linux:
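The exact command and its output are not preserved in this thread, so the following is only a sketch under stated assumptions: that `tesseract --version` lists the image libraries leptonica was linked against, and that an entry named `libopenjp2` appears there when OpenJPEG (JPEG2000) support is compiled in. The function names are hypothetical.

```python
import re
import subprocess


def tesseract_version_output(binary="tesseract"):
    """Return what `tesseract --version` prints, or "" if the binary is missing."""
    try:
        proc = subprocess.run(
            [binary, "--version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    # Combine both streams in case the version banner is written to stderr.
    return proc.stdout + proc.stderr


def has_jpeg2000_support(version_output):
    """Assumption: a `libopenjp2` entry means leptonica can read JPEG2000."""
    return re.search(r"\blibopenjp2\b", version_output) is not None


# Hypothetical pytest usage for the jpeg2000 test case:
# requires_jp2k = pytest.mark.skipif(
#     not has_jpeg2000_support(tesseract_version_output()),
#     reason="leptonica was built without JPEG2000 support",
# )
```

Keeping the parsing separate from the subprocess call makes the check easy to exercise with captured version banners from different platforms.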
ref: madmaze/pytesseract#419 (comment) git-svn-id: file:///srv/repos/svn-community/svn@1177882 9fca08f4-af9d-4005-b8df-a31f2cc04f65
Hi @bozhodimitrov, you marked this as completed, but I do not see any relevant commits or a comment.
Hi, old issue + it is just closed, not completed + pytesseract does the right thing by notifying users of the underlying tesseract error. From there on, it is the responsibility of the user to update their stack with supported third-party components. Unless you want to make a PR with parsing all supported formats while invoking the The current error report is enough for all users who search for this specific error to find the workaround that you all shared. Let me know what you think.
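For packagers who would rather skip than fail, here is a rough sketch that recognizes this specific failure from the error text reported in this issue (`pixReadStreamJp2k: function not present`). The helper names are hypothetical; `pytesseract.TesseractError` and its `message` attribute are what pytesseract raises on a non-zero tesseract exit. Matching on a leptonica message is brittle by nature and may need adjusting if the wording changes.

```python
# Substring taken from the error message reported in this issue.
JP2K_MISSING_MARKER = "pixReadStreamJp2k: function not present"


def is_missing_jp2k_error(message):
    """True if the tesseract error text points at missing JPEG2000 support."""
    return JP2K_MISSING_MARKER in str(message)


def image_to_string_or_skip(image):
    """Run OCR, but skip the current pytest test when JPEG2000 is unsupported."""
    # Imported lazily so the pure helper above works without these packages.
    import pytest
    import pytesseract

    try:
        return pytesseract.image_to_string(image)
    except pytesseract.TesseractError as exc:
        if is_missing_jp2k_error(exc.message):
            pytest.skip("leptonica was built without JPEG2000 support")
        raise  # any other tesseract failure should still fail the test
```

This leaves pytesseract's behaviour untouched and only changes how a downstream test suite reacts to the error.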
pytesseract.pytesseract.TesseractError: (1, 'Error in pixReadStreamJp2k: function not present Error in pixReadStream: jp2: no pix returned Error in pixRead: pix not read Error during processing.')
pytesseract 0.3.10
tesseract 5.1.0
pillow 9.0.1
openjpeg2 2.4.0
pytest 7.1.0
python 3.10.2
Old title:
test_image_to_string_with_image_type[jpeg2000] failure with tesseract >4.1.x