The accuracy of v4.0.2 is reduced compared to v2.1.4 #717
Comments
I found that the Tesseract CLI recognizes the text properly. Maybe Tesseract.js needs to upgrade Tesseract from 5.1.0 to 5.3.0?
This is an interesting issue. I was able to replicate it using the image provided. Notably, this image has light text on a dark background, which Tesseract handles differently (it needs to detect the inversion and invert the image). When the image is inverted ahead of time (see attached image), it is recognized properly. The issue may therefore be specific to this type of text. When I have some free time I will update the version of Tesseract we're using and see if that resolves it. There do appear to have been some changes relating to inverted text.
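For anyone who wants to try the "invert ahead of time" workaround described above, here is a minimal sketch. It assumes the image is already available as a flat RGBA pixel buffer, such as the one a browser canvas exposes. `invertRgba` is a hypothetical helper name and is not part of Tesseract.js.

```typescript
// Hypothetical helper (not part of Tesseract.js): returns a copy of an RGBA
// pixel buffer with the colour channels inverted, so that light text on a
// dark background becomes dark text on a light background.
function invertRgba(pixels: Uint8ClampedArray): Uint8ClampedArray {
  // Every 4th byte (index 3, 7, 11, ...) is the alpha channel; leave it as is.
  return Uint8ClampedArray.from(pixels, (value, index) =>
    index % 4 === 3 ? value : 255 - value
  );
}

// Example: one dark, semi-transparent pixel (R, G, B, A).
const inverted = invertRgba(new Uint8ClampedArray([0, 10, 255, 128]));
console.log(Array.from(inverted));
```

In a browser, the buffer would typically come from `getImageData(...).data` on a 2D canvas context, and the inverted pixels could be written back with `putImageData` before the canvas is handed to the recognizer. This only illustrates the preprocessing step and does not change how Tesseract itself detects inverted text.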
Updating Tesseract to 5.3.0 appears to have resolved this, so it must have been a bug in the version of Tesseract we were using before. I've updated Tesseract.js and created a new release (v4.0.3), so updating Tesseract.js to the latest version should resolve the issue. Thank you for reporting this issue.
Describe the bug
The accuracy of v4.0.2 is reduced compared to v2.1.4
To Reproduce
Use v2.1.4 and v4.0.2 to recognize the same image:
v2.1.4: https://codesandbox.io/s/eager-jasper-9drw5o
v2.1.4 accurately recognizes the text in the image
v4.0.2: https://codesandbox.io/s/busy-blackburn-pes3yi
The content recognized by v4.0.2 is garbled
Expected behavior
v4.0.2 should accurately recognize the text in the image