Tesseract crashes when processing certain documents #1181
Comments
The latest traineddata files (tessdata_best and tessdata_fast) do not support the legacy Tesseract engine, so --oem 0 and --oem 2 are not supported. However, the program should not crash; it should print an error message instead.
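To illustrate the "error message instead of a crash" idea, a caller could check the requested engine mode against the kind of traineddata before invoking tesseract at all. The following is only a rough wrapper-side sketch, not Tesseract's own logic: the `SUPPORTED_OEM` table and the `check_oem` helper are hypothetical names, and the table is an assumption built from this thread (tessdata_best and tessdata_fast are LSTM-only, tessdata also carries legacy data) plus the documented meanings of `--oem` 0 to 3.

```python
# Hypothetical pre-flight check. The table below is an assumption derived from
# this thread, not something read from the traineddata files themselves.
OEM_NAMES = {
    0: "legacy engine only",
    1: "LSTM engine only",
    2: "legacy + LSTM engines",
    3: "default (based on what is available)",
}

# Which --oem values each data repository is assumed to support.
SUPPORTED_OEM = {
    "tessdata": {0, 1, 2, 3},
    "tessdata_best": {1, 3},
    "tessdata_fast": {1, 3},
}


def check_oem(data_set: str, oem: int) -> None:
    """Raise ValueError with a readable message if the combination is unsupported."""
    if oem not in OEM_NAMES:
        raise ValueError(f"unknown --oem value: {oem}")
    supported = SUPPORTED_OEM.get(data_set)
    if supported is None:
        raise ValueError(f"unknown data set: {data_set}")
    if oem not in supported:
        raise ValueError(
            f"--oem {oem} ({OEM_NAMES[oem]}) is not supported by {data_set}; "
            f"supported values: {sorted(supported)}"
        )
```

Calling `check_oem("tessdata_best", 2)` raises a `ValueError` with an explanatory message, while `check_oem("tessdata", 2)` passes silently. A check like this only guards the caller; it does not address the assertion failure reported in this issue.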
Thanks for the update. Where can I get the 'latest traineddata' please? I got my data from https://github.com/tesseract-ocr/tessdata/
Please see
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#updated-data-files-for-version-400-september-15-2017
ShreeDevi
Thanks for your reply. Just to be precise: I was using tessdata, which supports OEM mode 2 according to the wiki.
Although I wrote something similar to the above remark in the wiki, since commits [1] and [2]. [1] tesseract-ocr/tessdata_best@f1d1268
Since the discussion is a bit side-tracked, I'm repeating the problem. The original comment is also updated. Tesseract v4.00.00dev-692-gad5ee184 crashes when using
@PaniniGelato Are you sure? As reported in the "Environment" section, this crash happens on the build from master commit ad5ee18. And I can still reproduce the crash. Did you use the same tessdata as described in my previous comment?
It does not crash on Windows (the latest code):
I will try to test it on Linux later. In the meantime, please check whether you are using the latest traineddata...
This is the well-known assertion
Environment
Tesseract Version:
Tesseract Open Source OCR Engine v4.00.00dev-692-gad5ee184 with Leptonica
Platform:
Linux 4.9.43-17.39.amzn1.x86_64 #1 SMP x86_64 GNU/Linux
Current Behavior:
The following command will crash in the above stated environment:
tesseract /tmp/tr_tmp.jpg /tmp/tr_tmp --tessdata-dir /var/task/tessdata --psm 12 --oem 2 -l eng hocr
tessdata is legacy data from tesseract-ocr/tessdata
Crash error: Assert failed:in file ../ccutil/unicharset.h, line 513
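When the command above is launched from a script, it can help to tell an abnormal termination apart from an ordinary non-zero exit. The sketch below assumes a POSIX system, where Python's `subprocess` reports a process killed by a signal as a negative return code; `build_command`, `describe_exit`, and `run_tesseract` are hypothetical helper names, and the paths are simply the ones from this report.

```python
import signal
import subprocess


def build_command(image, out_base, tessdata_dir, psm=12, oem=2, lang="eng", config="hocr"):
    """Assemble the same argument list as the command shown in this report."""
    return [
        "tesseract", image, out_base,
        "--tessdata-dir", tessdata_dir,
        "--psm", str(psm),
        "--oem", str(oem),
        "-l", lang,
        config,
    ]


def describe_exit(returncode):
    """Classify a subprocess return code (POSIX convention: negative means killed by a signal)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"crashed ({name})"
    return f"failed with exit code {returncode}"


def run_tesseract(cmd):
    """Run the command and return (classification, captured stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return describe_exit(proc.returncode), proc.stderr


if __name__ == "__main__":
    cmd = build_command("/tmp/tr_tmp.jpg", "/tmp/tr_tmp", "/var/task/tessdata")
    status, stderr = run_tesseract(cmd)
    print(status)
    print(stderr)
```

The captured stderr is where a message such as the assertion text above would be expected to show up, so logging it next to the classification makes crash reports easier to compare.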
related jpg file:
![c767234e7e51a92ee5a9c211f5892ad66b990e75-2](https://user-images.githubusercontent.com/399202/31865534-f08d1728-b73d-11e7-91b1-a31002dd1061.jpg)
related binaries:
Archive.zip
Expected Behavior:
When Tesseract v4 is run with --oem 2 and legacy trained data, an error message about the missing LSTM data should be printed instead of crashing.
Suggested Fix:
n/a