Recurring error #20

thiswillbeyourgithub · 2021-05-10T22:13:18Z

Hi, I frequently get this error message:

Cancelled OCR processing with message :
Traceback (most recent call last):
  File "/home/USERNAME/.local/share/Anki2/addons21/450181164/gui.py", line 56, in on_run_ocr
    ocr.run_ocr_on_notes(note_ids=selected_nids)
  File "/home/USERNAME/.local/share/Anki2/addons21/450181164/ocr.py", line 306, in run_ocr_on_notes
    notes_query = self.run_ocr_on_query(note_ids=note_ids)
  File "/home/USERNAME/.local/share/Anki2/addons21/450181164/ocr.py", line 285, in run_ocr_on_query
    raw_results = self._ocr_unbatched_process(image_paths=image_paths)
  File "/home/USERNAME/.local/share/Anki2/addons21/450181164/ocr.py", line 149, in _ocr_unbatched_process
    raw_results[image_path] = future.result()
  File "concurrent/futures/_base.py", line 432, in result
  File "concurrent/futures/_base.py", line 388, in __get_result
  File "concurrent/futures/thread.py", line 57, in run
  File "/home/USERNAME/.local/share/Anki2/addons21/450181164/ocr.py", line 262, in _ocr_img
    return pytesseract.image_to_string(str(img_pth), lang="+".join(languages or ["eng"]),
  File "/home/USERNAME/.local/share/Anki2/addons21/450181164/_vendor/pytesseract/pytesseract.py", line 368, in image_to_string
    return {
  File "/home/USERNAME/.local/share/Anki2/addons21/450181164/_vendor/pytesseract/pytesseract.py", line 371, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "/home/USERNAME/.local/share/Anki2/addons21/450181164/_vendor/pytesseract/pytesseract.py", line 280, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/USERNAME/.local/share/Anki2/addons21/450181164/_vendor/pytesseract/pytesseract.py", line 257, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
450181164._vendor.pytesseract.pytesseract.TesseractError: (-11, "read_params_file: Can't open txt read_params_file: Can't open txt Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica Detected 33 diacritics contains_unichar_id(unichar_id):Error:Assert failed:in file ../ccutil/unicharset.h, line 513")

My OS is Ubuntu 18 and my Anki version is 2.1.35. I have the latest AnkiOCR available for this Anki version.

It happens without my understanding which kinds of notes cause it. I had it with some text from my lesson that used unusual characters (like the male and female signs, e.g. ♂️), but it also happened on random image occlusions.

Unfortunately, this error can happen after AnkiOCR has run through hundreds of cards in a row without saving the changes :/ So I have to redo it over and over again until I find the culprit cards.

Let me know if I can help you troubleshoot that :)
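For anyone hitting the same problem of one bad image cancelling a whole run: below is a minimal sketch of collecting per-image failures instead of aborting, so the culprit images get reported and the other results survive. It is not the add-on's actual code. `ocr_images_tolerantly` and its `ocr_one` parameter are hypothetical names; `ocr_one` stands in for whatever function OCRs a single image path.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_images_tolerantly(image_paths, ocr_one, max_workers=4):
    """Run ocr_one on every path; record failures instead of aborting the batch.

    ocr_one is a hypothetical callable taking one image path and returning text.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the images are processed concurrently.
        futures = {path: pool.submit(ocr_one, path) for path in image_paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # e.g. pytesseract's TesseractError
                # Remember which image failed and why, then keep going.
                failures[path] = exc
    return results, failures
```

The `failures` dict maps each problematic image path to the exception it raised, which would make it possible to find the culprit cards in a single pass.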

thiswillbeyourgithub · 2021-05-21T12:21:14Z

Apparently I found a fix. I have no idea what it does, I just followed the advice there: tesseract-ocr/tesseract#1205

I'll make a PR right away

thiswillbeyourgithub · 2021-05-21T12:27:57Z

Feel free to close this

cfculhane · 2021-05-21T12:40:02Z

Thanks! I'll look into this when I get a chance, flat out with uni at the moment.

cfculhane · 2021-05-21T12:42:34Z

Oh I just saw how simple the fix was. I'll merge tomorrow, thanks for looking into this and solving it!!

thiswillbeyourgithub · 2021-05-21T17:21:52Z

Flat out with uni too :) good luck with your exams!

Beware, I have no idea if this fix causes issues on other setups. Maybe it's just my installation of pytesseract or its Linux integration, who knows...

cfculhane · 2021-05-22T03:54:10Z

Closed with new release

thiswillbeyourgithub · 2023-03-19T16:32:05Z

Just a heads up: it seems I've been getting consistently better results using the following Tesseract config: --oem 2 --psm 12 -c preserve_interword_spaces=1

Relevant documentation:

 --psm N
     Set Tesseract to only run a subset of layout analysis and assume a
     certain form of image. The options for N are:
         0 = Orientation and script detection (OSD) only.
         1 = Automatic page segmentation with OSD.
         2 = Automatic page segmentation, but no OSD, or OCR. (not implemented)
         3 = Fully automatic page segmentation, but no OSD. (Default)
         4 = Assume a single column of text of variable sizes.
         5 = Assume a single uniform block of vertically aligned text.
         6 = Assume a single uniform block of text.
         7 = Treat the image as a single text line.
         8 = Treat the image as a single word.
         9 = Treat the image as a single word in a circle.
         10 = Treat the image as a single character.
         11 = Sparse text. Find as much text as possible in no particular order.
         12 = Sparse text with OSD.
         13 = Raw line. Treat the image as a single text line,
              bypassing hacks that are Tesseract-specific.
 --oem N
     Specify OCR Engine mode. The options for N are:
         0 = Original Tesseract only.
         1 = Neural nets LSTM only.
         2 = Tesseract + LSTM.
         3 = Default, based on what is available.

I think that changing the oem value could be a breaking change, because it depends on which engine is present on the user's computer, but the 2 other flags seem like a free lunch.
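As a rough sketch of how these flags could be passed through pytesseract's `config` argument: the helper names `build_tesseract_config` and `ocr_with_config` below are hypothetical, not part of the add-on. Leaving `oem` unset by default keeps the engine choice at whatever Tesseract picks, which is an assumption made here to avoid the possible breaking change mentioned above.

```python
def build_tesseract_config(psm=12, oem=None, preserve_interword_spaces=True):
    """Build a Tesseract config string; oem=None leaves the engine mode at its default."""
    parts = []
    if oem is not None:
        parts += ["--oem", str(oem)]
    parts += ["--psm", str(psm)]
    if preserve_interword_spaces:
        parts += ["-c", "preserve_interword_spaces=1"]
    return " ".join(parts)


def ocr_with_config(image_path, languages=None, **config_kwargs):
    """OCR one image with the flags above (hypothetical helper)."""
    # Imported lazily: pytesseract is third-party and needs the tesseract binary.
    import pytesseract

    return pytesseract.image_to_string(
        str(image_path),
        lang="+".join(languages or ["eng"]),
        config=build_tesseract_config(**config_kwargs),
    )
```

Calling `build_tesseract_config(oem=2)` reproduces the full config string quoted above, while the default call only applies the two flags that do not depend on the installed engine.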

thiswillbeyourgithub mentioned this issue May 21, 2021

fix #20 #21

Closed

cfculhane closed this as completed May 22, 2021

cfculhane added a commit that referenced this issue Sep 4, 2021

Fix #20

5515c0d
