FileNotFoundError tmp file #454

filirnd · 2022-09-30T06:54:21Z

Hi,
I'm working on Fedora 32 with tesseract 5.0.0-alpha-20201224 and pytesseract 0.3.10.

When I call this function:
image_to_string(image, lang="letsgodigital", config="--oem 4 --psm 100 -c tessedit_char_whitelist=.0123456789")
I get this error:

Traceback (most recent call last):
  File "main2.py", line 8, in <module>
    str_Res = digital_display_ocr.ocr_image(image)
  File "/home/fili/workspace/python/digit_recognition/digital_display_ocr.py", line 107, in ocr_image
    return image_to_string(otsu_thresh_image, lang="letsgodigital", config="--oem 4 --psm 100 -c tessedit_char_whitelist=.0123456789")
  File "/home/fili/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 423, in image_to_string
    return {
  File "/home/fili/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 426, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "/home/fili/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 290, in run_and_get_output
    with open(filename, 'rb') as output_file:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tess_4p1bawg8.txt'

How can I resolve this?

stefan6419846 · 2022-09-30T07:04:20Z

Could you please try to quote the tessedit_char_whitelist value, id est config="--oem 4 --psm 100 -c tessedit_char_whitelist='.0123456789'"?

filirnd · 2022-09-30T07:08:22Z

> Could you please try to quote the tessedit_char_whitelist value, id est config="--oem 4 --psm 100 -c tessedit_char_whitelist='.0123456789'"?

Same error.
It seems like pytesseract can't write the output file.

stefan6419846 · 2022-09-30T07:11:05Z

pytesseract never writes the output file itself; the underlying Tesseract binary does. Do you get any output if you call Tesseract manually with your options? What is the final cmd_args in your case in pytesseract.pytesseract.run_tesseract?

Additionally: it seems like you are using a pre-release version of Tesseract. Are you able to test this with a regular release? Tesseract 5.2.0 was released about half a year ago.

jeb2112 · 2022-12-23T15:35:17Z

I got the same error with the image_to_pdf_or_hocr method. At line 290 of pytesseract.py (also visible in the traceback quoted by filirnd above), the temporary output file is opened as 'rb', when it seems like it should be 'wb'.

stefan6419846 · 2022-12-24T09:06:23Z

> I got the same error on the image_to_pdf_or_hocr method [...]

As mentioned in my previous comment: What command is sent to Tesseract? Which Tesseract version are you using? What happens if you call Tesseract on this file directly? Do you have a reproducer? Nevertheless, this probably is a Tesseract issue, not a pytesseract one.

> [...] and saw at line 290 of pytesseract.py (and as is seen above in quoted error message from filirnd), that the output tmp file is being opened as 'rb', when it seems like it should be 'wb'.

rb is completely fine here, as the corresponding file should be created by Tesseract. If it is not being created, this seems to be a Tesseract issue which might be related to your input data (see the first part of my comment).
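
The read-only pattern described here can be sketched as follows. This is an illustration, not pytesseract's actual code: the helper name `read_tesseract_output` is hypothetical, and the only assumption is that Tesseract writes its result to `<output_base>.<extension>`. When the expected file is missing, the sketch lists any files that share the base name, which makes a wrong extension easy to spot.

```python
import glob
import os


def read_tesseract_output(output_base, extension):
    """Read the file Tesseract should have written (hypothetical helper)."""
    expected = f"{output_base}.{extension}"
    if not os.path.exists(expected):
        # Show what was written instead, e.g. a .txt file where .hocr was expected.
        siblings = sorted(glob.glob(glob.escape(output_base) + ".*"))
        raise FileNotFoundError(
            f"Tesseract did not create {expected!r}; files found instead: {siblings}"
        )
    # 'rb' is correct: the wrapper only reads what the binary produced.
    with open(expected, "rb") as output_file:
        return output_file.read()
```

Opening with 'wb' instead would create an empty file and hide the real problem, which is that the binary never produced the expected output.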

jeb2112 · 2022-12-27T02:05:40Z

> rb is completely fine here, as the corresponding file should be created by Tesseract. If it is not being created, this seems to be a Tesseract issue which might be related to your input data (see the first part of my comment).

OK, this info helped me dig into it a bit further. I hadn't understood that there is an underlying call to the tesseract binary, but I now see that 'rb' is correct, since pytesseract only reads that file. Eventually I found that I was missing an argument to pytesseract.image_to_pdf_or_hocr. I had copied this from the example on the pytesseract doc page at https://pypi.org/project/pytesseract/:
hocr = pytesseract.image_to_pdf_or_hocr('test.png', extension='hocr')
What was missing was the additional config argument:
hocr = pytesseract.image_to_pdf_or_hocr('test.png', extension='hocr', config='-c tessedit_create_hocr=1')
Without this additional argument, the tesseract binary creates a file with a .txt extension by some kind of default, even though the extension argument is correctly set to 'hocr'.

stefan6419846 · 2022-12-30T08:37:10Z

I cannot reproduce this issue on Linux running Tesseract 4.1.0 (shortened output):

stefan@localhost:~/tmp$ python script.py 
b'<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n <head>\n  <title></title>\n  <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>\n  <meta name=\'ocr-system\' content=\'tesseract 4.1.0\' />\n  [...]
stefan@localhost:~/tmp$ vi script.py
stefan@localhost:~/tmp$ cat script.py 
import pytesseract

hocr = pytesseract.image_to_pdf_or_hocr('file.png', extension='pdf')

print(hocr)

stefan@localhost:~/tmp$ python script.py 
b'%PDF-1.5\n%\xde\xad\xbe\xeb\n1 0 obj\n<<\n  /Type /Catalog\n  /Pages 2 0 [...]

Could you please provide some more details about your setup and versions?

DrPlanecraft · 2023-06-20T16:19:32Z

Hi! I have the same issue. My setup is as follows:
Python:

3.11.4

Tesseract:

tesseract 5.3.0
 leptonica-1.78.0 (Apr  9 2021, 08:55:04) [MSC v.1916 LIB Release x64]
  libjpeg 9d : libpng 1.6.39 : libtiff 4.4.0 : zlib 1.2.13
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.6.2 zlib/1.2.13 liblzma/5.2.6 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.5.2

Command being sent:

['tesseract', 'C:\\Users\\DRPLAN~1\\AppData\\Local\\Temp\\tess_45thtfem_input.PNG', 'C:\\Users\\DRPLAN~1\\AppData\\Local\\Temp\\tess_45thtfem', 'batch.nochop', 'makebox']

In my case the tess_<temp code>.box file is never generated.

This is my error message: FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\DRPLAN~1\\AppData\\Local\\Temp\\tess_r3wvoqfl.box'

stefan6419846 · 2023-06-21T06:05:51Z

Have you tried running the same command (with the temporary files replaced by "real" files) with Tesseract directly? Does this generate the correct files?
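
One way to run that check from Python is sketched below. It assumes the cmd_args layout reported in this thread (binary, input image, output base, then options) and requires a `tesseract` binary on the PATH. Both function names are hypothetical helpers, not part of pytesseract.

```python
import glob
import subprocess


def with_real_paths(cmd_args, image_path, output_base):
    """Return a copy of a logged cmd_args list with the two temporary paths replaced."""
    # Assumed layout: [binary, input image, output base, *options].
    return [cmd_args[0], image_path, output_base, *cmd_args[3:]]


def run_and_list_outputs(cmd_args, output_base):
    """Run the command and report which output files were actually created."""
    result = subprocess.run(cmd_args, capture_output=True, text=True)
    created = sorted(glob.glob(glob.escape(output_base) + ".*"))
    return result.returncode, result.stderr, created
```

If the list of created files is empty or has an unexpected extension, the problem lies in the options passed to Tesseract and not in how the wrapper reads the file.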

DrPlanecraft · 2023-06-22T02:54:20Z

I have tried, and it does generate the correct files. The issue eventually turned out to be that I had not included config="-c tessedit_create_boxfile=1", which I discovered thanks to the issue "image_to_boxes crashing" #106.
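
The two workarounds reported in this thread (for hocr and for box output) can be collected into a small helper. This is a sketch under assumptions: the mapping contains only the flags that users here said worked for them, `config_with_workaround` and `ocr_to_hocr` are hypothetical names, and whether the flags are needed at all may depend on the installed Tesseract and pytesseract versions.

```python
# Workaround flags reported in this thread, keyed by the output extension
# that Tesseract failed to produce without them.
WORKAROUND_CONFIG = {
    "hocr": "-c tessedit_create_hocr=1",
    "box": "-c tessedit_create_boxfile=1",
}


def config_with_workaround(extension, config=""):
    """Append the workaround flag for `extension` unless it is already present."""
    flag = WORKAROUND_CONFIG.get(extension)
    if flag is None or flag in config:
        return config
    return f"{config} {flag}".strip()


def ocr_to_hocr(image_path):
    """Call pytesseract with the workaround applied (needs pytesseract and Tesseract)."""
    import pytesseract  # imported lazily so the helper above works without it

    return pytesseract.image_to_pdf_or_hocr(
        image_path, extension="hocr", config=config_with_workaround("hocr")
    )
```

For example, `config_with_workaround("box", "--psm 6")` yields `"--psm 6 -c tessedit_create_boxfile=1"`, while an extension without a known workaround leaves the config unchanged.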

bozhodimitrov mentioned this issue Aug 25, 2023

Fix default hocr config #503

Merged

bozhodimitrov closed this as completed in #503 Aug 25, 2023
