FileNotFoundError tmp file #454
Comments
Could you please try to quote the
Same error.
pytesseract never writes to the output file itself; the underlying Tesseract binary does. Do you get any output if you call Tesseract manually with your options? What is the final
Additionally, it seems you are using a pre-release version of Tesseract. Are you able to test this with a regular release? Tesseract 5.2.0 was released about half a year ago.
I got the same error on the
As mentioned in my previous comment: What command is sent to Tesseract? Which Tesseract version are you using? What happens if you call Tesseract on this file directly? Do you have a reproducer? Nevertheless, this probably is a Tesseract issue, not a pytesseract one.
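One way to answer these questions is to run the Tesseract binary outside pytesseract and look at its exit code and stderr. The following is a minimal sketch, assuming `tesseract` is on the PATH. The helper names `build_tesseract_command` and `run_tesseract_directly` and the file names in the usage comment are hypothetical, and the argument order only approximates what pytesseract sends.

```python
import shlex
import subprocess


def build_tesseract_command(input_path, output_base, lang=None, config="", extension="txt"):
    """Assemble an argument list in the order the Tesseract CLI expects:
    tesseract <image> <outputbase> [-l lang] [options...] [configfile]."""
    cmd = ["tesseract", input_path, output_base]
    if lang:
        cmd += ["-l", lang]
    # shlex.split keeps quoted option values together as one argument.
    cmd += shlex.split(config)
    if extension:
        cmd.append(extension)
    return cmd


def run_tesseract_directly(cmd):
    """Run Tesseract and return (exit code, stderr text) for inspection."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr


# Hypothetical usage (requires Tesseract to be installed):
#   cmd = build_tesseract_command("file.png", "out", lang="eng", config="--psm 6")
#   print(shlex.join(cmd))
#   code, err = run_tesseract_directly(cmd)
#   print(code, err)
```

If the binary exits with a non-zero code or never creates the output file, the problem lies in the options or in Tesseract itself rather than in pytesseract.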
OK, this info helped me dig into it a bit further, as I certainly hadn't understood the underlying call to the Tesseract binary, but yes, of course, I now see that it is 'rb' for an input file. So eventually I found instead that I had a missing argument to the
I cannot reproduce this issue on Linux running Tesseract 4.1.0 (shortened output):
stefan@localhost:~/tmp$ python script.py
b'<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n <head>\n <title></title>\n <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>\n <meta name=\'ocr-system\' content=\'tesseract 4.1.0\' />\n [...]
stefan@localhost:~/tmp$ vi script.py
stefan@localhost:~/tmp$ cat script.py
import pytesseract
hocr = pytesseract.image_to_pdf_or_hocr('file.png', extension='pdf')
print(hocr)
stefan@localhost:~/tmp$ python script.py
b'%PDF-1.5\n%\xde\xad\xbe\xeb\n1 0 obj\n<<\n /Type /Catalog\n /Pages 2 0 [...]
Could you please provide some more details about your setup and versions?
Hi! I got the same issue, my setup versions are as follows:
Tesseract:
Command Being sent:
In my case I can never see the
This is my error message:
Have you tried running the same command (with the temporary files replaced by "real" files) with Tesseract directly? Does this generate the correct files?
I have tried, and it does generate the correct files, but the issue was eventually linked to me not including a
Hi,
I'm working on Fedora 32 with Tesseract 5.0.0-alpha-20201224 and pytesseract 0.3.10.
When I call this function:
image_to_string(image, lang="letsgodigital", config="--oem 4 --psm 100 -c tessedit_char_whitelist=.0123456789")
I receive this error:
How can I resolve this?
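One thing that can be checked before calling pytesseract is whether the numeric options in the config string fall inside the ranges Tesseract documents. `tesseract --help-psm` lists page segmentation modes 0 to 13 and `tesseract --help-oem` lists engine modes 0 to 3. The sketch below is only an illustration: the helper `find_out_of_range_options` is hypothetical, and the thread does not establish that out-of-range values cause this particular error.

```python
import shlex

# Ranges listed by `tesseract --help-psm` and `tesseract --help-oem`
# for Tesseract 4.x/5.x.
VALID_RANGES = {"--psm": range(0, 14), "--oem": range(0, 4)}


def find_out_of_range_options(config):
    """Return (flag, value) pairs whose value is outside the documented range."""
    tokens = shlex.split(config)
    problems = []
    # Walk over adjacent token pairs so each flag is matched with its value.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in VALID_RANGES:
            if not value.isdigit() or int(value) not in VALID_RANGES[flag]:
                problems.append((flag, value))
    return problems
```

Applied to the config string from the call above, this flags both `--oem 4` and `--psm 100`, so those values are worth comparing against the help output of the installed Tesseract version.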