Support for .heic file format #363
Hi @datatalking,
Hi @int3l,
One possibility is to change the prepare function, but that would require a specific conversion from this format into one that Tesseract itself supports.
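A sketch of doing that conversion before the image reaches pytesseract, instead of modifying prepare itself. It assumes Pillow, the third-party pillow-heif plugin (its `register_heif_opener()` teaches `Image.open()` to decode HEIC), pytesseract, and a Tesseract binary are installed. `ocr_image` is a hypothetical helper, not part of pytesseract. In recent pytesseract releases, prepare appears to reject PIL images whose `format` is not on its supported list (pillow-heif reports `HEIF`), while an image with no format set is saved as PNG, which is why the sketch converts to a fresh RGB image first.

```python
def ocr_image(path, lang="eng"):
    """Hypothetical helper: OCR a HEIC (or any Pillow-readable) image.

    Assumes Pillow, the third-party pillow-heif plugin, pytesseract and a
    Tesseract binary are installed.
    """
    # Imported lazily so this module can be loaded without those packages.
    import pytesseract
    from PIL import Image
    from pillow_heif import register_heif_opener

    # Let Image.open() decode .heic/.heif files via pillow-heif.
    register_heif_opener()

    with Image.open(path) as img:
        # convert() returns a new in-memory RGB image: it drops any alpha
        # channel and no longer carries the "HEIF" format tag, so pytesseract
        # can fall back to writing it out as PNG for Tesseract.
        rgb = img.convert("RGB")

    return pytesseract.image_to_string(rgb, lang=lang)
```

Usage would look like `print(ocr_image("scan_0001.heic"))`, where the filename is only an example.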
@int3l What are the differences between the formats? As it stands, I have to convert thousands of these files to JPG or other formats, so I am curious which formats Tesseract supports.
Tesseract itself uses the Leptonica image library. Here is an unofficial list of supported formats: Leptonica Image I/O formats
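One practical difference is the file signature. Leptonica identifies a file's format from its leading bytes rather than its extension, so a HEIC file renamed to .jpg is still HEIC to Tesseract. JPEG data starts with `FF D8 FF` and PNG with `\x89PNG\r\n\x1a\n`, while HEIC is an ISO base media file format container with `ftyp` at byte offset 4 followed by a brand such as `heic` or `mif1`. The minimal sketch below uses hypothetical helpers `sniff_format` and `sniff_file`; the brand set is an assumption covering common HEIF brands, not an exhaustive list.

```python
# Hypothetical helpers: identify an image by its magic bytes, not its name.
# Assumed set of common HEIF/HEIC major brands (not exhaustive).
HEIF_BRANDS = {b"heic", b"heix", b"hevc", b"hevx", b"heim", b"heis", b"mif1", b"msf1"}


def sniff_format(header: bytes) -> str:
    """Return a best-guess format name from the first 12 bytes of a file."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    # ISOBMFF: 4-byte box size, then b"ftyp", then the 4-byte major brand.
    # Only the major brand is checked here, not the compatible-brands list.
    if header[4:8] == b"ftyp" and header[8:12] in HEIF_BRANDS:
        return "heif"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(12))
```

Running `sniff_file` over a batch of renamed files would show which ones are still HEIC underneath.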
What would be the process for extending pytesseract to work with .heic files?
Essentially, we are batch-renaming the file extension to .jpg, but the renamed files are not always supported.
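Renaming only changes the extension, not the encoding, so the bytes on disk are still HEIC. A sketch of a batch conversion that actually decodes and re-encodes each file, assuming Pillow plus the third-party pillow-heif plugin are installed. `plan_conversions` and `convert_heic_dir` are hypothetical names. PNG is used as the default target because it is lossless, so no further compression loss is added on top of the HEIC encoding.

```python
from pathlib import Path


def plan_conversions(src_dir, out_dir, fmt="png"):
    """Hypothetical helper: pair each .heic/.heif file with a target path."""
    src, out = Path(src_dir), Path(out_dir)
    pairs = []
    for p in sorted(src.iterdir()):
        # Match the extension case-insensitively (phones often write .HEIC).
        if p.is_file() and p.suffix.lower() in {".heic", ".heif"}:
            pairs.append((p, out / f"{p.stem}.{fmt}"))
    return pairs


def convert_heic_dir(src_dir, out_dir, fmt="png"):
    """Hypothetical helper: decode each HEIC file and re-encode it."""
    # Assumed dependencies: Pillow plus the third-party pillow-heif plugin.
    from PIL import Image
    from pillow_heif import register_heif_opener

    register_heif_opener()
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    converted = []
    for src, dst in plan_conversions(src_dir, out_dir, fmt):
        with Image.open(src) as img:
            # Re-encoding (not renaming) is what changes the on-disk format;
            # Pillow picks the output format from the .png suffix.
            img.convert("RGB").save(dst)
        converted.append(dst)
    return converted
```

For example, `convert_heic_dir("scans/heic", "scans/png")` (example paths) would produce PNG files that can be passed to pytesseract directly.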
I read through a bit of https://tesseract-ocr.github.io and did not see references to this file type for lossless files.
This is for a public works project using archived public documents going back over 200 years, so support for this file type would not only be a public service but would also save taxpayer money.