Replies: 1 comment
-
Realised the issue: I installed the OCRmyPDF-EasyOCR plugin during my first tests months ago, and ocrmypdf was using it when I was testing last week. So the conversion quality was an EasyOCR problem (I get roughly the same issues with their online demo). Would it be reasonable to print the plugin list in |
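In the meantime, a rough way to see which plugins are installed in the current environment is to list the packaging entry points. This is only a sketch: it assumes plugins register themselves under an entry-point group named `ocrmypdf`, and `list_ocrmypdf_plugins` is a helper name made up for this example, not part of ocrmypdf.

```python
from importlib.metadata import entry_points


def list_ocrmypdf_plugins(group="ocrmypdf"):
    """Return sorted (name, module) pairs for entry points in `group`.

    Assumption: installed ocrmypdf plugins advertise themselves under the
    "ocrmypdf" entry-point group. This helper is hypothetical.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry-point collection
        selected = eps.select(group=group)
    else:
        # Older Pythons: plain dict keyed by group name
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    for name, module in list_ocrmypdf_plugins():
        print(f"{name}: {module}")
```

An empty list would suggest that no third-party plugin is registered this way in the environment being inspected.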
-
I'm using this PDF as a test, which is slightly blurry but still quite readable. Tesseract (through pytesseract) seems to work fine with it on default settings.
But when I use ocrmypdf 16.0.3, with or without `clean`, a lot of the text is mangled. E.g. "In facsimile a photocell" becomes "In facsimi le photocell" at the beginning of the second paragraph. The indefinite article 'a' seems to be skipped entirely (screenshot). Am I using it incorrectly? I can get slightly better results with some opencv pre-processing, along the lines described here, but only slightly.
I had a look at _exec/tesseract.py, but I don't know either library well enough to see which arguments might be making the difference.
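To pin down exactly where two OCR outputs disagree, a word-level diff can help. The following is a minimal sketch using only the standard library; `word_diff` is a made-up helper name, and the two strings are the example quoted above rather than real tool output.

```python
import difflib


def word_diff(expected, actual):
    """Return the non-matching word runs between two strings.

    Each item is (tag, expected_words, actual_words), where tag is one of
    "replace", "delete" or "insert" as reported by difflib.
    """
    a, b = expected.split(), actual.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    reference = "In facsimile a photocell"   # text as printed in the PDF
    mangled = "In facsimi le photocell"      # text as it came out mangled
    for tag, want, got in word_diff(reference, mangled):
        print(tag, want, "->", got)
```

Running the same diff over the full text from each tool should show whether the errors cluster around particular words or lines.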