Optical Character Recognition (OCR)
clawsoftware edited this page Mar 21, 2023 · 4 revisions
Since version 0.8.7, clawPDF has built-in text recognition (OCR). It can convert any document to plain text or create a PDF with a text overlay. By default, recognition of English, Spanish, French and German is supported. To install additional languages, see the section Setup of additional languages.
To download additional language files, visit the tessdata_best or tessdata_fast page, then copy the files to the C:\Program Files (x86)\clawpdf\tessdata folder.
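The copy step can also be scripted. The following is a minimal sketch, not part of clawPDF: it assumes the downloaded file keeps Tesseract's usual `<code>.traineddata` name, and the function name `install_language_file` and the example download path are hypothetical. Writing to a folder under Program Files normally requires an elevated (administrator) prompt.

```python
import shutil
from pathlib import Path

# Install folder named in the text above; adjust if clawPDF is installed elsewhere.
TESSDATA_DIR = Path(r"C:\Program Files (x86)\clawpdf\tessdata")


def install_language_file(downloaded, tessdata_dir=TESSDATA_DIR):
    """Copy one downloaded .traineddata file into the tessdata folder.

    Returns the path of the copied file.
    """
    downloaded = Path(downloaded)
    tessdata_dir = Path(tessdata_dir)

    # Assumption: language files use the .traineddata extension.
    if downloaded.suffix != ".traineddata":
        raise ValueError(f"not a Tesseract language file: {downloaded.name}")
    if not downloaded.is_file():
        raise FileNotFoundError(f"download not found: {downloaded}")
    if not tessdata_dir.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {tessdata_dir}")

    target = tessdata_dir / downloaded.name
    shutil.copy2(downloaded, target)  # overwrites an existing file of the same name
    return target


if __name__ == "__main__":
    # Hypothetical download location, for illustration only.
    source = Path.home() / "Downloads" / "ita.traineddata"
    print(install_language_file(source))
```

The target folder is a parameter so the function can be tried against a scratch directory before touching the real installation.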
- To convert documents to text, select the OCR/TXT (print as text) profile or the OCR/TXT format.
- To print any document to PDF with text overlay, choose the PDF/OCR (overlay with text) profile or the PDF/{color}-OCR format.
- In the OCR tab, ensure that the language to be recognized corresponds to the abbreviation of the language file.
Make sure that the document is printed in portrait orientation. The orientation must already be set in the Windows printer dialog.
Make sure that the language file is present in the folder C:\Program Files (x86)\clawpdf\tessdata and the correct language is set in the OCR tab.
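The presence check can be automated with a short script. This is a sketch under assumptions: language files are taken to follow Tesseract's `<code>.traineddata` naming, the codes `eng`, `spa`, `fra` and `deu` are assumed to be the ones matching the four default languages, and `missing_language_files` is a hypothetical helper, not a clawPDF function.

```python
from pathlib import Path

# Install folder named in the text above; adjust if clawPDF is installed elsewhere.
TESSDATA_DIR = Path(r"C:\Program Files (x86)\clawpdf\tessdata")

# Assumed Tesseract codes for English, Spanish, French and German.
DEFAULT_LANGS = ("eng", "spa", "fra", "deu")


def missing_language_files(langs, tessdata_dir=TESSDATA_DIR):
    """Return the codes in langs that have no <code>.traineddata file in tessdata_dir."""
    tessdata_dir = Path(tessdata_dir)
    return [
        code
        for code in langs
        if not (tessdata_dir / f"{code}.traineddata").is_file()
    ]


if __name__ == "__main__":
    missing = missing_language_files(DEFAULT_LANGS)
    if missing:
        print("Missing language files:", ", ".join(missing))
    else:
        print("All checked language files are present.")
```

The script only checks the folder. Whether the matching language is selected in the OCR tab still has to be verified in the clawPDF settings.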