Add feature to skip OCR on PDF files with text #58

youduda · 2023-08-01T22:40:45Z

This uses the external tool pdftotext from poppler-utils to extract text from a PDF file. If the file already contains text, OCR can be skipped. If the tool is not installed, the check is simply ignored. By default, this does not change the behavior of existing installations.
Closes #28

Signed-off-by: Florian Freund <florian@freund.zone>
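
For readers who want to see the idea in isolation, here is a minimal Python sketch of the check described above. It is not the code from this pull request: the function names `extract_pdf_text` and `should_skip_ocr` and the `skip_enabled` flag are hypothetical, chosen only for illustration. The sketch assumes `pdftotext` is called with `-` as the output file so that the extracted text goes to stdout, and it treats a missing tool or a failed run as "no information", which leaves OCR enabled.

```python
import shutil
import subprocess
from typing import Callable, Optional


def extract_pdf_text(pdf_path: str) -> Optional[str]:
    """Return the text pdftotext extracts, or None if the tool is missing or fails."""
    # If poppler-utils is not installed, behave as if the check did not exist.
    if shutil.which("pdftotext") is None:
        return None
    try:
        # "-" as the output file makes pdftotext write the text to stdout.
        result = subprocess.run(
            ["pdftotext", pdf_path, "-"],
            capture_output=True,
            encoding="utf-8",
            errors="replace",
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def should_skip_ocr(
    pdf_path: str,
    skip_enabled: bool = False,  # hypothetical opt-in flag, off by default
    extractor: Callable[[str], Optional[str]] = extract_pdf_text,
) -> bool:
    """Skip OCR only when the option is on and the PDF already contains text."""
    if not skip_enabled:
        return False
    text = extractor(pdf_path)
    # Whitespace and page-break characters alone do not count as text.
    return text is not None and text.strip() != ""
```

Because the flag defaults to off and every failure path returns `None`, the sketch only ever skips OCR when text was positively found, which mirrors the "no change for existing installations" behavior described above.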

XueSheng-GIT · 2023-08-21T10:42:54Z

Thanks @youduda for providing this pull request. I'm not a developer, but I've been testing it for a couple of days now (on three instances with NC 27.0.2). It works fine so far.

add feature to skip OCR on PDF files with text

5ca4337

Signed-off-by: Florian Freund <florian@freund.zone>

youduda force-pushed the master branch from 6fe086f to 5ca4337 Compare August 1, 2023 22:42
