Add feature to skip OCR on PDF files with text #58

Open
wants to merge 1 commit into
base: master
from


@youduda youduda commented Aug 1, 2023

This uses the external tool pdftotext from poppler-utils to extract text from a PDF file. If the file already contains text, OCR can be skipped. If the tool is not installed, the check is silently skipped and OCR runs as before. The feature is off by default, so it does not change the behavior of existing installations.
Closes #28

Signed-off-by: Florian Freund <florian@freund.zone>
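
For readers who want to see the idea in isolation, here is a minimal sketch of the check described above. It is not the code from this pull request: the function names, the `skip_if_text` flag, and the timeout are assumptions made for illustration. The only thing taken from the description is the use of pdftotext, called here as `pdftotext <file> -` so that the extracted text goes to standard output.

```python
import shutil
import subprocess


def pdf_has_text(pdf_path, extractor="pdftotext", timeout=30):
    """Return True if the extractor reports non-whitespace text in the PDF.

    Returns False when the tool is missing, fails, or times out, so the
    caller falls back to running OCR as usual.
    """
    exe = shutil.which(extractor)
    if exe is None:
        # Tool not installed: ignore the check entirely.
        return False
    try:
        # "-" as the output file makes pdftotext write to stdout.
        result = subprocess.run(
            [exe, str(pdf_path), "-"],
            capture_output=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    if result.returncode != 0:
        return False
    # pdftotext emits form feeds between pages; strip() drops those too.
    return bool(result.stdout.strip())


def should_skip_ocr(pdf_path, skip_if_text=False, extractor="pdftotext"):
    """Hypothetical opt-in switch: off by default, so behavior is unchanged."""
    return skip_if_text and pdf_has_text(pdf_path, extractor=extractor)
```

Any failure is treated as "no text found", which keeps the existing OCR path as the fallback. A scanned PDF that carries a poor or partial text layer would still count as "has text" under this simple test, so a stricter threshold may be worth considering.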
@XueSheng-GIT

Thanks @youduda for providing this pull request. I'm not a developer, but I've been testing your pull for a couple of days now (on three instances with NC 27.0.2). Works fine so far.

Successfully merging this pull request may close these issues.

avoid OCR on non-image PDF files