Option to skip files which have already been processed by an OCR scanner #113
Comments
Hi @SynIV, and thanks for your feature request. First of all: of course this is possible with little effort, but since we try to keep the app as simple as possible, I think we have to discuss the default behaviour a little bit. According to the docs, there are basically 3 flags for handling pages that already contain text: `--skip-text`, `--force-ocr` and `--redo-ocr`. Could you please try to reproduce both of your use cases via `ocrmypdf --skip-text input.pdf output.pdf`?
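To make the three options concrete, here is a minimal Python sketch of how an app could map a single user-facing setting to exactly one of ocrmypdf's mutually exclusive existing-text flags, defaulting to `--skip-text` as suggested in this discussion. The helper `build_ocrmypdf_command` and the `mode` values (`"skip"`, `"redo"`, `"force"`) are hypothetical and not taken from the app's actual code.

```python
import shlex

# ocrmypdf accepts only one of these options at a time for pages that already contain text.
EXISTING_TEXT_FLAGS = {
    "skip": "--skip-text",   # leave pages that already have text untouched
    "redo": "--redo-ocr",    # discard an existing OCR text layer and OCR those pages again
    "force": "--force-ocr",  # rasterize every page and OCR it, even if it has text
}


def build_ocrmypdf_command(input_path: str, output_path: str, mode: str = "skip") -> list[str]:
    """Return an ocrmypdf argument list for the chosen existing-text mode.

    Hypothetical helper for illustration only; the default mode is "skip".
    """
    try:
        flag = EXISTING_TEXT_FLAGS[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(EXISTING_TEXT_FLAGS)}"
        ) from None
    return ["ocrmypdf", flag, input_path, output_path]


if __name__ == "__main__":
    cmd = build_ocrmypdf_command("input.pdf", "output.pdf")
    # Prints the same command suggested above: ocrmypdf --skip-text input.pdf output.pdf
    print(shlex.join(cmd))
```

Keeping the choice in one setting, rather than as independent checkboxes, prevents the app from ever passing two of these flags together, which ocrmypdf rejects.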
Hi @R0Wi, thank you for your quick answer. I have tested the `--skip-text` option on a file that was only partially scanned, and it works great. As described in the documentation, the already scanned pages are skipped. I think this would be a nice default behavior. I understand that you try to keep the app as simple as possible, but in my opinion it would make the app more individual and customizable if users could choose whether to rescan or skip pages with existing printable text. For me, `--skip-text` as the default behavior would be perfect 😄
Thanks for your fast feedback. I will discuss this with @bahnwaerter, and I think we can deliver a suitable solution in the next few days. We will track our progress here 👍
Out of curiosity (I'd also like to avoid double OCR): has a decision been made to change the default?
I think the advantage of using I think we can go that way:
@bahnwaerter, any thoughts?
Thanks, @SynIV, for reporting this unfavorable behavior in your use case. As @R0Wi already said, the To cover the use cases described by @SynIV, we have to change the default option from
Thank you so much! 😊
Please let me know if you encounter any errors. Just pushed to the app store for NC23 and NC24 🚀
Sometimes, when I scan a document (e.g. on my phone), OCR has already been done there in pretty good quality. On the other hand, when I scan files with my printer, or get files from somewhere else that have not been processed by OCR yet, I like having the option to automatically OCR every file newly created on the server.
Therefore, it would be absolutely great to automatically skip the OCR scan if the file has already been processed and contains printable text.
I would love an option to remove "--redo-ocr" so that these documents are skipped, without having to activate "--remove-background", because that option has some other disadvantages according to the ocrmypdf documentation.
So I would like to ask very nicely whether that would be possible. Unfortunately, I am not experienced enough to contribute myself.