How to Improve the Detection Accuracy of This Character? #1273

MAbdElRaouf · 2024-03-09T20:18:16Z

Hello,

I have been using this great library to perform OCR on invoices and it has been working great so far.

I'm having difficulty trying to get OCRmyPDF to recognize the '9' in this picture (part of an invoice) correctly:

I have tried different oversample values (None, 200, 350, 600, 800, 1000) combined with different tesseract_pagesegmode values (1, 4, 6, 11, 12), but the generated text always misreads the '9' as $, €, or 6, or omits it altogether.
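
For reference, a sweep like the one described above could be scripted roughly as follows. This is a minimal sketch, not the exact script used: it assumes OCRmyPDF's Python API (`ocrmypdf.ocr` with its `oversample`, `tesseract_pagesegmode`, `sidecar`, and `force_ocr` arguments), and the helper names `build_grid` and `run_sweep` and the output file naming are made up for illustration.

```python
from itertools import product
from pathlib import Path

# Values taken from the post; None means "leave oversample unset".
OVERSAMPLE_VALUES = [None, 200, 350, 600, 800, 1000]
PAGESEGMODES = [1, 4, 6, 11, 12]


def build_grid(oversamples=OVERSAMPLE_VALUES, psms=PAGESEGMODES):
    """Return one keyword-argument dict per (oversample, pagesegmode) pair."""
    grid = []
    for dpi, psm in product(oversamples, psms):
        kwargs = {"tesseract_pagesegmode": psm}
        if dpi is not None:
            kwargs["oversample"] = dpi
        grid.append(kwargs)
    return grid


def run_sweep(input_pdf, out_dir):
    """Run OCR once per combination and return {label: recognized text}."""
    import ocrmypdf  # imported lazily so build_grid works without it installed

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for kwargs in build_grid():
        dpi = kwargs.get("oversample", "none")
        label = f"os{dpi}_psm{kwargs['tesseract_pagesegmode']}"
        sidecar = out_dir / f"{label}.txt"
        ocrmypdf.ocr(
            input_pdf,
            out_dir / f"{label}.pdf",
            sidecar=sidecar,   # plain-text copy of what was recognized
            force_ocr=True,    # assumption: input may already have a text layer
            **kwargs,
        )
        results[label] = sidecar.read_text(encoding="utf-8")
    return results
```

Calling `run_sweep("invoice.pdf", "sweep_out")` (hypothetical file names) leaves one sidecar text file per combination, which makes it easy to compare how each setting reads the problem character.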

What can be done to improve the detection accuracy in this case and similar cases?

Thank you.

jbarlow83 · 2024-03-18T20:11:19Z

Maintainer

It may be possible to fine-tune Tesseract to improve its recognition with some labeled examples, although it may increase errors in other cases. OCRmyPDF doesn't have anything on its own to help with this.
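
As a rough illustration of what "labeled examples" could look like in practice, the sketch below writes one transcription file per cropped line image. It assumes the convention used by the separate tesstrain project, where each line image sits next to a `.gt.txt` file with the same stem containing the exact text; the function name `write_ground_truth`, the directory, and the image names are hypothetical, and the training run itself is not shown.

```python
from pathlib import Path


def write_ground_truth(labels, gt_dir):
    """Write one <stem>.gt.txt transcription per cropped line image.

    labels maps an image file name to the exact text shown in that image.
    Returns the written paths, sorted by image name.
    """
    gt_dir = Path(gt_dir)
    gt_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_name, text in sorted(labels.items()):
        # "line_001.png" -> "line_001.gt.txt" (assumed tesstrain naming)
        gt_path = gt_dir / (Path(image_name).stem + ".gt.txt")
        gt_path.write_text(text.strip() + "\n", encoding="utf-8")
        written.append(gt_path)
    return written
```

The labels have to be typed in by hand from the invoice crops, including the characters that are currently misread, since those are the cases the fine-tuned model is meant to learn.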
