Non-linear grayscale normalization for layout analyse and/or text recognition #3857
…in (Thomas Breuel).
…tion and for both tasks.
…raynorm_mode). There are 4 modes 0 - no normalization, 1 - thresholding+recognition, 2 - thresholding (only), 3 - recognition (only).
Hi @JKamlah, Leptonica has some built-in grayscale normalization functions, maybe we can also use them. Here are some examples that demonstrate how to use them to improve thresholding using Otsu's or Sauvola's methods: I suggest to try to add at least
CC: @bertsky
You can use this image for testing the new feature: https://github.com/DanBloomberg/leptonica/blob/a14036fa5f5ea971/prog/w91frag.jpg
Great stuff, many thanks for bringing this forward!
Good idea to provide a mode parameter that selects where the normalization is applied. But I am afraid the current design does not always match the intended behaviour:
    if (mode == 1) {
      SetInputImage(thresholder_->GetPixNormRectGrey());
      thresholder_->SetImage(GetInputImage());
    } else if (mode == 2) {
      thresholder_->SetImage(thresholder_->GetPixNormRectGrey());
    } else if (mode == 3) {
      SetInputImage(thresholder_->GetPixNormRectGrey());
    } else {
      return false;
    }
Some considerations regarding where this should be placed in the code base:
- Using a separate entry point NormalizeImage called in ProcessPages, instead of merely modifying the thresholder, prevents applying this on any PSM other than full pages. And on the API, you would need to add NormalizeImage to the calling code instead of merely setting the configuration parameter.
- Recognition (SetupForRecognition → BestPix) does not always use pix_original_: after SetRectangle(), it uses pix_grey_ or even pix_binary_.
- Layout analysis mostly uses pix_binary_, but LineFinder also tries to use pix_grey_ and pix_thresholds_.
- DPI information (which influences LA in various ways) is taken from pix_ (i.e. the thresholder's SetImage), and that might not work on the output of Leptonica because the metadata might be lost. We still have fallback DPI estimation (which is based on the CC statistics from pix_binary_), but that might not be as accurate.
Tesseract's Otsu is implemented in I suggest to move
Thank you for the idea @amitdo.
Non-linear grayscale normalization
Draft PR only
I would first like to use the draft PR option to get feedback on whether this implementation of grayscale normalization works reliably on a wide variety of input images and proves beneficial. Please test extensively.
Image normalization
In some cases, image normalization is applied to improve layout analysis (LA) and OCR results. A popular method is nlbin, a non-linear grayscale normalization with optional subsequent binarization. This method was developed by Thomas Breuel for the text recognition program OCRopus.
In this PR, the nlbin method was adapted to use existing Leptonica functions. It can be activated via a parameter for layout analysis and/or the actual text recognition. It performs only a grayscale normalization, so the existing binarization methods can still be applied to the result.
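To illustrate the general idea of background-flattening grayscale normalization, here is a minimal, self-contained sketch. It is not the code in this PR and does not use Leptonica: the `GrayImage` struct and the `FlattenBackground` function are hypothetical names, and the local-maximum background estimate is a deliberately simplified stand-in for the percentile and smoothing steps of nlbin.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal 8-bit grayscale image (row-major, 0 = black, 255 = white).
struct GrayImage {
  int width = 0;
  int height = 0;
  std::vector<std::uint8_t> pixels;

  std::uint8_t at(int x, int y) const {
    return pixels[static_cast<std::size_t>(y) * width + x];
  }
};

// Simplified normalization: estimate the local background as the brightest
// pixel in a (2*radius+1)^2 window, then rescale each pixel so that the
// estimated background maps to white (255).
GrayImage FlattenBackground(const GrayImage& src, int radius) {
  GrayImage dst = src;
  for (int y = 0; y < src.height; ++y) {
    for (int x = 0; x < src.width; ++x) {
      int background = 1;  // start at 1 to avoid division by zero
      const int y0 = std::max(0, y - radius);
      const int y1 = std::min(src.height - 1, y + radius);
      const int x0 = std::max(0, x - radius);
      const int x1 = std::min(src.width - 1, x + radius);
      for (int yy = y0; yy <= y1; ++yy) {
        for (int xx = x0; xx <= x1; ++xx) {
          background = std::max(background, static_cast<int>(src.at(xx, yy)));
        }
      }
      const int scaled = static_cast<int>(src.at(x, y)) * 255 / background;
      dst.pixels[static_cast<std::size_t>(y) * src.width + x] =
          static_cast<std::uint8_t>(std::min(scaled, 255));
    }
  }
  return dst;
}
```

The output is still a grayscale image, so any binarization method can be applied afterwards. Dark strokes on a bright background and on a dim background end up at similar gray levels, which is the effect the normalization is meant to achieve before thresholding.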
The "preprocess_graynorm_mode" parameter
This parameter is an INT member that currently has 4 modes; it is set with "-c preprocess_graynorm_mode=INT".
The modes:
0 = no normalization applied (default)
1 = apply normalization for thresholding & recognition
2 = apply normalization for thresholding (only)
3 = apply normalization for recognition (only)
Modes 1-3 are applied to the full image.
Normalization at line level would also be desirable (not implemented yet).
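The mode table can be read as a mapping from the integer value to the processing stages that receive the normalized image. The following sketch spells that mapping out. `GraynormTargets` and `TargetsForMode` are hypothetical names used only for illustration, not identifiers from this PR, and treating mode 0 as a valid no-op is an assumption of the sketch.

```cpp
// Which processing stages receive the normalized image (hypothetical type).
struct GraynormTargets {
  bool thresholding;
  bool recognition;
};

// Maps a preprocess_graynorm_mode value to its targets.
// Returns false for values outside the documented range 0-3.
bool TargetsForMode(int mode, GraynormTargets* out) {
  switch (mode) {
    case 0:  // no normalization (default); assumed here to be a valid no-op
      out->thresholding = false;
      out->recognition = false;
      return true;
    case 1:  // thresholding & recognition
      out->thresholding = true;
      out->recognition = true;
      return true;
    case 2:  // thresholding (only)
      out->thresholding = true;
      out->recognition = false;
      return true;
    case 3:  // recognition (only)
      out->thresholding = false;
      out->recognition = true;
      return true;
    default:
      return false;
  }
}
```

Keeping the mapping in one place like this would make it easier to check that each mode affects exactly the intended stages.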
Additional option
With the parameter "-c tessedit_write_images=1", the normalized image can be written out as a TIFF file.