PDF Renderer: allow to specify an alternate image or a custom resolution. #4171

phymbert · 2023-12-19T00:05:21Z

Motivation

Input images passed to OCR are often pre-processed (upscaled to a higher resolution, converted to grayscale, etc.).
It can be useful to specify an alternate image or a lower resolution in the renderer, especially for a searchable PDF export.

Proposed changes

Added TessResultRenderer::SetRenderingImage and TessResultRenderer::SetRenderingResolution methods, which allow changing the image or resolution to render programmatically, before the image is added to the renderer
New rendering_dpi param allows overriding the output resolution by scaling the source image
Added a few pdfrenderer tests
Fixed missing pdf.ttf font in the cmake install target
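
To illustrate the arithmetic behind scaling the source image to a target resolution, here is a minimal sketch in Python. It is not the code from this pull request: `RenderedImage` and `scale_to_rendering_dpi` are hypothetical names, and the sketch assumes the intent is to change the pixel dimensions while keeping the physical page size (PDF user space uses 1/72 inch units).

```python
from dataclasses import dataclass
from typing import Tuple

POINTS_PER_INCH = 72  # PDF default user space unit is 1/72 inch


@dataclass(frozen=True)
class RenderedImage:
    """Hypothetical stand-in for an image plus its resolution."""
    width_px: int
    height_px: int
    dpi: int

    def page_size_pt(self) -> Tuple[float, float]:
        """Physical page size in PDF points implied by pixels and DPI."""
        return (self.width_px * POINTS_PER_INCH / self.dpi,
                self.height_px * POINTS_PER_INCH / self.dpi)


def scale_to_rendering_dpi(image: RenderedImage, rendering_dpi: int) -> RenderedImage:
    """Hypothetical helper: rescale pixel dimensions to match rendering_dpi."""
    if rendering_dpi <= 0:
        raise ValueError("rendering_dpi must be positive")
    factor = rendering_dpi / image.dpi
    return RenderedImage(
        width_px=max(1, round(image.width_px * factor)),
        height_px=max(1, round(image.height_px * factor)),
        dpi=rendering_dpi,
    )


# An A4 page scanned at 300 DPI, rendered at 150 DPI (illustrative values).
source = RenderedImage(width_px=2480, height_px=3508, dpi=300)
rendered = scale_to_rendering_dpi(source, rendering_dpi=150)
print(rendered.width_px, rendered.height_px)  # prints "1240 1754"
```

Under these assumptions the rendered image has a quarter of the pixels, while `page_size_pt()` returns the same page size for both images, so the text layer positions would not need to change.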

These changes might resolve feature requests #210 and #3798.

Checks

make check passed locally on Ubuntu 23.10
GitHub workflows passed

zdenop · 2023-12-20T07:42:54Z

cmake does not install a PDF font file. That was the old way of handling the font in PDF output. Now it is automatically included in the library.

zdenop · 2023-12-20T07:47:52Z

@jbreiden @jbreiden2 : Jeff can you have a look at this?

phymbert · 2023-12-20T10:35:21Z

> cmake does not install a PDF font file. It was the old way, how to handle font in pdf. Now it is automatically included in library

Thanks, @zdenop, for the explanation. I was confused by tessdata/Makefile.am, and I will remove the font install change.
Let me submit fixes for the pdfrenderer tests, which failed on some platforms.

@jbreiden @jbreiden2, a better way to check the generated PDF files than a maximum file size would be welcome.

phymbert · 2024-04-14T18:09:45Z

It looks like there is little interest; that happens :) Thanks, all.

stweil · 2024-04-14T21:05:08Z

Hi, it's not unusual that pull requests take some time before they are merged. That does not necessarily mean that there is little interest, but there is only a small number of people who contribute to pull requests by adding comments or testing them.

phymbert · 2024-04-14T21:54:52Z

No worries at all. I had seen it sitting open on my to-do list for a while, so I preferred to close it. Thanks for your feedback; I understand. Reopened, no hurry.

zdenop · 2024-04-18T09:23:19Z

Since it extends the API functionality, it should be included in the 5.4.0 release.

stweil · 2024-04-19T19:39:22Z

I rebased this pull request and fixed a merge conflict.

zdenop · 2024-04-19T20:06:18Z

What about also implementing this feature in the tesseract executable as a command line option?

stweil · 2024-04-20T05:38:46Z

Isn't that already possible with -c?

zdenop · 2024-04-20T17:20:30Z

> Isn't that already possible with -c?

With -c I can set rendering_dpi. How can I set an image for SetRenderingImage?
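
As a sketch of what setting the parameter with -c could look like, the snippet below only builds the command line and does not run it. It assumes a tesseract build that includes this pull request, since rendering_dpi is the parameter proposed here; the file names and the `tesseract_pdf_command` helper are hypothetical.

```python
import shlex
from typing import List


def tesseract_pdf_command(image_path: str, output_base: str, rendering_dpi: int) -> List[str]:
    """Build (but do not run) a tesseract call that sets rendering_dpi via -c."""
    if rendering_dpi <= 0:
        raise ValueError("rendering_dpi must be positive")
    return [
        "tesseract", image_path, output_base,
        "-c", f"rendering_dpi={rendering_dpi}",  # parameter proposed in this PR
        "pdf",  # config file that enables PDF output
    ]


command = tesseract_pdf_command("scan.png", "scan", rendering_dpi=150)
print(shlex.join(command))  # prints "tesseract scan.png scan -c rendering_dpi=150 pdf"
```

This covers only the resolution override; there is no equivalent shown in the thread for passing an alternate image from the command line.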

stweil · 2024-05-19T16:05:31Z

Tesseract can create multi-page PDF files when it is called with a list of images. Ideally that should also work with alternate images.

stweil · 2024-05-19T16:07:28Z

> > Isn't that already possible with -c?
>
> With -c I can set rendering_dpi. How can I set an image for SetRenderingImage?

Would it be possible to implement the desired features by only adding new Tesseract parameters – without any change of the C / C++ API?

phymbert added a commit to phymbert/tesseract that referenced this pull request Dec 20, 2023

test tesseract-ocr#4171

3da2448

phymbert marked this pull request as ready for review February 17, 2024 18:00

phymbert closed this Apr 14, 2024

stweil added the enhancement label Apr 14, 2024

phymbert reopened this Apr 14, 2024

zdenop added this to the 5.4.0 milestone Apr 18, 2024

phymbert added 2 commits April 19, 2024 21:32

PDF Renderer: allow to specify an alternate image or resolution programmatically

369aa78

Support new rendering_dpi api params. Add pdf renderer tests. Install pdf font in cmake tool chain. resolves tesseract-ocr#210 resolves tesseract-ocr#3798

Remove changes on CMakeLists.txt

94b95b1

stweil force-pushed the phymbert/features/3798-rendering-resolution-option branch from 13bccc0 to 94b95b1 April 19, 2024 19:34

stweil reviewed Apr 19, 2024

include/tesseract/renderer.h (outdated)

pdf: move rendering image and resolution to the pdf renderer only.

09b6875

pdf: tests add lib leptonica dependency in the make toolchain

zdenop referenced this pull request May 12, 2024

Create new release 5.3.5-rc1

cab5658

Signed-off-by: Stefan Weil <sw@weilnetz.de>

zdenop referenced this pull request May 19, 2024

Create new release 5.3.5-rc1

9a30816

Signed-off-by: Stefan Weil <sw@weilnetz.de>

stweil mentioned this pull request Jun 8, 2024

Build a front end for OCRmyPDF (development suggestion) UB-Mannheim/zotero-ocr#75

Closed

stweil mentioned this pull request Aug 5, 2024

tesseract picture export format #4290

Closed
