process multiple html files with hocr2djvused #1

jwilk · 2012-05-11T17:24:27Z

Issue reported by @thkoch2001:

Hi,

I ran tesseract manually on multiple image files (try GNU Parallel!) and ended up with one HTML (hOCR) file per page. To combine those HTML pages into one djvused script, I hacked your hocr2djvused a bit.

My version now optionally accepts input files as arguments and processes them as consecutive pages.

You can find my changes here:
https://github.com/thkoch2001/ocrodjvu/commit/318657e4a45bb8c8002e06382b73d49e984c0f30
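For readers who do not want to open the commit, the idea can be sketched roughly as follows. This is not the code from the patch or from ocrodjvu; `parse_args` and `iter_inputs` are hypothetical names, and the sketch assumes each file holds exactly one page and that standard input is used when no files are given.

```python
import argparse
import sys


def parse_args(argv):
    # Hypothetical CLI: zero or more hOCR files as positional arguments.
    parser = argparse.ArgumentParser(prog='hocr-multi-sketch')
    parser.add_argument('paths', nargs='*', metavar='FILE')
    return parser.parse_args(argv)


def iter_inputs(paths):
    """Yield (page_number, hocr_text) pairs in command-line order.

    Assumption: one page per file. With no paths, read standard input
    and treat it as a single page.
    """
    if not paths:
        yield 1, sys.stdin.read()
        return
    for page_no, path in enumerate(paths, start=1):
        with open(path, encoding='utf-8') as f:
            yield page_no, f.read()


if __name__ == '__main__':
    for page_no, text in iter_inputs(parse_args(sys.argv[1:]).paths):
        # A real converter would emit the djvused commands for this page here.
        print('page %d: %d characters of hOCR' % (page_no, len(text)))
```

Because the page number comes from the argument order, the caller has to pass the files in page order, for example via a sorted shell glob.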

jwilk · 2012-05-11T19:55:15Z

The patch doesn't look crazy, but at least the documentation would have to be updated (lib/cli/hocr2djvused.py:31 and doc/hocr2djvused.xml).

Some nitpicking:

I prefer lst += x to lst.extend(x).
Please keep indentation consistent with the rest of the code.
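As a side note on the first nitpick: for Python lists, both spellings extend the list in place, so the preference is a matter of style. A minimal illustration (the variable names are made up for the example):

```python
pages = ['page1.html']
alias = pages              # second reference to the same list object

pages += ['page2.html']    # in-place extension via +=
pages.extend(['page3.html'])  # in-place extension via extend()

# Both operations modified the one shared list; no new list was created.
same_object = alias is pages
```

One practical difference is that `+=` is also an assignment to the name, so inside a function it makes that name local unless it is declared `global` or `nonlocal`.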

jwilk · 2012-05-24T19:50:09Z

Implemented in f9922a6.

jwilk · 2012-05-28T17:51:05Z

Fixed in 0.7.11.

jwilk closed this as completed May 28, 2012

jwilk added the enhancement label Nov 22, 2016

ghost mentioned this issue Mar 15, 2017

error msg "No image suitable for OCR" is too vague #21

Open

ashipunov mentioned this issue Feb 4, 2019

Multiple jobs do not work with Tesseract 4 #31

Open
