This repository has been archived by the owner on Oct 3, 2022. It is now read-only.

process multiple html files with hocr2djvused #1

Closed
jwilk opened this issue May 11, 2012 · 3 comments

Comments

@jwilk
Member

jwilk commented May 11, 2012

Issue reported by @thkoch2001:

Hi,

I ran tesseract manually on multiple image files (try GNU Parallel!) and ended up with one HTML (hOCR) file for every page. To combine those HTML pages into one djvused script, I hacked your hocr2djvused a bit.

My version now optionally accepts input files as arguments and processes them as consecutive pages.

You can find my changes here:
https://github.com/thkoch2001/ocrodjvu/commit/318657e4a45bb8c8002e06382b73d49e984c0f30
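For readers unfamiliar with the output format, here is a minimal sketch of the idea of emitting several pages into one djvused script. It is not the code from the linked commit: `build_djvused_script` is a hypothetical helper, and it assumes each hOCR file has already been converted to a djvused hidden-text S-expression. It relies only on the standard djvused commands `select` and `set-txt`, where the text passed to `set-txt` is terminated by a line containing a single period.

```python
def build_djvused_script(page_texts, first_page=1):
    """Join per-page hidden-text S-expressions into one djvused script.

    page_texts: one S-expression string per page, in page order
    (assumed to be produced elsewhere from the hOCR input).
    """
    script = []
    for page_no, sexpr in enumerate(page_texts, start=first_page):
        script += [
            'select {0}'.format(page_no),  # switch to the next consecutive page
            'set-txt',                     # text follows on the next lines
            sexpr,
            '.',                           # a lone period ends the text
        ]
    return '\n'.join(script) + '\n'


# Hypothetical sample data standing in for two converted hOCR pages.
pages = [
    '(page 0 0 100 100 "first")',
    '(page 0 0 100 100 "second")',
]
print(build_djvused_script(pages), end='')
```

A script like this could then be applied to a multi-page document with `djvused -f`.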

@jwilk
Member Author

jwilk commented May 11, 2012

The patch doesn't look crazy, but at least the documentation would have to be updated (lib/cli/hocr2djvused.py:31 and doc/hocr2djvused.xml).

Some nitpicking:

  • I prefer lst += x to lst.extend(x).
  • Please keep indentation consistent with the rest of the code.
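As a small illustration of the first nitpick: for plain Python lists, the two spellings extend the list in place and give the same result, so the preference is purely stylistic. The file names below are made up for the example.

```python
# Hypothetical input file names, for illustration only.
pages = ['page1.html']
extra = ['page2.html', 'page3.html']

with_extend = list(pages)
with_extend.extend(extra)   # method-call spelling

with_iadd = list(pages)
with_iadd += extra          # augmented-assignment spelling

# Both lists now hold the same three items.
assert with_extend == with_iadd
```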

@jwilk
Member Author

jwilk commented May 24, 2012

Implemented in f9922a6.

@jwilk
Member Author

jwilk commented May 28, 2012

Fixed in 0.7.11.
