Merge google code branch https://code.google.com/r/email-hocr-tsv #18
The original issue tracker is gone, but there's an archived version here: Basically the request is to output the information contained in a hOCR file in tabular TSV format.
Can this be merged to provide support for tables? Thanks!
What is the use case for this? I can't find any earlier discussion. As far as I can tell, all the information is already included in the hOCR output (more, actually, since it has LTR/RTL, italic/bold, etc.) and, of course, even more info is available programmatically through the API. Here's some example output: http://teksty.klf.uw.edu.pl/12/1/alice_1.png.hocr.tsv
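To make the request concrete, here is a minimal sketch of flattening the word-level information in an hOCR file into TSV rows. It is not the code from the branch under discussion: the helper names (`WordCollector`, `hocr_to_tsv`) and the column set (`left`, `top`, `width`, `height`, `conf`, `text`) are assumptions chosen for illustration, and it relies only on the `ocrx_word` class and the `bbox` / `x_wconf` title properties that hOCR output commonly carries.

```python
import csv
import io
import re
from html.parser import HTMLParser

# hOCR stores geometry and confidence in the "title" attribute of each element.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")
CONF_RE = re.compile(r"x_wconf ([\d.]+)")


class WordCollector(HTMLParser):
    """Hypothetical helper: collects one row per ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # geometry of the word being read, if any
        self._depth = 0       # nesting depth of tags inside the word span
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is not None:
            self._depth += 1  # e.g. <strong> or <em> inside a word
            return
        if "ocrx_word" not in (attrs.get("class") or "").split():
            return
        title = attrs.get("title") or ""
        bbox = BBOX_RE.search(title)
        if bbox is None:
            return  # no geometry, skip this word
        x0, y0, x1, y1 = map(int, bbox.groups())
        conf = CONF_RE.search(title)
        self._current = (x0, y0, x1 - x0, y1 - y0, conf.group(1) if conf else "")
        self._depth = 0
        self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        self.rows.append(self._current + ("".join(self._text).strip(),))
        self._current = None


def hocr_to_tsv(hocr: str) -> str:
    """Return a TSV string with one header line and one line per word."""
    parser = WordCollector()
    parser.feed(hocr)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["left", "top", "width", "height", "conf", "text"])
    writer.writerows(parser.rows)
    return out.getvalue()
```

A post-processing script like this works on existing hOCR output, which is the "information is already there" argument; a built-in TSV renderer would only save users this extra step.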
I've created a cleaned-up version of this code in #245. I'm not really happy about adding even more crap to baseapi.cpp, but I've got a separate branch to refactor the hOCR renderer out of it, so I can add the TSV renderer to that, if it's decided to include it in Tess.
Wouldn't it be easier to keep the |
Link for one of the earlier requests https://groups.google.com/forum/m/#!topic/tesseract-issues/-QOvWLrsjfI
The earlier issue mentioned is at: https://web.archive.org/web/20151128094905/http://code.google.com/p/tesseract-ocr/issues/detail?id=918 Basically it posits TSV output as a (partial?) solution to table layout analysis. I think it's a bit more involved than that, but I have no strong feelings one way or the other on adding this. Pros:
Cons:
Like I said, I'm neutral. I'll let others argue yea or nay.
Thanks, Tom, for listing out the pros and cons for TSV. As a user, I support having a simpler format of output without external Regarding the duplication of functionality, is it not possible to use a
ShreeDevi
On Wed, Mar 2, 2016 at 9:22 PM, Tom Morris wrote:
Add TSV result renderer. Fixes tesseract-ocr#18
Requested in https://code.google.com/p/tesseract-ocr/issues/detail?id=1378