Request for LINE Aggregation Level #200

Open
Six-Persimmon opened this issue Oct 31, 2023 · 0 comments
@Six-Persimmon

Hi developers,
Thank you very much for this awesome work! I am writing to ask whether you could add a GCVFeatureType.LINE aggregation level to the OCR module, e.g. layout = ocr_agent.gather_full_text_annotation(res, agg_level=lp.GCVFeatureType.LINE)
The motivation is some inconsistent behavior I saw from layout parser while working on historical materials that contain date information. Specifically, if I aggregate a block like "Nov 07, 1995" in a scanned PDF at the WORD level, I get one of the following, seemingly at random:
"Nov 07, 1995" "199507,Nov" "199507Nov,"
This causes trouble for datetime-related operations. My guess is that the aggregation cannot identify the correct order of the comma relative to the other words.
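To make the request concrete, here is a minimal sketch of the line-level grouping I have in mind, written in plain Python rather than against layout parser itself. The `Word` class, `group_words_into_lines`, `line_text`, and the `tol` parameter are all hypothetical names of mine, not part of the library, and the coordinates are made up; in practice they would have to be copied from the WORD-level output. It assumes the text is not rotated or skewed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical stand-in for one WORD-level OCR token and its bounding box."""
    text: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def y_center(self) -> float:
        return (self.y1 + self.y2) / 2

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def group_words_into_lines(words: List[Word], tol: float = 0.5) -> List[List[Word]]:
    """Group words whose vertical centers are close, then sort each group left to right.

    `tol` is an assumed tuning knob: the allowed center offset as a fraction
    of the tallest word already in the line.
    """
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.y_center):
        if lines:
            last = lines[-1]
            ref_center = sum(w.y_center for w in last) / len(last)
            ref_height = max(w.height for w in last)
            if abs(word.y_center - ref_center) <= tol * ref_height:
                last.append(word)
                continue
        lines.append([word])
    return [sorted(line, key=lambda w: w.x1) for line in lines]


def line_text(line: List[Word]) -> str:
    """Join words with spaces, attaching trailing punctuation to the previous word."""
    out = ""
    for word in line:
        if out and word.text not in {",", ".", ";", ":"}:
            out += " "
        out += word.text
    return out


# Made-up tokens in scrambled order, mimicking the date example above.
words = [
    Word("1995", 120, 10, 160, 22),
    Word("07", 60, 10, 80, 22),
    Word(",", 81, 18, 85, 24),
    Word("Nov", 20, 10, 50, 22),
    Word("Page", 20, 40, 60, 52),
]
lines = [line_text(line) for line in group_words_into_lines(words)]
print(lines)
```

Sorting by the left edge within each line is what keeps the comma between "07" and "1995" here; a built-in LINE level would presumably handle this more robustly than a fixed tolerance.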
Thank you!
