Request for LINE Aggregation Level #200

Six-Persimmon · 2023-10-31T03:23:52Z

Hi developers.
Thank you very much for this awesome work! Could you add a GCVFeatureType.LINE aggregation level to the OCR module? For example: layout = ocr_agent.gather_full_text_annotation(res, agg_level=lp.GCVFeatureType.LINE)
The motivation comes from inconsistent behavior I saw in layout parser while working on historical materials that contain date information. Specifically, if I aggregate a block like "Nov 07, 1995" in a scanned PDF at the WORD level, I get one of the following, seemingly at random:
"Nov 07, 1995" "199507,Nov" "199507Nov,"
which can cause trouble in datetime-related operations. My guess is that the aggregation cannot identify the correct relation between the comma and the other words.
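To make the request concrete, here is a rough sketch of the kind of line-level grouping I have in mind. It is plain Python and does not use the layout parser or Google Cloud Vision APIs: the Word class, the group_words_into_lines helper, the tol parameter, and the box coordinates are all hypothetical, and I am assuming each token arrives with its bounding box and that the comma stays attached to "07". Words whose vertical centers are close are placed on the same line, and each line is then read left to right.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical word token with its bounding box (x1, y1) to (x2, y2)."""
    text: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def y_center(self) -> float:
        return (self.y1 + self.y2) / 2

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def group_words_into_lines(words: List[Word], tol: float = 0.5) -> List[str]:
    """Group words into text lines by vertical position, then order by x.

    A word joins the current line when its vertical center lies within
    tol * (tallest word on that line) of the line's mean vertical center.
    """
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.y_center):
        if lines:
            current = lines[-1]
            ref_center = sum(w.y_center for w in current) / len(current)
            ref_height = max(w.height for w in current)
            if abs(word.y_center - ref_center) <= tol * ref_height:
                current.append(word)
                continue
        lines.append([word])
    # Within each line, read the words left to right.
    return [
        " ".join(w.text for w in sorted(line, key=lambda w: w.x1))
        for line in lines
    ]


# Tokens deliberately given in scrambled order; coordinates are made up.
scrambled = [
    Word("1995", 120, 10, 160, 22),
    Word("07,", 60, 11, 90, 23),
    Word("Nov", 10, 10, 50, 22),
    Word("Page", 10, 50, 50, 62),
]
print(group_words_into_lines(scrambled))  # ['Nov 07, 1995', 'Page']
```

This simple heuristic would break on skewed or multi-column scans, which is part of why a built-in LINE level would be more useful than a workaround like this.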
Thank you!
