Hi developers.
Thank you very much for this awesome work! Could you add a GCVFeatureType.LINE aggregation level for the OCR module? For example: layout = ocr_agent.gather_full_text_annotation(res, agg_level=lp.GCVFeatureType.LINE)
The motivation comes from inconsistent behavior I saw in layout parser while working on historical materials that contain date information. Specifically, if I aggregate a block like "Nov 07, 1995" in a scanned PDF at the WORD level, I may get any one of the following, seemingly at random:
"Nov 07, 1995" "199507,Nov" "199507Nov,"
which can cause trouble in datetime-related operations. My guess is that the aggregation cannot determine the correct position of the comma relative to the other words.
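To illustrate the kind of line-level grouping I have in mind, here is a rough sketch that regroups word boxes by position. It is not layoutparser code: the Word class, group_words_into_lines, join_line, and the tolerance value are all hypothetical names chosen for this example. It assumes each word comes with an axis-aligned box (x_1, y_1, x_2, y_2) and that text lines are roughly horizontal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical stand-in for one OCR word and its bounding box."""
    text: str
    x_1: float
    y_1: float
    x_2: float
    y_2: float

    @property
    def y_center(self) -> float:
        return (self.y_1 + self.y_2) / 2

    @property
    def height(self) -> float:
        return self.y_2 - self.y_1


def group_words_into_lines(words: List[Word], y_tolerance: float = 0.5) -> List[List[Word]]:
    """Group words whose vertical centers are close, then order each line left to right.

    y_tolerance is a fraction of the tallest word seen so far in the line
    (an assumed heuristic, not a tuned value).
    """
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.y_center):
        if lines:
            current = lines[-1]
            line_center = sum(w.y_center for w in current) / len(current)
            line_height = max(w.height for w in current)
            if abs(word.y_center - line_center) <= y_tolerance * line_height:
                current.append(word)
                continue
        lines.append([word])
    # Reading order within a line: sort by the left edge of each box.
    return [sorted(line, key=lambda w: w.x_1) for line in lines]


# Punctuation that should attach to the preceding token without a space.
NO_SPACE_BEFORE = {",", ".", ";", ":"}


def join_line(line: List[Word]) -> str:
    """Join the words of one line, keeping punctuation attached to the previous word."""
    text = ""
    for word in line:
        if text and word.text not in NO_SPACE_BEFORE:
            text += " "
        text += word.text
    return text


# Words deliberately given out of order, with the comma as its own small box.
words = [
    Word("1995", 85, 10, 125, 22),
    Word(",", 72, 17, 76, 23),
    Word("Nov", 10, 10, 45, 22),
    Word("07", 50, 10, 70, 22),
    Word("Dec", 10, 40, 40, 52),
]
for line in group_words_into_lines(words):
    print(join_line(line))
```

With the made-up boxes above, the first line comes out as "Nov 07, 1995" regardless of the input order, because the ordering depends only on box positions. Skewed or multi-column scans would need something more robust than this heuristic.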
Thank you!