Paragraph segmentation
#17
-
Looks great. Asking since I don't know: how did you do the cross-column merge?
-
1. Organizing PDF segmentation around the heading hierarchy makes a lot of sense. The difficulty will mostly lie in recognizing the headings: once the headings and their levels are found, segmentation follows naturally.
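A minimal sketch of that idea, assuming heading detection has already assigned every block in reading order either a heading level (starting at 1) or `None` for body text. `Block`, `Section` and `segment_by_headings` are hypothetical names, not part of any layout library; a stack of currently open sections turns the flat block sequence into a heading tree:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    # One layout block in reading order (hypothetical schema). `level` is the
    # heading level (1 = top-level heading, 2 = subheading, ...) or None for body text.
    text: str
    level: Optional[int] = None


@dataclass
class Section:
    title: str
    level: int
    body: List[str] = field(default_factory=list)
    children: List["Section"] = field(default_factory=list)


def segment_by_headings(blocks: List[Block]) -> Section:
    """Build a heading tree; each section collects the body text under it."""
    root = Section(title="", level=0)  # virtual root, above every real heading
    stack = [root]                     # currently open sections, outermost first
    for block in blocks:
        if block.level is None:
            # Body text belongs to the innermost open section.
            stack[-1].body.append(block.text)
            continue
        # A heading closes every open section at its own level or deeper.
        while stack[-1].level >= block.level:
            stack.pop()
        section = Section(title=block.text, level=block.level)
        stack[-1].children.append(section)
        stack.append(section)
    return root


blocks = [
    Block("1 Introduction", 1),
    Block("Intro text."),
    Block("1.1 Background", 2),
    Block("Background text."),
    Block("2 Method", 1),
    Block("Method text."),
]
tree = segment_by_headings(blocks)
print([s.title for s in tree.children])  # ['1 Introduction', '2 Method']
```

Each section's `body` is then one segment, and the path from the root gives its heading context.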
data:image/s3,"s3://crabby-images/171f2/171f26db58b18de63f3257be2e05dee5d82859f0" alt="image"
data:image/s3,"s3://crabby-images/e7e67/e7e6765e664fe5c9e308deaf13f3fe7820680ee1" alt="image"
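2. Because layout models all work on one page at a time, they miss paragraphs and tables that continue onto the next page or across the columns of a two-column layout. I suggest stitching the detection boxes together across columns and pages first, and running recognition afterwards.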
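A rough sketch of cross-column box stitching for a two-column page, assuming each detected text box carries its page number, coordinates (top-left origin, so `y0` grows downward) and OCR text. `Box`, `column_of` and `stitch_columns` are hypothetical names, and the rule "a box that does not end in sentence punctuation continues at the top of the next column" is an assumed heuristic; full-width blocks such as titles are not handled:

```python
from dataclasses import dataclass
from typing import List

# Line endings treated as "paragraph finished" (assumed heuristic).
SENTENCE_END = ("。", "！", "？", ".", "!", "?")


@dataclass
class Box:
    page: int
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def column_of(box: Box, page_width: float) -> int:
    # 0 = left column, 1 = right column, judged by the box's horizontal centre.
    return 0 if (box.x0 + box.x1) / 2 < page_width / 2 else 1


def stitch_columns(boxes: List[Box], page_width: float) -> List[List[Box]]:
    """Group boxes into paragraphs in two-column reading order, joining the last
    left-column box to the first right-column box when the former looks unfinished."""
    ordered = sorted(boxes, key=lambda b: (b.page, column_of(b, page_width), b.y0))
    groups: List[List[Box]] = []
    for box in ordered:
        if groups:
            prev = groups[-1][-1]
            # In this ordering, left -> right on the same page happens only at the
            # column break, so prev is the bottom box of the left column.
            at_column_break = (
                prev.page == box.page
                and column_of(prev, page_width) == 0
                and column_of(box, page_width) == 1
            )
            if at_column_break and not prev.text.rstrip().endswith(SENTENCE_END):
                groups[-1].append(box)
                continue
        groups.append([box])
    return groups


boxes = [
    Box(1, 320, 80, 550, 200, "the right column."),
    Box(1, 50, 100, 280, 300, "First paragraph."),
    Box(1, 320, 220, 550, 400, "New paragraph."),
    Box(1, 50, 320, 280, 780, "This paragraph continues in"),
]
for group in stitch_columns(boxes, page_width=600):
    print(" ".join(b.text for b in group))
```

This prints three paragraphs, with "This paragraph continues in the right column." rebuilt from two boxes. Recognition can then run on each group instead of on each raw box.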
Here is my attempt at segmentation using RapidLayout: I first did a cross-column merge and then a cross-page merge.
Raw detection:
After cross-column merge:
After cross-page merge:
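A similar sketch for the cross-page step, assuming blocks are already in reading order (for example after a cross-column step) and that running headers, footers and page numbers have been filtered out, since they would otherwise sit between the two halves. `Block`, `Merged` and `stitch_pages` are hypothetical names; always joining two tables that meet at a page break is a simplification, and a real check would likely also compare column counts or widths:

```python
from dataclasses import dataclass, field
from typing import List

# Line endings treated as "paragraph finished" (assumed heuristic).
SENTENCE_END = ("。", "！", "？", ".", "!", "?")


@dataclass
class Block:
    page: int
    kind: str     # "text" or "table" (map from your layout model's own labels)
    content: str


@dataclass
class Merged:
    kind: str
    pages: List[int] = field(default_factory=list)
    contents: List[str] = field(default_factory=list)


def stitch_pages(blocks: List[Block]) -> List[Merged]:
    """Join the last block of a page with the first block of the next page when
    both have the same kind and the earlier one looks unfinished (tables always do)."""
    merged: List[Merged] = []
    for i, block in enumerate(blocks):
        if i > 0:
            prev = blocks[i - 1]
            # Consecutive blocks on consecutive pages = last of one page, first of next.
            at_page_break = block.page == prev.page + 1
            same_kind = block.kind == prev.kind
            unfinished = prev.kind == "table" or not prev.content.rstrip().endswith(SENTENCE_END)
            if at_page_break and same_kind and unfinished:
                merged[-1].pages.append(block.page)
                merged[-1].contents.append(block.content)
                continue
        merged.append(Merged(block.kind, [block.page], [block.content]))
    return merged


blocks = [
    Block(1, "text", "Intro."),
    Block(1, "text", "This sentence is split across"),
    Block(2, "text", "two pages."),
    Block(2, "table", "| a | b |"),
    Block(3, "table", "| 1 | 2 |"),
    Block(3, "text", "Done."),
]
for m in stitch_pages(blocks):
    print(m.kind, m.pages, m.contents)
```

Here the split sentence and the split table each come back as one item spanning pages [1, 2] and [2, 3] respectively.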