Parsing long tables that span multiple pages #1344
Comments
This is the table content generated by MinerU:
Page 2:
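I added some manual post-processing logic that checks adjacent tables: if nothing but line breaks separates two tables, and both tables have the same maximum column count, the two tables are treated as one and merged.
Could you explain how to add that manual processing logic?
Apply it after you have obtained the parsing results: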
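The following is a minimal sketch of the merge rule described above, not the commenter's actual code. It assumes the parsing result has already been split into an ordered list of blocks, where each table is a list of rows (each row a list of cell strings) and any text between tables is a plain string. The names `merge_cross_page_tables`, `max_cols` and `is_line_breaks_only` are hypothetical and are not part of MinerU or magic-pdf.

```python
from typing import List, Union

Row = List[str]            # one table row: a list of cell strings (assumed format)
Table = List[Row]          # a parsed table: a list of rows
Block = Union[str, Table]  # text between tables, or a table


def max_cols(table: Table) -> int:
    """Largest number of cells in any row (0 for an empty table)."""
    return max((len(row) for row in table), default=0)


def is_line_breaks_only(text: str) -> bool:
    """True if the text contains nothing except line breaks."""
    return text.strip("\r\n") == ""


def merge_cross_page_tables(blocks: List[Block]) -> List[Block]:
    """Merge each table into the previous one when only line breaks separate
    them and both have the same maximum column count (hypothetical helper)."""
    result: List[Block] = []
    pending_breaks: List[str] = []  # line-break-only text seen right after a table
    for block in blocks:
        if isinstance(block, str):
            if is_line_breaks_only(block) and result and isinstance(result[-1], list):
                pending_breaks.append(block)  # hold it until we know what follows
                continue
            result.extend(pending_breaks)
            pending_breaks.clear()
            result.append(block)
            continue
        prev = result[-1] if result else None
        if isinstance(prev, list) and max_cols(prev) == max_cols(block):
            prev.extend(block)  # continuation: append its rows, drop the breaks
        else:
            result.extend(pending_breaks)
            result.append(list(block))  # copy so the caller's table is not mutated
        pending_breaks.clear()
    result.extend(pending_breaks)
    return result
```

The rows of a continuation table are simply appended to the table before it. If a continuation page does repeat the header, that header would be merged in as an ordinary data row, so an extra header check may be needed in that case.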
ragflow's file parsing can merge tables that span pages.
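Description of the bug
Currently, a table that spans pages is parsed into multiple tables, and the tables generated for the continuation pages have no header row.
I would like a cross-page table to be output either as a single table, or as multiple tables that each include the header.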
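For the second option (separate tables, each with a header), here is a sketch that post-processes the parsing result, reusing the adjacency rule from the comment above. It assumes the result is an ordered list of blocks in which each table is a list of rows (lists of cell strings) and text is a plain string. It also assumes that continuation tables have no header of their own and that a table's first row is its header. `repeat_headers` and `max_cols` are hypothetical names, not MinerU APIs.

```python
from typing import List, Optional, Union

Row = List[str]            # one table row: a list of cell strings (assumed format)
Table = List[Row]          # a parsed table: a list of rows
Block = Union[str, Table]  # text between tables, or a table


def max_cols(table: Table) -> int:
    """Largest number of cells in any row (0 for an empty table)."""
    return max((len(row) for row in table), default=0)


def repeat_headers(blocks: List[Block]) -> List[Block]:
    """Keep tables separate, but copy the header row of the first table in a
    chain into every continuation table (hypothetical helper)."""
    result: List[Block] = []
    last_table: Optional[Table] = None  # previous table, if only line breaks followed it
    header: Optional[Row] = None        # header row carried along the current chain
    for block in blocks:
        if isinstance(block, str):
            result.append(block)
            if block.strip("\r\n") != "":
                last_table = None  # real text ends the chain
            continue
        table = [list(row) for row in block]  # copy rows; input stays unchanged
        if last_table is not None and header is not None and max_cols(last_table) == max_cols(table):
            table.insert(0, list(header))  # continuation: prepend the chain's header
        else:
            header = table[0] if table else None  # a new table starts a new chain
        result.append(table)
        last_table = table
    return result
```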
How to reproduce the bug
Upload a document that contains a table spanning multiple pages.
Operating system
Linux
Python version
3.10
Software version (magic-pdf --version)
0.10.x
Device mode
cuda