Error when recognizing a PDF file #10466
Comments
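It looks like the image was not read. Please check whether the image path is correct. |
The path should be correct; I changed it to an absolute path and got the same error. To clarify, I want to recognize a PDF. The code is the example given in the documentation, which reads the PDF directly rather than an image. Has the ability to read a PDF directly been removed? |
Reading PDFs is also supported; see section 2.1.6 of https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/ppstructure/docs/quickstart.md. If you want to use OCR to parse a PDF from the images of its pages, you first need to convert the PDF to images and then follow the tutorial at the link above for layout recovery. |
Understood. So to parse the contents of a PDF into txt, I first need to convert the PDF into images one page at a time and then run recognition on them, right? Also, the command above raises an error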
|
Yes. To recognize a PDF with the OCR approach, you first need to convert it into images one page at a time. |
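The conversion step described above could be sketched roughly as follows. This is a minimal sketch, not the project's own method: it assumes PyMuPDF (`fitz`) is installed, and the helper names `page_image_path` and `pdf_to_images`, the output naming scheme, and the `zoom` default are all hypothetical choices made for illustration.

```python
from pathlib import Path


def page_image_path(pdf_path, page_index, out_dir):
    """Build the PNG path for one page (hypothetical naming scheme)."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_page{page_index + 1:03d}.png"


def pdf_to_images(pdf_path, out_dir, zoom=2.0):
    """Render every page of a PDF to a PNG file and return the image paths.

    Assumption: PyMuPDF is available. It is imported lazily so that the
    path check below fails early with a clear message.
    """
    pdf_path = Path(pdf_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")

    import fitz  # PyMuPDF

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    image_paths = []
    with fitz.open(str(pdf_path)) as doc:
        for index, page in enumerate(doc):
            # A zoom factor above 1 renders at a higher resolution,
            # which usually helps text recognition.
            pixmap = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            target = page_image_path(pdf_path, index, out_dir)
            pixmap.save(str(target))
            image_paths.append(target)
    return image_paths
```

Each returned path can then be read as an image and passed to the OCR engine one page at a time.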
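For questions about using a PDF as input, see the following document. It seems to require a paddleocr version later than 2.6: |
The same problem still occurs with paddleocr==2.6.1, using the example from https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/quickstart.md#12 |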
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
The error still occurs after converting the PDF to images: File d:\apps\miniconda3\envs\GNN\lib\site-packages\paddleocr\paddleocr.py:759, in PPStructure.__call__(self, img, return_ocr_result_in_table, img_idx) File d:\apps\miniconda3\envs\GNN\lib\site-packages\paddleocr\ppstructure\predict_system.py:110, in StructureSystem.__call__(self, img, return_ocr_result_in_table, img_idx) AttributeError: 'NoneType' object has no attribute 'copy'. Code that was run: for the Chinese test image, table_engine = PPStructure(recovery=True, table=False); for the English test image, table_engine = PPStructure(recovery=True, lang='en'); save_folder = './output' for line in result: h, w, _ = img.shape |
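The `AttributeError` in the traceback above means that `img` was `None` when the engine called `img.copy()`. One common cause is that `cv2.imread` returns `None` without raising when it cannot read a file. A minimal sketch of a loader that fails loudly instead, assuming OpenCV and NumPy are installed (the helper name `load_image_or_raise` is hypothetical and not part of paddleocr):

```python
from pathlib import Path


def load_image_or_raise(img_path):
    """Read an image as a BGR array, raising instead of returning None."""
    path = Path(img_path)
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {path}")

    # Imported lazily so the path check above runs first.
    import cv2
    import numpy as np

    # Reading the bytes through NumPy and decoding them in memory avoids
    # relying on cv2.imread's own path handling, which can fail on some
    # Windows paths that contain non-ASCII characters.
    data = np.fromfile(str(path), dtype=np.uint8)
    if data.size == 0:
        raise ValueError(f"Image file is empty: {path}")

    img = cv2.imdecode(data, cv2.IMREAD_COLOR)
    if img is None:
        raise ValueError(f"File could not be decoded as an image: {path}")
    return img


# Possible usage with the engine from the comment above:
#   img = load_image_or_raise(img_path)
#   result = table_engine(img)
```

If this raises, the problem is in the image path or the file itself rather than in the recognition step.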
Please provide the following information to quickly locate the problem