diff --git a/.github/ISSUE_TEMPLATE/newfeature.md b/.github/ISSUE_TEMPLATE/newfeature.md new file mode 100644 index 0000000000..4ffcbbb564 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/newfeature.md @@ -0,0 +1,17 @@ +--- +name: New Feature Issue template +about: Issue template for new features. +title: '' +labels: 'Code PR is needed' +assignees: 'shiyutang' + +--- + +## 背景 + +经过需求征集https://github.com/PaddlePaddle/PaddleOCR/issues/10334 和每周技术研讨会 https://github.com/PaddlePaddle/PaddleOCR/issues/10223 讨论,我们确定了XXXX任务。 + +## 解决步骤 +1. 根据开源代码进行网络结构、评估指标转换。代码链接:XXXX +2. 结合[论文复现指南](https://github.com/PaddlePaddle/models/blob/release%2F2.2/tutorials/article-implementation/ArticleReproduction_CV.md),进行前反向对齐等操作,达到论文Table.1中的指标。 +3. 参考[PR提交规范](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/code_and_doc.md)提交代码PR到ppocr中。 diff --git a/README_ch.md b/README_ch.md new file mode 100755 index 0000000000..909ae8934a --- /dev/null +++ b/README_ch.md @@ -0,0 +1,254 @@ +[English](README.md) | 简体中文 | [हिन्दी](./doc/doc_i18n/README_हिन्द.md) | [日本語](./doc/doc_i18n/README_日本語.md) | [한국인](./doc/doc_i18n/README_한국어.md) | [Pу́сский язы́к](./doc/doc_i18n/README_Ру́сский_язы́к.md) + +

+ +

+

+ + + + + + + +

+ +## 简介 + +PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力开发者训练出更好的模型,并应用落地。 + +
+ +
+ +
+ +
+ +## 📣 近期更新 + +- **🔥2023.3.10 PaddleOCR集成了高性能、全场景模型部署方案FastDeploy,欢迎参考[指南](https://github.com/PaddlePaddle/PaddleOCR/tree/dygraph/deploy/fastdeploy)试用(注意使用dygraph分支)。** +- 📚**2022.12 发布[《OCR产业范例20讲》电子书](./applications/README.md)**,新增蒙古文、身份证、液晶屏缺陷等**7个场景应用范例** +- 🔨**2022.11 新增实现[4种前沿算法](doc/doc_ch/algorithm_overview.md)**:文本检测 [DRRG](doc/doc_ch/algorithm_det_drrg.md), 文本识别 [RFL](doc/doc_ch/algorithm_rec_rfl.md), 文本超分[Text Telescope](doc/doc_ch/algorithm_sr_telescope.md),公式识别[CAN](doc/doc_ch/algorithm_rec_can.md) +- **2022.10 优化[JS版PP-OCRv3模型](./deploy/paddlejs/README_ch.md)**:模型大小仅4.3M,预测速度提升8倍,配套web demo开箱即用 +- **💥 直播回放:PaddleOCR研发团队详解PP-StructureV2优化策略**。微信扫描[下方二维码](#开源社区),关注公众号并填写问卷后进入官方交流群,获取直播回放链接与20G重磅OCR学习大礼包(内含PDF转Word应用程序、10种垂类模型、《动手学OCR》电子书等) + +- **🔥2022.8.24 发布 PaddleOCR [release/2.6](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6)** + - 发布[PP-StructureV2](./ppstructure/README_ch.md),系统功能性能全面升级,适配中文场景,新增支持[版面复原](./ppstructure/recovery/README_ch.md),支持**一行命令完成PDF转Word**; + - [版面分析](./ppstructure/layout/README_ch.md)模型优化:模型存储减少95%,速度提升11倍,平均CPU耗时仅需41ms; + - [表格识别](./ppstructure/table/README_ch.md)模型优化:设计3大优化策略,预测耗时不变情况下,模型精度提升6%; + - [关键信息抽取](./ppstructure/kie/README_ch.md)模型优化:设计视觉无关模型结构,语义实体识别精度提升2.8%,关系抽取精度提升9.1%。 +- **2022.8 发布 [OCR场景应用集合](./applications)**:包含数码管、液晶屏、车牌、高精度SVTR模型、手写体识别等**9个垂类模型**,覆盖通用,制造、金融、交通行业的主要OCR垂类应用。 +- **2022.8 新增实现[8种前沿算法](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6rc/doc/doc_ch/algorithm_overview.md)** + - 文本检测:[FCENet](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6rc/doc/doc_ch/algorithm_det_fcenet.md), [DB++](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6rc/doc/doc_ch/algorithm_det_db.md) + - 文本识别:[ViTSTR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6rc/doc/doc_ch/algorithm_rec_vitstr.md), [ABINet](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6rc/doc/doc_ch/algorithm_rec_abinet.md), 
[VisionLAN](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6rc/doc/doc_ch/algorithm_rec_visionlan.md), [SPIN](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6rc/doc/doc_ch/algorithm_rec_spin.md), [RobustScanner](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6rc/doc/doc_ch/algorithm_rec_robustscanner.md) + - 表格识别:[TableMaster](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6rc/doc/doc_ch/algorithm_table_master.md) + +- **2022.5.9 发布 PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)** + - 发布[PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3),速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上; + - 发布半自动标注工具[PPOCRLabelv2](./PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能; + - 发布OCR产业落地工具集:打通22种训练部署软硬件环境与方式,覆盖企业90%的训练部署环境需求; + - 发布交互式OCR开源电子书[《动手学OCR》](./doc/doc_ch/ocr_book.md),覆盖OCR全栈技术的前沿理论与代码实践,并配套教学视频。 + +> [更多](./doc/doc_ch/update.md) + +## 🌟 特性 + +支持多种OCR相关前沿算法,在此基础上打造产业级特色模型[PP-OCR](./doc/doc_ch/ppocr_introduction.md)和[PP-Structure](./ppstructure/README_ch.md),并打通数据生产、模型训练、压缩、预测部署全流程。 + +
+ +
+ +> 上述内容的使用方法建议从文档教程中的快速开始体验 + + +## ⚡ 快速开始 + +- 在线网站体验:超轻量PP-OCR mobile模型体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr +- 移动端demo体验:[安装包DEMO下载地址](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)(基于EasyEdge和Paddle-Lite, 支持iOS和Android系统) +- 一行命令快速使用:[快速开始(中英文/多语言/文档分析)](./doc/doc_ch/quickstart.md) + + +## 📚《动手学OCR》电子书 +- [《动手学OCR》电子书](./doc/doc_ch/ocr_book.md) + + + +## 👫 开源社区 +- **📑项目合作:** 如果您是企业开发者且有明确的OCR垂类应用需求,填写[问卷](https://paddle.wjx.cn/vj/QwF7GKw.aspx)后可免费与官方团队展开不同层次的合作。 +- **👫加入社区:** **微信扫描二维码并填写问卷之后,加入交流群领取20G重磅OCR学习大礼包** + - **包括《动手学OCR》电子书** ,配套讲解视频和notebook项目;**PaddleOCR历次发版直播课回放链接**; + - **OCR场景应用模型集合:** 包含数码管、液晶屏、车牌、高精度SVTR模型、手写体识别等垂类模型,覆盖通用,制造、金融、交通行业的主要OCR垂类应用。 + - PDF2Word应用程序;OCR社区优秀开发者项目分享视频。 +- **🏅️社区项目**:[社区项目](./doc/doc_ch/thirdparty.md)文档中包含了社区用户**使用PaddleOCR开发的各种工具、应用**以及**为PaddleOCR贡献的功能、优化的文档与代码**等,是官方为社区开发者打造的荣誉墙,也是帮助优质项目宣传的广播站。 +- **🎁社区常规赛**:社区常规赛是面向OCR开发者的积分赛事,覆盖文档、代码、模型和应用四大类型,以季度为单位评选并发放奖励,赛题详情与报名方法可参考[链接](https://github.com/PaddlePaddle/PaddleOCR/issues/4982)。 + +
+ +

PaddleOCR官方交流群二维码

+
+ + +## 🛠️ PP-OCR系列模型列表(更新中) + +| 模型简介 | 模型名称 | 推荐场景 | 检测模型 | 方向分类器 | 识别模型 | +| ------------------------------------- | ----------------------- | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | +| 中英文超轻量PP-OCRv3模型(16.2M) | ch_PP-OCRv3_xx | 移动端&服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) | +| 英文超轻量PP-OCRv3模型(13.4M) | en_PP-OCRv3_xx | 移动端&服务器端 | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_distill_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) | + +- 超轻量OCR系列更多模型下载(包括多语言),可以参考[PP-OCR系列模型下载](./doc/doc_ch/models_list.md),文档分析相关模型参考[PP-Structure系列模型下载](./ppstructure/docs/models_list.md) + +### PaddleOCR场景应用模型 + +| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 | +| ---- | ------------ | ---------------------------------- | ------------------------------------------------------------ | --------------------------------------------- | +| 制造 | 数码管识别 | 数码管数据合成、漏识别调优 | [光功率计数码管字符识别](./applications/光功率计数码管字符识别/光功率计数码管字符识别.md) | 
[下载链接](./applications/README.md#模型下载) | +| 金融 | 通用表单识别 | 多模态通用表单结构化提取 | [多模态表单识别](./applications/多模态表单识别.md) | [下载链接](./applications/README.md#模型下载) | +| 交通 | 车牌识别 | 多角度图像处理、轻量模型、端侧部署 | [轻量级车牌识别](./applications/轻量级车牌识别.md) | [下载链接](./applications/README.md#模型下载) | + +- 更多制造、金融、交通行业的主要OCR垂类应用模型(如电表、液晶屏、高精度SVTR模型等),可参考[场景应用模型下载](./applications) + + + +## 📖 文档教程 + +- [运行环境准备](./doc/doc_ch/environment.md) +- [PP-OCR文本检测识别🔥](./doc/doc_ch/ppocr_introduction.md) + - [快速开始](./doc/doc_ch/quickstart.md) + - [模型库](./doc/doc_ch/models_list.md) + - [模型训练](./doc/doc_ch/training.md) + - [文本检测](./doc/doc_ch/detection.md) + - [文本识别](./doc/doc_ch/recognition.md) + - [文本方向分类器](./doc/doc_ch/angle_class.md) + - 模型压缩 + - [模型量化](./deploy/slim/quantization/README.md) + - [模型裁剪](./deploy/slim/prune/README.md) + - [知识蒸馏](./doc/doc_ch/knowledge_distillation.md) + - [推理部署](./deploy/README_ch.md) + - [基于Python预测引擎推理](./doc/doc_ch/inference_ppocr.md) + - [基于C++预测引擎推理](./deploy/cpp_infer/readme_ch.md) + - [服务化部署](./deploy/pdserving/README_CN.md) + - [端侧部署](./deploy/lite/readme.md) + - [Paddle2ONNX模型转化与预测](./deploy/paddle2onnx/readme.md) + - [云上飞桨部署工具](./deploy/paddlecloud/README.md) + - [Benchmark](./doc/doc_ch/benchmark.md) +- [PP-Structure文档分析🔥](./ppstructure/README_ch.md) + - [快速开始](./ppstructure/docs/quickstart.md) + - [模型库](./ppstructure/docs/models_list.md) + - [模型训练](./doc/doc_ch/training.md) + - [版面分析](./ppstructure/layout/README_ch.md) + - [表格识别](./ppstructure/table/README_ch.md) + - [关键信息提取](./ppstructure/kie/README_ch.md) + - [推理部署](./deploy/README_ch.md) + - [基于Python预测引擎推理](./ppstructure/docs/inference.md) + - [基于C++预测引擎推理](./deploy/cpp_infer/readme_ch.md) + - [服务化部署](./deploy/hubserving/readme.md) +- [前沿算法与模型🚀](./doc/doc_ch/algorithm_overview.md) + - [文本检测算法](./doc/doc_ch/algorithm_overview.md) + - [文本识别算法](./doc/doc_ch/algorithm_overview.md) + - [端到端OCR算法](./doc/doc_ch/algorithm_overview.md) + - [表格识别算法](./doc/doc_ch/algorithm_overview.md) + - 
[关键信息抽取算法](./doc/doc_ch/algorithm_overview.md) + - [使用PaddleOCR架构添加新算法](./doc/doc_ch/add_new_algorithm.md) +- [场景应用](./applications) +- 数据标注与合成 + - [半自动标注工具PPOCRLabel](./PPOCRLabel/README_ch.md) + - [数据合成工具Style-Text](./StyleText/README_ch.md) + - [其它数据标注工具](./doc/doc_ch/data_annotation.md) + - [其它数据合成工具](./doc/doc_ch/data_synthesis.md) +- 数据集 + - [通用中英文OCR数据集](doc/doc_ch/dataset/datasets.md) + - [手写中文OCR数据集](doc/doc_ch/dataset/handwritten_datasets.md) + - [垂类多语言OCR数据集](doc/doc_ch/dataset/vertical_and_multilingual_datasets.md) + - [版面分析数据集](doc/doc_ch/dataset/layout_datasets.md) + - [表格识别数据集](doc/doc_ch/dataset/table_datasets.md) + - [关键信息提取数据集](doc/doc_ch/dataset/kie_datasets.md) +- [代码组织结构](./doc/doc_ch/tree.md) +- [效果展示](#效果展示) +- [《动手学OCR》电子书📚](./doc/doc_ch/ocr_book.md) +- [开源社区](#开源社区) +- FAQ + - [通用问题](./doc/doc_ch/FAQ.md) + - [PaddleOCR实战问题](./doc/doc_ch/FAQ.md) +- [参考文献](./doc/doc_ch/reference.md) +- [许可证书](#许可证书) + + + + +## 👀 效果展示 [more](./doc/doc_ch/visualization.md) + +
+PP-OCRv3 中文模型 + +
+ + + +
+ +
+ + +
+PP-OCRv3 英文模型 + +
+ + +
+ +
+ + +
+PP-OCRv3 多语言模型 + +
+ + +
+ +
+ +
+PP-Structure 文档分析 + +- 版面分析+表格识别 +
+ +
+ +- SER(语义实体识别) +
+ +
+ +
+ +
+ +
+ +
+ +- RE(关系提取) +
+ +
+ +
+ +
+ +
+ +
+ +
+ + + +## 许可证书 +本项目的发布受Apache 2.0 license许可认证。 diff --git a/deploy/cpp_infer/include/ocr_rec.h b/deploy/cpp_infer/include/ocr_rec.h index 257c261033..f3712cb3ea 100644 --- a/deploy/cpp_infer/include/ocr_rec.h +++ b/deploy/cpp_infer/include/ocr_rec.h @@ -17,7 +17,7 @@ #include "paddle_api.h" #include "paddle_inference_api.h" -#include +#include #include namespace PaddleOCR { diff --git a/deploy/docker/hubserving/README_cn.md b/deploy/docker/hubserving/README_cn.md index 046903c4c7..b695b7993e 100644 --- a/deploy/docker/hubserving/README_cn.md +++ b/deploy/docker/hubserving/README_cn.md @@ -42,7 +42,7 @@ docker logs -f paddle_ocr ``` ## 4.测试服务 -a. 计算待识别图片的Base64编码(如果只是测试一下效果,可以通过免费的在线工具实现,如:http://tool.chinaz.com/tools/imgtobase/) +a. 计算待识别图片的Base64编码(如果只是测试一下效果,可以通过免费的在线工具实现,如:http://tool.chinaz.com/tools/imgtobase/ b. 发送服务请求(可参见sample_request.txt中的值) ``` curl -H "Content-Type:application/json" -X POST --data "{\"images\": [\"填入图片Base64编码(需要删除'data:image/jpg;base64,')\"]}" http://localhost:8868/predict/ocr_system diff --git a/deploy/hubserving/readme.md b/deploy/hubserving/readme.md index 8f4d086988..9302bad97e 100755 --- a/deploy/hubserving/readme.md +++ b/deploy/hubserving/readme.md @@ -3,7 +3,7 @@ - [基于PaddleHub Serving的服务部署](#基于paddlehub-serving的服务部署) - [1. 近期更新](#1-近期更新) - [2. 快速启动服务](#2-快速启动服务) - - [2.1 准备环境](#21-准备环境) + - [2.1 安装PaddleHub](#21-安装PaddleHub) - [2.2 下载推理模型](#22-下载推理模型) - [2.3 安装服务模块](#23-安装服务模块) - [2.4 启动服务](#24-启动服务) @@ -15,8 +15,8 @@ PaddleOCR提供2种服务部署方式: -- 基于PaddleHub Serving的部署:代码路径为"`./deploy/hubserving`",按照本教程使用; -- 基于PaddleServing的部署:代码路径为"`./deploy/pdserving`",使用方法参考[文档](../../deploy/pdserving/README_CN.md)。 +- 基于PaddleHub Serving的部署:代码路径为`./deploy/hubserving`,按照本教程使用; +- 基于PaddleServing的部署:代码路径为`./deploy/pdserving`,使用方法参考[文档](../../deploy/pdserving/README_CN.md)。 # 基于PaddleHub Serving的服务部署 @@ -51,120 +51,77 @@ deploy/hubserving/ocr_system/ ## 2. 
快速启动服务 以下步骤以检测+识别2阶段串联服务为例,如果只需要检测服务或识别服务,替换相应文件路径即可。 -### 2.1 准备环境 -```shell -# 安装paddlehub -# paddlehub 需要 python>3.6.2 +### 2.1 安装PaddleHub +paddlehub 需要 python>3.6.2 +```bash pip3 install paddlehub==2.1.0 --upgrade -i https://mirror.baidu.com/pypi/simple ``` ### 2.2 下载推理模型 安装服务模块前,需要准备推理模型并放到正确路径。默认使用的是PP-OCRv3模型,默认模型路径为: +| 模型 | 路径 | +| ------- | - | +| 检测模型 | `./inference/ch_PP-OCRv3_det_infer/` | +| 识别模型 | `./inference/ch_PP-OCRv3_rec_infer/` | +| 方向分类器 | `./inference/ch_ppocr_mobile_v2.0_cls_infer/` | +| 版面分析模型 | `./inference/picodet_lcnet_x1_0_fgd_layout_infer/` | +| 表格结构识别模型 | `./inference/ch_ppstructure_mobile_v2.0_SLANet_infer/` | +| 关键信息抽取SER模型 | `./inference/ser_vi_layoutxlm_xfund_infer/` | +| 关键信息抽取RE模型 | `./inference/re_vi_layoutxlm_xfund_infer/` | -``` -检测模型:./inference/ch_PP-OCRv3_det_infer/ -识别模型:./inference/ch_PP-OCRv3_rec_infer/ -方向分类器:./inference/ch_ppocr_mobile_v2.0_cls_infer/ -版面分析模型:./inference/picodet_lcnet_x1_0_fgd_layout_infer/ -表格结构识别模型:./inference/ch_ppstructure_mobile_v2.0_SLANet_infer/ -关键信息抽取SER模型:./inference/ser_vi_layoutxlm_xfund_infer/ -关键信息抽取RE模型:./inference/re_vi_layoutxlm_xfund_infer/ -``` +**模型路径可在`params.py`中查看和修改。** -**模型路径可在`params.py`中查看和修改。** 更多模型可以从PaddleOCR提供的模型库[PP-OCR](../../doc/doc_ch/models_list.md)和[PP-Structure](../../ppstructure/docs/models_list.md)下载,也可以替换成自己训练转换好的模型。 +更多模型可以从PaddleOCR提供的模型库[PP-OCR](../../doc/doc_ch/models_list.md)和[PP-Structure](../../ppstructure/docs/models_list.md)下载,也可以替换成自己训练转换好的模型。 ### 2.3 安装服务模块 PaddleOCR提供5种服务模块,根据需要安装所需模块。 -* 在Linux环境下,安装示例如下: -```shell -# 安装检测服务模块: -hub install deploy/hubserving/ocr_det/ - -# 或,安装分类服务模块: -hub install deploy/hubserving/ocr_cls/ - -# 或,安装识别服务模块: -hub install deploy/hubserving/ocr_rec/ - -# 或,安装检测+识别串联服务模块: -hub install deploy/hubserving/ocr_system/ - -# 或,安装表格识别服务模块: -hub install deploy/hubserving/structure_table/ - -# 或,安装PP-Structure服务模块: -hub install deploy/hubserving/structure_system/ - -# 或,安装版面分析服务模块: -hub install 
deploy/hubserving/structure_layout/ - -# 或,安装关键信息抽取SER服务模块: -hub install deploy/hubserving/kie_ser/ - -# 或,安装关键信息抽取SER+RE服务模块: -hub install deploy/hubserving/kie_ser_re/ -``` - -* 在Windows环境下(文件夹的分隔符为`\`),安装示例如下: -```shell -# 安装检测服务模块: -hub install deploy\hubserving\ocr_det\ - -# 或,安装分类服务模块: -hub install deploy\hubserving\ocr_cls\ - -# 或,安装识别服务模块: -hub install deploy\hubserving\ocr_rec\ - -# 或,安装检测+识别串联服务模块: -hub install deploy\hubserving\ocr_system\ - -# 或,安装表格识别服务模块: -hub install deploy\hubserving\structure_table\ - -# 或,安装PP-Structure服务模块: -hub install deploy\hubserving\structure_system\ - -# 或,安装版面分析服务模块: -hub install deploy\hubserving\structure_layout\ - -# 或,安装关键信息抽取SER服务模块: -hub install deploy\hubserving\kie_ser\ - -# 或,安装关键信息抽取SER+RE服务模块: -hub install deploy\hubserving\kie_ser_re\ -``` +在Linux环境(Windows环境请将`/`替换为`\`)下,安装模块命令如下表: +| 服务模块 | 命令 | +| ------- | - | +| 检测 | `hub install deploy/hubserving/ocr_det` | +| 分类 | `hub install deploy/hubserving/ocr_cls` | +| 识别 | `hub install deploy/hubserving/ocr_rec` | +| 检测+识别串联 | `hub install deploy/hubserving/ocr_system` | +| 表格识别 | `hub install deploy/hubserving/structure_table` | +| PP-Structure | `hub install deploy/hubserving/structure_system` | +| 版面分析 | `hub install deploy/hubserving/structure_layout` | +| 关键信息抽取SER | `hub install deploy/hubserving/kie_ser` | +| 关键信息抽取SER+RE | `hub install deploy/hubserving/kie_ser_re` | ### 2.4 启动服务 #### 2.4.1. 命令行命令启动(仅支持CPU) -**启动命令:** -```shell -$ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \ - --port XXXX \ - --use_multiprocess \ - --workers \ +**启动命令:** +```bash +hub serving start --modules Module1==Version1, Module2==Version2, ... \ + --port 8866 \ + --use_multiprocess \ + --workers \ ``` -**参数:** - -|参数|用途| -|---|---| -|--modules/-m|PaddleHub Serving预安装模型,以多个Module==Version键值对的形式列出
*`当不指定Version时,默认选择最新版本`*| -|--port/-p|服务端口,默认为8866| -|--use_multiprocess|是否启用并发方式,默认为单进程方式,推荐多核CPU机器使用此方式
*`Windows操作系统只支持单进程方式`*| -|--workers|在并发方式下指定的并发任务数,默认为`2*cpu_count-1`,其中`cpu_count`为CPU核数| - -如启动串联服务: ```hub serving start -m ocr_system``` +**参数:** +|参数|用途| +|---|---| +|`--modules`/`-m`|PaddleHub Serving预安装模型,以多个Module==Version键值对的形式列出
**当不指定Version时,默认选择最新版本**| +|`--port`/`-p`|服务端口,默认为8866| +|`--use_multiprocess`|是否启用并发方式,默认为单进程方式,推荐多核CPU机器使用此方式
**Windows操作系统只支持单进程方式**| +|`--workers`|在并发方式下指定的并发任务数,默认为`2*cpu_count-1`,其中`cpu_count`为CPU核数| + +如启动串联服务: +```bash +hub serving start -m ocr_system +``` 这样就完成了一个服务化API的部署,使用默认端口号8866。 #### 2.4.2 配置文件启动(支持CPU、GPU) -**启动命令:** -```hub serving start -c config.json``` +**启动命令:** +```bash +hub serving start -c config.json +``` 其中,`config.json`格式如下: -```python +```json { "modules_info": { "ocr_system": { @@ -182,48 +139,59 @@ $ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \ } ``` -- `init_args`中的可配参数与`module.py`中的`_initialize`函数接口一致。其中,**当`use_gpu`为`true`时,表示使用GPU启动服务**。 +- `init_args`中的可配参数与`module.py`中的`_initialize`函数接口一致。 + + **当`use_gpu`为`true`时,表示使用GPU启动服务。** - `predict_args`中的可配参数与`module.py`中的`predict`函数接口一致。 -**注意:** +**注意:** - 使用配置文件启动服务时,其他参数会被忽略。 -- 如果使用GPU预测(即,`use_gpu`置为`true`),则需要在启动服务之前,设置CUDA_VISIBLE_DEVICES环境变量,如:```export CUDA_VISIBLE_DEVICES=0```,否则不用设置。 +- 如果使用GPU预测(即,`use_gpu`置为`true`),则需要在启动服务之前,设置CUDA_VISIBLE_DEVICES环境变量,如: + ```bash + export CUDA_VISIBLE_DEVICES=0 + ``` - **`use_gpu`不可与`use_multiprocess`同时为`true`**。 -如,使用GPU 3号卡启动串联服务: -```shell +如,使用GPU 3号卡启动串联服务: +```bash export CUDA_VISIBLE_DEVICES=3 hub serving start -c deploy/hubserving/ocr_system/config.json ``` ## 3. 
发送预测请求 -配置好服务端,可使用以下命令发送预测请求,获取预测结果: - -```python tools/test_hubserving.py --server_url=server_url --image_dir=image_path``` - -需要给脚本传递2个参数: -- **server_url**:服务地址,格式为 -`http://[ip_address]:[port]/predict/[module_name]` -例如,如果使用配置文件启动分类,检测、识别,检测+分类+识别3阶段,表格识别和PP-Structure服务,那么发送请求的url将分别是: -`http://127.0.0.1:8865/predict/ocr_det` -`http://127.0.0.1:8866/predict/ocr_cls` -`http://127.0.0.1:8867/predict/ocr_rec` -`http://127.0.0.1:8868/predict/ocr_system` -`http://127.0.0.1:8869/predict/structure_table` -`http://127.0.0.1:8870/predict/structure_system` -`http://127.0.0.1:8870/predict/structure_layout` -`http://127.0.0.1:8871/predict/kie_ser` -`http://127.0.0.1:8872/predict/kie_ser_re` -- **image_dir**:测试图像路径,可以是单张图片路径,也可以是图像集合目录路径 -- **visualize**:是否可视化结果,默认为False -- **output**:可视化结果保存路径,默认为`./hubserving_result` - -访问示例: -```python tools/test_hubserving.py --server_url=http://127.0.0.1:8868/predict/ocr_system --image_dir=./doc/imgs/ --visualize=false``` +配置好服务端,可使用以下命令发送预测请求,获取预测结果: +```bash +python tools/test_hubserving.py --server_url=server_url --image_dir=image_path +``` + +需要给脚本传递2个参数: +- `server_url`:服务地址,格式为`http://[ip_address]:[port]/predict/[module_name]` + + 例如,如果使用配置文件启动分类,检测、识别,检测+分类+识别3阶段,表格识别和PP-Structure服务 + + 并为每个服务修改了port,那么发送请求的url将分别是: + ``` + http://127.0.0.1:8865/predict/ocr_det + http://127.0.0.1:8866/predict/ocr_cls + http://127.0.0.1:8867/predict/ocr_rec + http://127.0.0.1:8868/predict/ocr_system + http://127.0.0.1:8869/predict/structure_table + http://127.0.0.1:8870/predict/structure_system + http://127.0.0.1:8870/predict/structure_layout + http://127.0.0.1:8871/predict/kie_ser + http://127.0.0.1:8872/predict/kie_ser_re + ``` +- `image_dir`:测试图像路径,可以是单张图片路径,也可以是图像集合目录路径 +- `visualize`:是否可视化结果,默认为False +- `output`:可视化结果保存路径,默认为`./hubserving_result` + +访问示例: +```bash +python tools/test_hubserving.py --server_url=http://127.0.0.1:8868/predict/ocr_system --image_dir=./doc/imgs/ --visualize=false +``` ## 4. 
返回结果格式说明 返回结果为列表(list),列表中的每一项为词典(dict),词典一共可能包含3种字段,信息如下: - |字段名称|数据类型|意义| |---|---|---| |angle|str|文本角度| @@ -231,41 +199,52 @@ hub serving start -c deploy/hubserving/ocr_system/config.json |confidence|float| 文本识别置信度或文本角度分类置信度| |text_region|list|文本位置坐标| |html|str|表格的html字符串| -|regions|list|版面分析+表格识别+OCR的结果,每一项为一个list,包含表示区域坐标的`bbox`,区域类型的`type`和区域结果的`res`三个字段| +|regions|list|版面分析+表格识别+OCR的结果,每一项为一个list
包含表示区域坐标的`bbox`,区域类型的`type`和区域结果的`res`三个字段| |layout|list|版面分析的结果,每一项一个dict,包含版面区域坐标的`bbox`,区域类型的`label`| 不同模块返回的字段不同,如,文本识别服务模块返回结果不含`text_region`字段,具体信息如下: - -| 字段名/模块名 | ocr_det | ocr_cls | ocr_rec | ocr_system | structure_table | structure_system | Structure_layout | kie_ser | kie_re | -| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | -|angle| | ✔ | | ✔ | ||| -|text| | |✔|✔| | ✔ | | ✔ | ✔ | -|confidence| |✔ |✔| | | ✔| |✔ | ✔ | -|text_region| ✔| | |✔ | | ✔| |✔ | ✔ | -|html| | | | |✔ |✔||| | -|regions| | | | |✔ |✔ | || | -|layout| | | | | | | ✔ || | -|ser_res| | | | | | | | ✔ | | -|re_res| | | | | | | | | ✔ | - +|字段名/模块名 |ocr_det |ocr_cls |ocr_rec |ocr_system |structure_table |structure_system |structure_layout |kie_ser |kie_re | +|--- |--- |--- |--- |--- |--- |--- |--- |--- |--- | +|angle | |✔ | |✔ | | | | +|text | | |✔ |✔ | |✔ | |✔ |✔ | +|confidence | |✔ |✔ |✔ | |✔ | |✔ |✔ | +|text_region |✔ | | |✔ | |✔ | |✔ |✔ | +|html | | | | |✔ |✔ | | | | +|regions | | | | |✔ |✔ | | | | +|layout | | | | | | |✔ | | | +|ser_res | | | | | | | |✔ | | +|re_res | | | | | | | | |✔ | **说明:** 如果需要增加、删除、修改返回字段,可在相应模块的`module.py`文件中进行修改,完整流程参考下一节自定义修改服务模块。 ## 5. 自定义修改服务模块 -如果需要修改服务逻辑,你一般需要操作以下步骤(以修改`ocr_system`为例): - -- 1、 停止服务 -```hub serving stop --port/-p XXXX``` - -- 2、 到相应的`module.py`和`params.py`等文件中根据实际需求修改代码。 -例如,如果需要替换部署服务所用模型,则需要到`params.py`中修改模型路径参数`det_model_dir`和`rec_model_dir`,如果需要关闭文本方向分类器,则将参数`use_angle_cls`置为`False`,当然,同时可能还需要修改其他相关参数,请根据实际情况修改调试。 **强烈建议修改后先直接运行`module.py`调试,能正确运行预测后再启动服务测试。** -**注意** PPOCR-v3识别模型使用的图片输入shape为`3,48,320`,因此需要修改`params.py`中的`cfg.rec_image_shape = "3, 48, 320"`,如果不使用PPOCR-v3识别模型,则无需修改该参数。 - -- 3、 卸载旧服务包 -```hub uninstall ocr_system``` - -- 4、 安装修改后的新服务包 -```hub install deploy/hubserving/ocr_system/``` - -- 5、重新启动服务 -```hub serving start -m ocr_system``` +如果需要修改服务逻辑,一般需要操作以下步骤(以修改`deploy/hubserving/ocr_system`为例): + +1. 停止服务: + ```bash + hub serving stop --port/-p XXXX + ``` +2. 
到`deploy/hubserving/ocr_system`下的`module.py`和`params.py`等文件中根据实际需求修改代码。 + + 例如,如果需要替换部署服务所用模型,则需要到`params.py`中修改模型路径参数`det_model_dir`和`rec_model_dir`,如果需要关闭文本方向分类器,则将参数`use_angle_cls`置为`False` + + 当然,同时可能还需要修改其他相关参数,请根据实际情况修改调试。 + + **强烈建议修改后先直接运行`module.py`调试,能正确运行预测后再启动服务测试。** + + **注意:** PPOCR-v3识别模型使用的图片输入shape为`3,48,320`,因此需要修改`params.py`中的`cfg.rec_image_shape = "3, 48, 320"`,如果不使用PPOCR-v3识别模型,则无需修改该参数。 +3. (可选)如果想要重命名模块需要更改`module.py`文件中的以下行: + - [`from deploy.hubserving.ocr_system.params import read_params`中的`ocr_system`](https://github.com/PaddlePaddle/PaddleOCR/blob/a923f35de57b5e378f8dd16e54d0a3e4f51267fd/deploy/hubserving/ocr_system/module.py#L35) + - [`name="ocr_system",`中的`ocr_system`](https://github.com/PaddlePaddle/PaddleOCR/blob/a923f35de57b5e378f8dd16e54d0a3e4f51267fd/deploy/hubserving/ocr_system/module.py#L39) +4. (可选)可能需要删除`__pycache__`目录以强制刷新CPython缓存: + ```bash + find deploy/hubserving/ocr_system -name '__pycache__' -exec rm -r {} \; + ``` +5. 安装修改后的新服务包: + ```bash + hub install deploy/hubserving/ocr_system + ``` +6. 重新启动服务: + ```bash + hub serving start -m ocr_system + ``` diff --git a/deploy/hubserving/readme_en.md b/deploy/hubserving/readme_en.md index 613f0ed48e..034e2786ce 100755 --- a/deploy/hubserving/readme_en.md +++ b/deploy/hubserving/readme_en.md @@ -3,24 +3,23 @@ English | [简体中文](readme.md) - [Service deployment based on PaddleHub Serving](#service-deployment-based-on-paddlehub-serving) - [1. Update](#1-update) - [2. 
Quick start service](#2-quick-start-service) - - [2.1 Prepare the environment](#21-prepare-the-environment) + - [2.1 Install PaddleHub](#21-install-paddlehub) - [2.2 Download inference model](#22-download-inference-model) - [2.3 Install Service Module](#23-install-service-module) - [2.4 Start service](#24-start-service) - [2.4.1 Start with command line parameters (CPU only)](#241-start-with-command-line-parameters-cpu-only) - - [2.4.2 Start with configuration file(CPU、GPU)](#242-start-with-configuration-filecpugpu) + - [2.4.2 Start with configuration file(CPU and GPU)](#242-start-with-configuration-filecpugpu) - [3. Send prediction requests](#3-send-prediction-requests) - [4. Returned result format](#4-returned-result-format) - - [5. User defined service module modification](#5-user-defined-service-module-modification) - + - [5. User-defined service module modification](#5-user-defined-service-module-modification) PaddleOCR provides 2 service deployment methods: -- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please follow this tutorial. -- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please refer to the [tutorial](../../deploy/pdserving/README.md) for usage. +- Based on **PaddleHub Serving**: Code path is `./deploy/hubserving`. Please follow this tutorial. +- Based on **PaddleServing**: Code path is `./deploy/pdserving`. Please refer to the [tutorial](../../deploy/pdserving/README.md) for usage. -# Service deployment based on PaddleHub Serving +# Service deployment based on PaddleHub Serving -The hubserving service deployment directory includes seven service packages: text detection, text angle class, text recognition, text detection+text angle class+text recognition three-stage series connection, layout analysis, table recognition and PP-Structure. Please select the corresponding service package to install and start service according to your needs. 
The directory is as follows:
+The hubserving service deployment directory includes nine service packages: text detection, text angle class, text recognition, text detection+text angle class+text recognition three-stage series connection, layout analysis, table recognition, PP-Structure, KIE(SER), and KIE(SER+RE). Please select the corresponding service package to install and start the service according to your needs. The directory is as follows:
```
deploy/hubserving/
  └─  ocr_det     text detection module service package
@@ -34,13 +33,13 @@ deploy/hubserving/
  └─  kie_ser_re   KIE(SER+RE) service package
```

-Each service pack contains 3 files. Take the 2-stage series connection service package as an example, the directory is as follows:
+Each service pack contains 3 files. Take the 2-stage series connection service package as an example, the directory is as follows:
```
deploy/hubserving/ocr_system/
  └─  __init__.py    Empty file, required
  └─  config.json    Configuration file, optional, passed in as a parameter when using configuration to start the service
  └─  module.py    Main module file, required, contains the complete logic of the service
-  └─  params.py    Parameter file, required, including parameters such as model path, pre- and post-processing parameters
+  └─  params.py    Parameter file, required, including parameters such as model path, pre-processing and post-processing parameters
```

## 1. Update
@@ -49,124 +48,76 @@ deploy/hubserving/ocr_system/
* 2022.03.30 add PP-Structure and table recognition services.
* 2022.05.05 add PP-OCRv3 text detection and recognition services.
-
## 2. Quick start service
The following steps take the 2-stage series service as an example. If only the detection service or recognition service is needed, replace the corresponding file path.
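Once a service module from the steps below is installed and started, clients call it over plain HTTP with a JSON body of Base64-encoded images — the same payload shape used by the `curl` examples elsewhere in this repository. A minimal Python sketch of building that payload (the image path, port, and module name are illustrative only):

```python
import base64
import json

def build_ocr_payload(image_bytes: bytes) -> str:
    """Build the JSON body expected by the hubserving modules:
    a list of plain Base64 strings, WITHOUT any 'data:image/jpg;base64,' prefix."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"images": [encoded]})

# Hypothetical usage against a locally started ocr_system service
# (8866 is the default port of `hub serving start -m ocr_system`):
#
# import requests
# with open("doc/imgs/1.jpg", "rb") as f:
#     body = build_ocr_payload(f.read())
# resp = requests.post("http://127.0.0.1:8866/predict/ocr_system",
#                      headers={"Content-Type": "application/json"}, data=body)
# print(resp.json())
```

As noted in the docker hubserving guide, any `data:image/jpg;base64,` prefix must be stripped before sending.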
-### 2.1 Prepare the environment -```shell -# Install paddlehub -# python>3.6.2 is required bt paddlehub -pip3 install paddlehub==2.1.0 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple +### 2.1 Install PaddleHub +```bash +pip3 install paddlehub==2.1.0 --upgrade ``` ### 2.2 Download inference model -Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the PP-OCRv3 models are used, and the default model path is: -``` -text detection model: ./inference/ch_PP-OCRv3_det_infer/ -text recognition model: ./inference/ch_PP-OCRv3_rec_infer/ -text angle classifier: ./inference/ch_ppocr_mobile_v2.0_cls_infer/ -layout parse model: ./inference/picodet_lcnet_x1_0_fgd_layout_infer/ -tanle recognition: ./inference/ch_ppstructure_mobile_v2.0_SLANet_infer/ -KIE(SER): ./inference/ser_vi_layoutxlm_xfund_infer/ -KIE(SER+RE): ./inference/re_vi_layoutxlm_xfund_infer/ -``` - -**The model path can be found and modified in `params.py`.** More models provided by PaddleOCR can be obtained from the [model library](../../doc/doc_en/models_list_en.md). You can also use models trained by yourself. +Before installing the service module, you need to prepare the inference model and put it in the correct path. 
By default, the PP-OCRv3 models are used, and the default model path is:
+| Model | Path |
+| ------- | - |
+| text detection model | `./inference/ch_PP-OCRv3_det_infer/` |
+| text recognition model | `./inference/ch_PP-OCRv3_rec_infer/` |
+| text angle classifier | `./inference/ch_ppocr_mobile_v2.0_cls_infer/` |
+| layout parse model | `./inference/picodet_lcnet_x1_0_fgd_layout_infer/` |
+| table recognition model | `./inference/ch_ppstructure_mobile_v2.0_SLANet_infer/` |
+| KIE(SER) | `./inference/ser_vi_layoutxlm_xfund_infer/` |
+| KIE(SER+RE) | `./inference/re_vi_layoutxlm_xfund_infer/` |
+
+**The model path can be found and modified in `params.py`.**
+More models provided by PaddleOCR can be obtained from the [model library](../../doc/doc_en/models_list_en.md). You can also use models trained by yourself.

### 2.3 Install Service Module
PaddleOCR provides 5 kinds of service modules, install the required modules according to your needs.

-* On Linux platform, the examples are as follows.
-```shell
-# Install the text detection service module:
-hub install deploy/hubserving/ocr_det/
-
-# Or, install the text angle class service module:
-hub install deploy/hubserving/ocr_cls/
-
-# Or, install the text recognition service module:
-hub install deploy/hubserving/ocr_rec/
-
-# Or, install the 2-stage series service module:
-hub install deploy/hubserving/ocr_system/
-
-# Or install table recognition service module
-hub install deploy/hubserving/structure_table/
-
-# Or install PP-Structure service module
-hub install deploy/hubserving/structure_system/
-
-# Or install KIE(SER) service module
-hub install deploy/hubserving/kie_ser/
-
-# Or install KIE(SER+RE) service module
-hub install deploy/hubserving/kie_ser_re/
-```
-
-* On Windows platform, the examples are as follows.
-```shell
-# Install the detection service module:
-hub install deploy\hubserving\ocr_det\
-
-# Or, install the angle class service module:
-hub install deploy\hubserving\ocr_cls\
-
-# Or, install the recognition service module:
-hub install deploy\hubserving\ocr_rec\
-
-# Or, install the 2-stage series service module:
-hub install deploy\hubserving\ocr_system\
-
-# Or install table recognition service module
-hub install deploy/hubserving/structure_table/
-
-# Or install PP-Structure service module
-hub install deploy\hubserving\structure_system\
-
-# Or install layout analysis service module
-hub install deploy\hubserving\structure_layout\
-
-# Or install KIE(SER) service module
-hub install deploy\hubserving\kie_ser\
-
-# Or install KIE(SER+RE) service module
-hub install deploy\hubserving\kie_ser_re\
-```
+* On the Linux platform (replace `/` with `\` if using Windows), the installation commands are as in the following table:
+| Service module | Command |
+| ------- | - |
+| text detection | `hub install deploy/hubserving/ocr_det` |
+| text angle class | `hub install deploy/hubserving/ocr_cls` |
+| text recognition | `hub install deploy/hubserving/ocr_rec` |
+| 2-stage series | `hub install deploy/hubserving/ocr_system` |
+| table recognition | `hub install deploy/hubserving/structure_table` |
+| PP-Structure | `hub install deploy/hubserving/structure_system` |
+| layout analysis | `hub install deploy/hubserving/structure_layout` |
+| KIE(SER) | `hub install deploy/hubserving/kie_ser` |
+| KIE(SER+RE) | `hub install deploy/hubserving/kie_ser_re` |

### 2.4 Start service
#### 2.4.1 Start with command line parameters (CPU only)
+**start command:**
+```bash
+hub serving start --modules Module1==Version1, Module2==Version2, ... \
+                  --port 8866 \
+                  --use_multiprocess \
+                  --workers \
+```
+
+**Parameters:**
+|parameters|usage|
+|---|---|
+|`--modules`/`-m`|PaddleHub Serving pre-installed model, listed in the form of multiple Module==Version key-value pairs
**When Version is not specified, the latest version is selected by default**| +|`--port`/`-p`|Service port, default is 8866| +|`--use_multiprocess`|Enable concurrent mode; single-process mode is used by default. Concurrent mode is recommended for multi-core CPU machines
**Windows operating system only supports single-process mode**| +|`--workers`|The number of concurrent tasks specified in concurrent mode, the default is `2*cpu_count-1`, where `cpu_count` is the number of CPU cores| -**start command:** -```shell -$ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \ - --port XXXX \ - --use_multiprocess \ - --workers \ -``` -**parameters:** - -|parameters|usage| -|---|---| -|--modules/-m|PaddleHub Serving pre-installed model, listed in the form of multiple Module==Version key-value pairs
*`When Version is not specified, the latest version is selected by default`*| -|--port/-p|Service port, default is 8866| -|--use_multiprocess|Enable concurrent mode, the default is single-process mode, this mode is recommended for multi-core CPU machines
*`Windows operating system only supports single-process mode`*| -|--workers|The number of concurrent tasks specified in concurrent mode, the default is `2*cpu_count-1`, where `cpu_count` is the number of CPU cores| - -For example, start the 2-stage series service: -```shell +For example, start the 2-stage series service: +```bash hub serving start -m ocr_system -``` +``` -This completes the deployment of a service API, using the default port number 8866. +This completes the deployment of a service API, using the default port number 8866. -#### 2.4.2 Start with configuration file(CPU、GPU) -**start command:** -```shell +#### 2.4.2 Start with configuration file(CPU and GPU) +**start command:** +```bash hub serving start --config/-c config.json -``` -Wherein, the format of `config.json` is as follows: -```python +``` + +In which the format of `config.json` is as follows: +```json { "modules_info": { "ocr_system": { @@ -183,51 +134,61 @@ Wherein, the format of `config.json` is as follows: "workers": 2 } ``` -- The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. Among them, **when `use_gpu` is `true`, it means that the GPU is used to start the service**. +- The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. + + **When `use_gpu` is `true`, it means that the GPU is used to start the service**. - The configurable parameters in `predict_args` are consistent with the `predict` function interface in `module.py`. -**Note:** -- When using the configuration file to start the service, other parameters will be ignored. -- If you use GPU prediction (that is, `use_gpu` is set to `true`), you need to set the environment variable CUDA_VISIBLE_DEVICES before starting the service, such as: ```export CUDA_VISIBLE_DEVICES=0```, otherwise you do not need to set it. 
-- **`use_gpu` and `use_multiprocess` cannot be `true` at the same time.** + **Note:** + - When using the configuration file to start the service, other parameters will be ignored. + - If you use GPU prediction (that is, `use_gpu` is set to `true`), you need to set the environment variable CUDA_VISIBLE_DEVICES before starting the service, such as: + ```bash + export CUDA_VISIBLE_DEVICES=0 + ``` + - **`use_gpu` and `use_multiprocess` cannot be `true` at the same time.** For example, use GPU card No. 3 to start the 2-stage series service: -```shell +```bash export CUDA_VISIBLE_DEVICES=3 hub serving start -c deploy/hubserving/ocr_system/config.json -``` +``` ## 3. Send prediction requests -After the service starts, you can use the following command to send a prediction request to obtain the prediction result: -```shell +After the service starts, you can use the following command to send a prediction request to obtain the prediction result: +```bash python tools/test_hubserving.py --server_url=server_url --image_dir=image_path -``` +``` Two parameters need to be passed to the script: -- **server_url**:service address,format of which is -`http://[ip_address]:[port]/predict/[module_name]` -For example, if using the configuration file to start the text angle classification, text detection, text recognition, detection+classification+recognition 3 stages, table recognition and PP-Structure service, then the `server_url` to send the request will be: - -`http://127.0.0.1:8865/predict/ocr_det` -`http://127.0.0.1:8866/predict/ocr_cls` -`http://127.0.0.1:8867/predict/ocr_rec` -`http://127.0.0.1:8868/predict/ocr_system` -`http://127.0.0.1:8869/predict/structure_table` -`http://127.0.0.1:8870/predict/structure_system` -`http://127.0.0.1:8870/predict/structure_layout` -`http://127.0.0.1:8871/predict/kie_ser` -`http://127.0.0.1:8872/predict/kie_ser_re` -- **image_dir**:Test image path, can be a single image path or an image directory path -- **visualize**:Whether to visualize the 
results, the default value is False -**output**:The floder to save Visualization result, default value is `./hubserving_result` - -**Eg.** -```shell +- **server_url**: service address, the format of which is +`http://[ip_address]:[port]/predict/[module_name]` + + For example, if you use the configuration files to start the text angle classification, text detection, text recognition, detection+classification+recognition 3-stage, table recognition and PP-Structure services, and also modify the port for each service, then the `server_url` to send the request will be: + + ``` + http://127.0.0.1:8865/predict/ocr_det + http://127.0.0.1:8866/predict/ocr_cls + http://127.0.0.1:8867/predict/ocr_rec + http://127.0.0.1:8868/predict/ocr_system + http://127.0.0.1:8869/predict/structure_table + http://127.0.0.1:8870/predict/structure_system + http://127.0.0.1:8870/predict/structure_layout + http://127.0.0.1:8871/predict/kie_ser + http://127.0.0.1:8872/predict/kie_ser_re + ``` +- **image_dir**: Test image path, which can be a single image path or an image directory path +- **visualize**: Whether to visualize the results, the default value is False +- **output**: The folder to save the visualization result, the default value is `./hubserving_result` + +Example: +```bash python tools/test_hubserving.py --server_url=http://127.0.0.1:8868/predict/ocr_system --image_dir=./doc/imgs/ --visualize=false ``` ## 4. Returned result format -The returned result is a list. Each item in the list is a dict. The dict may contain three fields. The information is as follows: +The returned result is a list. Each item in the list is a dictionary which may contain three fields. The information is as follows: |field name|data type|description| |----|----|----| @@ -235,45 +196,54 @@ The returned result is a list. Each item in the list is a dict.
The dict may con |text|str|text content| |confidence|float|text recognition confidence| |text_region|list|text location coordinates| -|html|str|table html str| -|regions|list|The result of layout analysis + table recognition + OCR, each item is a list, including `bbox` indicating area coordinates, `type` of area type and `res` of area results| +|html|str|table HTML string| +|regions|list|The result of layout analysis + table recognition + OCR, each item is a list
including `bbox` indicating area coordinates, `type` of area type and `res` of area results| |layout|list|The result of layout analysis, each item is a dict, including `bbox` indicating area coordinates, `label` of area type| -The fields returned by different modules are different. For example, the results returned by the text recognition service module do not contain `text_region`. The details are as follows: +The fields returned by different modules are different. For example, the results returned by the text recognition service module do not contain `text_region`, as detailed in the following table: -| field name/module name | ocr_det | ocr_cls | ocr_rec | ocr_system | structure_table | structure_system | structure_layout | kie_ser | kie_re | -| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | -|angle| | ✔ | | ✔ | ||| -|text| | |✔|✔| | ✔ | | ✔ | ✔ | -|confidence| |✔ |✔| | | ✔| |✔ | ✔ | -|text_region| ✔| | |✔ | | ✔| |✔ | ✔ | -|html| | | | |✔ |✔||| | -|regions| | | | |✔ |✔ | || | -|layout| | | | | | | ✔ || | -|ser_res| | | | | | | | ✔ | | -|re_res| | | | | | | | | ✔ | +|field name/module name |ocr_det |ocr_cls |ocr_rec |ocr_system |structure_table |structure_system |structure_layout |kie_ser |kie_re | +|--- |--- |--- |--- |--- |--- |--- |--- |--- |--- | +|angle | |✔ | |✔ | | | | | | +|text | | |✔ |✔ | |✔ | |✔ |✔ | +|confidence | |✔ |✔ |✔ | |✔ | |✔ |✔ | +|text_region |✔ | | |✔ | |✔ | |✔ |✔ | +|html | | | | |✔ |✔ | | | | +|regions | | | | |✔ |✔ | | | | +|layout | | | | | | |✔ | | | +|ser_res | | | | | | | |✔ | | +|re_res | | | | | | | | |✔ | -**Note:** If you need to add, delete or modify the returned fields, you can modify the file `module.py` of the corresponding module. For the complete process, refer to the user-defined modification service module in the next section. +**Note:** If you need to add, delete or modify the returned fields, you can modify the file `module.py` of the corresponding module.
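+As a quick sketch of how a client might consume the fields documented above (the sample response data below is hypothetical and only mirrors the `text`/`confidence`/`text_region` columns of the table; the real values come from the service's JSON response):

```python
# Hypothetical sample shaped after the field table above: one list per image,
# each item a dict with `text`, `confidence` and `text_region` fields.
sample_results = [
    [
        {"text": "Hello", "confidence": 0.98,
         "text_region": [[10, 10], [90, 10], [90, 30], [10, 30]]},
        {"text": "World", "confidence": 0.95,
         "text_region": [[10, 40], [95, 40], [95, 60], [10, 60]]},
    ]
]

def extract_text(per_image_results):
    """Collect (text, confidence) pairs from one image's result list."""
    return [(item["text"], item["confidence"]) for item in per_image_results]

for per_image in sample_results:
    for text, confidence in extract_text(per_image):
        print(f"{text}\t{confidence:.2f}")
```

+Modules that return other fields (for example `html` for `structure_table`, or `ser_res` for `kie_ser`) would need the corresponding keys instead.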
For the complete process, refer to the user-defined modification service module in the next section. -## 5. User defined service module modification -If you need to modify the service logic, the following steps are generally required (take the modification of `ocr_system` for example): +## 5. User-defined service module modification +If you need to modify the service logic, the following steps are generally required (taking the modification of `deploy/hubserving/ocr_system` as an example): -- 1. Stop service -```shell +1. Stop service: +```bash hub serving stop --port/-p XXXX ``` -- 2. Modify the code in the corresponding files, like `module.py` and `params.py`, according to the actual needs. -For example, if you need to replace the model used by the deployed service, you need to modify model path parameters `det_model_dir` and `rec_model_dir` in `params.py`. If you want to turn off the text direction classifier, set the parameter `use_angle_cls` to `False`. Of course, other related parameters may need to be modified at the same time. Please modify and debug according to the actual situation. It is suggested to run `module.py` directly for debugging after modification before starting the service test. -**Note** The image input shape used by the PPOCR-v3 recognition model is `3, 48, 320`, so you need to modify `cfg.rec_image_shape = "3, 48, 320"` in `params.py`, if you do not use the PPOCR-v3 recognition model, then there is no need to modify this parameter. -- 3. Uninstall old service module -```shell -hub uninstall ocr_system -``` -- 4. Install modified service module -```shell -hub install deploy/hubserving/ocr_system/ -``` -- 5. Restart service -```shell -hub serving start -m ocr_system -``` +2. Modify the code in the corresponding files under `deploy/hubserving/ocr_system`, such as `module.py` and `params.py`, according to your actual needs.
+ + For example, if you need to replace the model used by the deployed service, you need to modify model path parameters `det_model_dir` and `rec_model_dir` in `params.py`. If you want to turn off the text direction classifier, set the parameter `use_angle_cls` to `False`. + + Of course, other related parameters may need to be modified at the same time. Please modify and debug according to the actual situation. + + **After modification, it is suggested to run `module.py` directly for debugging before starting the service test.** + + **Note:** The image input shape used by the PP-OCRv3 recognition model is `3, 48, 320`, so you need to set `cfg.rec_image_shape = "3, 48, 320"` in `params.py`. If you do not use the PP-OCRv3 recognition model, there is no need to modify this parameter. +3. (Optional) If you want to rename the module, the following lines should be modified: + - [`ocr_system` within `from deploy.hubserving.ocr_system.params import read_params`](https://github.com/PaddlePaddle/PaddleOCR/blob/a923f35de57b5e378f8dd16e54d0a3e4f51267fd/deploy/hubserving/ocr_system/module.py#L35) + - [`ocr_system` within `name="ocr_system",`](https://github.com/PaddlePaddle/PaddleOCR/blob/a923f35de57b5e378f8dd16e54d0a3e4f51267fd/deploy/hubserving/ocr_system/module.py#L39) +4. (Optional) You may need to delete the `__pycache__` directories to force CPython to rebuild its bytecode cache: + ```bash + find deploy/hubserving/ocr_system -name '__pycache__' -exec rm -r {} \; + ``` +5. Install the modified service module: + ```bash + hub install deploy/hubserving/ocr_system/ + ``` +6.
Restart service: + ```bash + hub serving start -m ocr_system + ``` diff --git a/deploy/pdserving/README_CN.md b/deploy/pdserving/README_CN.md index ab05b766e3..be314b9e75 100644 --- a/deploy/pdserving/README_CN.md +++ b/deploy/pdserving/README_CN.md @@ -106,13 +106,13 @@ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv3_rec_infer/ \ 检测模型转换完成后,会在当前文件夹多出`ppocr_det_v3_serving` 和`ppocr_det_v3_client`的文件夹,具备如下格式: ``` |- ppocr_det_v3_serving/ - |- __model__ + |- __model__ |- __params__ - |- serving_server_conf.prototxt + |- serving_server_conf.prototxt |- serving_server_conf.stream.prototxt |- ppocr_det_v3_client - |- serving_client_conf.prototxt + |- serving_client_conf.prototxt |- serving_client_conf.stream.prototxt ``` @@ -232,6 +232,7 @@ cp -rf general_detection_op.cpp Serving/core/general-server/op # 启动服务,运行日志保存在log.txt python3 -m paddle_serving_server.serve --model ppocr_det_v3_serving ppocr_rec_v3_serving --op GeneralDetectionOp GeneralInferOp --port 8181 &>log.txt & ``` + 成功启动服务后,log.txt中会打印类似如下日志 ![](./imgs/start_server.png) diff --git a/doc/doc_ch/FAQ.md b/doc/doc_ch/FAQ.md index a4437b8b78..531d649178 100644 --- a/doc/doc_ch/FAQ.md +++ b/doc/doc_ch/FAQ.md @@ -188,7 +188,7 @@ A:可以看下训练的尺度和预测的尺度是否相同,如果训练的 #### Q: 如何识别招牌或者广告图中的艺术字? -**A**: 招牌或者广告图中的艺术字是文本识别一个非常有挑战性的难题,因为艺术字中的单字和印刷体相比,变化非常大。如果需要识别的艺术字是在一个词典列表内,可以将改每个词典认为是一个待识别图像模板,通过通用图像检索识别系统解决识别问题。可以尝试使用PaddleClas的图像识别系统。 +**A**: 招牌或者广告图中的艺术字是文本识别一个非常有挑战性的难题,因为艺术字中的单字和印刷体相比,变化非常大。如果需要识别的艺术字是在一个词典列表内,可以将该每个词典认为是一个待识别图像模板,通过通用图像检索识别系统解决识别问题。可以尝试使用PaddleClas的图像识别系统PP-shituV2。 #### Q: 图像正常识别出来的文字是OK的,旋转90度后识别出来的结果就比较差,有什么方法可以优化? 
@@ -400,7 +400,7 @@ StyleText的用途主要是:提取style_image中的字体、背景等style信 A:无论是文字检测,还是文字识别,骨干网络的选择是预测效果和预测效率的权衡。一般,选择更大规模的骨干网络,例如ResNet101_vd,则检测或识别更准确,但预测耗时相应也会增加。而选择更小规模的骨干网络,例如MobileNetV3_small_x0_35,则预测更快,但检测或识别的准确率会大打折扣。幸运的是不同骨干网络的检测或识别效果与在ImageNet数据集图像1000分类任务效果正相关。飞桨图像分类套件PaddleClas汇总了ResNet_vd、Res2Net、HRNet、MobileNetV3、GhostNet等23种系列的分类网络结构,在上述图像分类任务的top1识别准确率,GPU(V100和T4)和CPU(骁龙855)的预测耗时以及相应的117个预训练模型下载地址。 -(1)文字检测骨干网络的替换,主要是确定类似与ResNet的4个stages,以方便集成后续的类似FPN的检测头。此外,对于文字检测问题,使用ImageNet训练的分类预训练模型,可以加速收敛和效果提升。 +(1)文字检测骨干网络的替换,主要是确定类似于ResNet的4个stages,以方便集成后续的类似FPN的检测头。此外,对于文字检测问题,使用ImageNet训练的分类预训练模型,可以加速收敛和效果提升。 (2)文字识别的骨干网络的替换,需要注意网络宽高stride的下降位置。由于文本识别一般宽高比例很大,因此高度下降频率少一些,宽度下降频率多一些。可以参考PaddleOCR中MobileNetV3骨干网络的改动。 diff --git a/doc/doc_ch/PPOCRv3_det_train.md b/doc/doc_ch/PPOCRv3_det_train.md index bcddd249ab..45f459ba65 100644 --- a/doc/doc_ch/PPOCRv3_det_train.md +++ b/doc/doc_ch/PPOCRv3_det_train.md @@ -30,7 +30,7 @@ PP-OCRv3检测训练包括两个步骤: ### 2.2 训练教师模型 -教师模型训练的配置文件是[ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml)。教师模型模型结构的Backbone、Neck、Head分别为Resnet50, LKPAN, DBHead,采用DML的蒸馏方法训练。有关配置文件的详细介绍参考[文档](./knowledge_distillation)。 +教师模型训练的配置文件是[ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml)。教师模型模型结构的Backbone、Neck、Head分别为Resnet50, LKPAN, DBHead,采用DML的蒸馏方法训练。有关配置文件的详细介绍参考[文档](./knowledge_distillation.md)。 下载ImageNet预训练模型: diff --git a/doc/doc_ch/application.md b/doc/doc_ch/application.md index 5135dfac10..9105a87854 100644 --- a/doc/doc_ch/application.md +++ b/doc/doc_ch/application.md @@ -30,12 +30,12 @@ PaddleOCR场景应用覆盖通用,制造、金融、交通行业的主要OCR | 类别 | 亮点 | 类别 | 亮点 | | -------------- | ------------------------ | ------------ | --------------------- | | 表单VQA | 多模态通用表单结构化提取 | 通用卡证识别 | 通用结构化提取 | -| 增值税发票 | 尽请期待 | 身份证识别 | 结构化提取、图像阴影 | +| 增值税发票 | 敬请期待 | 身份证识别 | 结构化提取、图像阴影 | | 印章检测与识别 | 端到端弯曲文本识别 | 合同比对 | 
密集文本检测、NLP串联 | ## 交通 | 类别 | 亮点 | 类别 | 亮点 | | ----------------- | ------------------------------ | ---------- | -------- | -| 车牌识别 | 多角度图像、轻量模型、端侧部署 | 快递单识别 | 尽请期待 | -| 驾驶证/行驶证识别 | 尽请期待 | | | \ No newline at end of file +| 车牌识别 | 多角度图像、轻量模型、端侧部署 | 快递单识别 | 敬请期待 | +| 驾驶证/行驶证识别 | 敬请期待 | | | diff --git a/doc/doc_ch/config.md b/doc/doc_ch/config.md index 41ba8c1f7b..3430105fb5 100644 --- a/doc/doc_ch/config.md +++ b/doc/doc_ch/config.md @@ -223,4 +223,4 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi | rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 斯拉夫字母 | | rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 梵文字母 | -更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md#%E8%AF%AD%E7%A7%8D%E7%BC%A9%E5%86%99) +更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md) diff --git a/doc/doc_ch/customize.md b/doc/doc_ch/customize.md index 5944bf08e4..3da61ab44b 100644 --- a/doc/doc_ch/customize.md +++ b/doc/doc_ch/customize.md @@ -27,4 +27,4 @@ PaddleOCR提供了检测和识别模型的串联工具,可以将训练好的 ``` python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/" ``` -更多的文本检测、识别串联推理使用方式请参考文档教程中的[基于预测引擎推理](./inference.md)。 +更多的文本检测、识别串联推理使用方式请参考文档教程中的[基于预测引擎推理](./algorithm_inference.md)。 diff --git a/doc/doc_ch/kie.md b/doc/doc_ch/kie.md index 26d2e560fc..a41d48011d 100644 --- a/doc/doc_ch/kie.md +++ b/doc/doc_ch/kie.md @@ -205,7 +205,7 @@ Architecture: name: LayoutXLMForSer pretrained: True mode: vi - # 假设字典中包含n个字段(包含other),由于采用BIO标注,则类别数为2n-1 + # 由于采用BIO标注,假设字典中包含n个字段(包含other)时,则类别数为2n-1; 假设字典中包含n个字段(不含other)时,则类别数为2n+1。否则在train过程会报:IndexError: (OutOfRange) label value should less than the shape of axis dimension 。 num_classes: &num_classes 7 PostProcess: diff --git a/doc/doc_ch/knowledge_distillation.md b/doc/doc_ch/knowledge_distillation.md 
index 79c4418d53..f9cbcbfa36 100644 --- a/doc/doc_ch/knowledge_distillation.md +++ b/doc/doc_ch/knowledge_distillation.md @@ -69,7 +69,7 @@ PaddleOCR中集成了知识蒸馏的算法,具体地,有以下几个主要 ```yaml Architecture: - model_type: &model_type "rec" # 模型类别,rec、det等,每个子网络的模型类别都与 + model_type: &model_type "rec" # 模型类别,rec、det等,每个子网络的模型类别 name: DistillationModel # 结构名称,蒸馏任务中,为DistillationModel,用于构建对应的结构 algorithm: Distillation # 算法名称 Models: # 模型,包含子网络的配置信息 diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md index 7126a1a3cc..9b1dc97114 100644 --- a/doc/doc_ch/models_list.md +++ b/doc/doc_ch/models_list.md @@ -107,6 +107,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训 |en_number_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| 2.7M | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar) | |en_number_mobile_v2.0_rec|原始超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) | +**注意:** 所有英文识别模型的字典文件均为`ppocr/utils/en_dict.txt` ### 2.3 多语言识别模型(更多语言持续更新中...) 
@@ -152,3 +153,4 @@ Paddle-Lite 是一个高性能、轻量级、灵活性强且易于扩展的深 |PP-OCRv2(slim)|蒸馏版超轻量中文OCR移动端模型|4.9M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_opt.nb)|v2.9| |V2.0|ppocr_v2.0超轻量中文OCR移动端模型|7.8M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9| |V2.0(slim)|ppocr_v2.0超轻量中文OCR移动端模型|3.3M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9| + diff --git a/doc/doc_en/models_list_en.md b/doc/doc_en/models_list_en.md index 3ec5013cfe..a56f72d8d6 100644 --- a/doc/doc_en/models_list_en.md +++ b/doc/doc_en/models_list_en.md @@ -1,8 +1,8 @@ # OCR Model List(V3, updated on 2022.4.28) > **Note** -> 1. Compared with the model v2, the 3rd version of the detection model has a improvement in accuracy, and the 2.1 version of the recognition model has optimizations in accuracy and speed with CPU. +> 1. Compared with model v2, the 3rd version of the detection model has an improvement in accuracy, and the 2.1 version of the recognition model has optimizations in accuracy and speed with CPU. > 2. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 or higher are the dynamic graph trained version and achieve close performance. -> 3. 
All models in this tutorial are all ppocr-series models, for more introduction of algorithms and models based on public dataset, you can refer to [algorithm overview tutorial](./algorithm_overview_en.md). +> 3. All models in this tutorial are from the PaddleOCR series, for more introduction to algorithms and models based on the public dataset, you can refer to [algorithm overview tutorial](./algorithm_overview_en.md). - [OCR Model List(V3, updated on 2022.4.28)]() - [1. Text Detection Model](#1-text-detection-model) @@ -16,15 +16,15 @@ - [3. Text Angle Classification Model](#3-text-angle-classification-model) - [4. Paddle-Lite Model](#4-paddle-lite-model) -The downloadable models provided by PaddleOCR include `inference model`, `trained model`, `pre-trained model` and `nb model`. The differences between the models are as follows: +The downloadable models provided by PaddleOCR include the `inference model`, `trained model`, `pre-trained model` and `nb model`. The differences between the models are as follows: |model type|model format|description| |--- | --- | --- | |inference model|inference.pdmodel、inference.pdiparams|Used for inference based on Paddle inference engine,[detail](./inference_ppocr_en.md)| -|trained model, pre-trained model|\*.pdparams、\*.pdopt、\*.states |The checkpoints model saved in the training process, which stores the parameters of the model, mostly used for model evaluation and continuous training.| +|trained model, pre-trained model|\*.pdparams、\*.pdopt、\*.states |The checkpoints model saved in the training process, which stores the parameters of the model, is mostly used for model evaluation and continuous training.| |nb model|\*.nb| Model optimized by Paddle-Lite, which is suitable for mobile-side deployment scenarios (Paddle-Lite is needed for nb model deployment). | -Relationship of the above models is as follows. +The relationship of the above models is as follows. 
![](../imgs_en/model_prod_flow_en.png) @@ -51,10 +51,10 @@ Relationship of the above models is as follows. |model name|description|config|model size|download| | --- | --- | --- | --- | --- | -|en_PP-OCRv3_det_slim | [New] Slim qunatization with distillation lightweight detection model, supporting English | [ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml) | 1.1M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_infer.nb) | -|ch_PP-OCRv3_det | [New] Original lightweight detection model, supporting English |[ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 3.8M | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_distill_train.tar) | +|en_PP-OCRv3_det_slim | [New] Slim quantization with distillation lightweight detection model, supporting English | [ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml) | 1.1M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_slim_infer.nb) | +|en_PP-OCRv3_det | [New] Original lightweight detection model, supporting English |[ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 3.8M | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_distill_train.tar) | -* Note: English configuration file is same as 
Chinese except training data, here we only provide one configuration file. +* Note: English configuration file is the same as Chinese except for training data, here we only provide one configuration file. @@ -62,10 +62,10 @@ Relationship of the above models is as follows. |model name|description|config|model size|download| | --- | --- | --- | --- | --- | -| ml_PP-OCRv3_det_slim | [New] Slim qunatization with distillation lightweight detection model, supporting English | [ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml) | 1.1M | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_infer.tar) / [trained model ](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_infer.nb) | +| ml_PP-OCRv3_det_slim | [New] Slim quantization with distillation lightweight detection model, supporting English | [ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml) | 1.1M | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_infer.tar) / [trained model ](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_distill_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_slim_infer.nb) | | ml_PP-OCRv3_det |[New] Original lightweight detection model, supporting English | [ch_PP-OCRv3_det_cml.yml](../../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 3.8M | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_distill_train.tar) | -* Note: English configuration file is same as Chinese except training data, here we only provide one configuration file. 
+* Note: English configuration file is the same as Chinese except for training data, here we only provide one configuration file. ## 2. Text Recognition Model @@ -75,27 +75,29 @@ Relationship of the above models is as follows. |model name|description|config|model size|download| | --- | --- | --- | --- | --- | -|ch_PP-OCRv3_rec_slim | [New] Slim qunatization with distillation lightweight model, supporting Chinese, English text recognition |[ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)| 4.9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.nb) | +|ch_PP-OCRv3_rec_slim | [New] Slim quantization with distillation lightweight model, supporting Chinese, English text recognition |[ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)| 4.9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.nb) | |ch_PP-OCRv3_rec| [New] Original lightweight model, supporting Chinese, English, multilingual text recognition |[ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)| 12.4M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) | -|ch_PP-OCRv2_rec_slim| Slim qunatization with distillation lightweight model, supporting Chinese, English text recognition|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9.0M |[inference 
model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | -|ch_PP-OCRv2_rec| Original lightweight model, supporting Chinese, English, multilingual text recognition |[ch_PP-OCRv2_rec_distillation.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) | +|ch_PP-OCRv2_rec_slim| Slim quantization with distillation lightweight model, supporting Chinese, English text recognition|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9.0M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | +|ch_PP-OCRv2_rec| Original lightweight model, supporting Chinese, English, and multilingual text recognition |[ch_PP-OCRv2_rec_distillation.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) | |ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6.0M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | |ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number 
recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | |ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | -**Note:** The `trained model` is fine-tuned on the `pre-trained model` with real data and synthesized vertical text data, which achieved better performance in real scene. The `pre-trained model` is directly trained on the full amount of real data and synthesized data, which is more suitable for fine-tune on your own dataset. +**Note:** The `trained model` is fine-tuned on the `pre-trained model` with real data and synthesized vertical text data, which achieved better performance in the real scene. The `pre-trained model` is directly trained on the full amount of real data and synthesized data, which is more suitable for fine-tuning your dataset. 
### 2.2 English Recognition Model |model name|description|config|model size|download| | --- | --- | --- | --- | --- | -|en_PP-OCRv3_rec_slim | [New] Slim qunatization with distillation lightweight model, supporting english, English text recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 3.2M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) | -|en_PP-OCRv3_rec| [New] Original lightweight model, supporting english, English, multilingual text recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 9.6M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) | +|en_PP-OCRv3_rec_slim | [New] Slim quantization with distillation lightweight model, supporting English text recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 3.2M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_train.tar) / [nb model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_slim_infer.nb) | +|en_PP-OCRv3_rec| [New] Original lightweight model, supporting English and multilingual text recognition |[en_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml)| 9.6M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) | |en_number_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting English and number
recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| 2.7M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar) | |en_number_mobile_v2.0_rec|Original lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) | +**Note:** The dictionary file for all English recognition models is `ppocr/utils/en_dict.txt`. + ### 2.3 Multilingual Recognition Model(Updating...) @@ -112,7 +114,7 @@ Relationship of the above models is as follows. | cyrillic_PP-OCRv3_rec | ppocr/utils/dict/cyrillic_dict.txt | Lightweight model for cyrillic recognition | [cyrillic_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/cyrillic_PP-OCRv3_rec.yml) |9.6M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_train.tar) | | devanagari_PP-OCRv3_rec | ppocr/utils/dict/devanagari_dict.txt | Lightweight model for devanagari recognition | [devanagari_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/multi_language/devanagari_PP-OCRv3_rec.yml) |9.9M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_train.tar) | -For a complete list of languages and tutorials, please refer to : [Multi-language model](./multi_languages_en.md) +For a complete list of languages and tutorials,
please refer to [Multi-language model](./multi_languages_en.md) ## 3. Text Angle Classification Model @@ -125,9 +127,9 @@ For a complete list of languages and tutorials, please refer to : [Multi-l ## 4. Paddle-Lite Model -Paddle Lite is an updated version of Paddle-Mobile, an open-open source deep learning framework designed to make it easy to perform inference on mobile, embeded, and IoT devices. It can further optimize the inference model and generate `nb model` used for edge devices. It's suggested to optimize the quantization model using Paddle-Lite because `INT8` format is used for the model storage and inference. +Paddle Lite is an updated version of Paddle-Mobile, an open-source deep learning framework designed to make it easy to perform inference on mobile, embedded, and IoT devices. It can further optimize the inference model and generate the `nb model` used for edge devices. It's suggested to optimize the quantization model using Paddle-Lite because the `INT8` format is used for the model storage and inference. -This chapter lists OCR nb models with PP-OCRv2 or earlier versions. You can access to the latest nb models from the above tables. +This chapter lists OCR nb models with PP-OCRv2 or earlier versions. You can access the latest nb models from the above tables. |Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch| |---|---|---|---|---|---|---| diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md index 78917aea90..bf14a3eaa3 100644 --- a/doc/doc_en/recognition_en.md +++ b/doc/doc_en/recognition_en.md @@ -80,7 +80,7 @@ PaddleOCR has built-in dictionaries, which can be used on demand. `ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.
-`ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters +`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters `ppocr/utils/dict/french_dict.txt` is a French dictionary with 118 characters diff --git a/paddleocr.py b/paddleocr.py index 36980aec44..ee15950cc2 100644 --- a/paddleocr.py +++ b/paddleocr.py @@ -45,7 +45,8 @@ def _import_file(module_name, file_path, make_importable=False): ppocr = importlib.import_module('ppocr', 'paddleocr') ppstructure = importlib.import_module('ppstructure', 'paddleocr') from ppocr.utils.logging import get_logger -from tools.infer import predict_system + +logger = get_logger() from ppocr.utils.utility import check_and_read, get_image_file_list, alpha_to_color, binarize_img from ppocr.utils.network import maybe_download, download_with_progressbar, is_link, confirm_model_dir_url from tools.infer.utility import draw_ocr, str2bool, check_gpu @@ -635,6 +636,7 @@ def __init__(self, **kwargs): def ocr(self, img, det=True, rec=True, cls=True, bin=False, inv=False, alpha_color=(255, 255, 255)): """ OCR with PaddleOCR + args: img: img for OCR, support ndarray, img_path and list or ndarray det: use text detection or not. If False, only rec will be exec. 
Default is True @@ -657,12 +659,14 @@ def ocr(self, img, det=True, rec=True, cls=True, bin=False, inv=False, alpha_col # for infer pdf file if isinstance(img, list): if self.page_num > len(img) or self.page_num == 0: - self.page_num = len(img) - imgs = img[:self.page_num] + imgs = img + else: + imgs = img[:self.page_num] else: imgs = [img] def preprocess_image(_image): + _image = alpha_to_color(_image, alpha_color) if inv: _image = cv2.bitwise_not(_image) if bin: diff --git a/ppocr/data/imaug/ct_process.py b/ppocr/data/imaug/ct_process.py index 933d42f98c..2434c91609 100644 --- a/ppocr/data/imaug/ct_process.py +++ b/ppocr/data/imaug/ct_process.py @@ -14,18 +14,16 @@ import os import cv2 +import paddle import random import pyclipper -import paddle - import numpy as np -from ppocr.utils.utility import check_install - -import scipy.io as scio - from PIL import Image + import paddle.vision.transforms as transforms +from ppocr.utils.utility import check_install + class RandomScale(): def __init__(self, short_size=640, **kwargs): diff --git a/ppocr/postprocess/rec_postprocess.py b/ppocr/postprocess/rec_postprocess.py index 2ea0e1bbbd..96df7bb796 100644 --- a/ppocr/postprocess/rec_postprocess.py +++ b/ppocr/postprocess/rec_postprocess.py @@ -88,7 +88,7 @@ def get_word_info(self, text, selection): word_list = [] word_col_list = [] state_list = [] - valid_col = np.where(selection == True)[0] + valid_col = np.where(selection==True)[0] for c_i, char in enumerate(text): if '\u4e00' <= char <= '\u9fff': @@ -97,14 +97,12 @@ def get_word_info(self, text, selection): c_state = 'en&num' else: c_state = 'splitter' - - if char == '.' and state == 'en&num' and c_i + 1 < len( - text) and bool(re.search('[0-9]', text[ - c_i + 1])): # grouping floting number + + if char == '.' 
and state == 'en&num' and c_i + 1 < len(text) and bool(re.search('[0-9]', text[c_i+1])): # grouping floating number c_state = 'en&num' - if char == '-' and state == "en&num": # grouping word with '-', such as 'state-of-the-art' + if char == '-' and state == "en&num": # grouping word with '-', such as 'state-of-the-art' c_state = 'en&num' - + if state == None: state = c_state diff --git a/ppocr/utils/utility.py b/ppocr/utils/utility.py index 688e55698c..245e4f0cf9 100644 --- a/ppocr/utils/utility.py +++ b/ppocr/utils/utility.py @@ -14,7 +14,6 @@ import logging import os -import imghdr import cv2 import random import numpy as np @@ -59,33 +58,22 @@ def _check_image_file(path): def get_image_file_list(img_file, infer_list=None): imgs_lists = [] - if infer_list and not os.path.exists(infer_list): - raise Exception("not found infer list {}".format(infer_list)) - if infer_list: - with open(infer_list, "r") as f: - lines = f.readlines() - for line in lines: - image_path = line.strip().split("\t")[0] - image_path = os.path.join(img_file, image_path) - imgs_lists.append(image_path) - else: - if img_file is None or not os.path.exists(img_file): - raise Exception("not found any img file in {}".format(img_file)) - - img_end = {'jpg', 'bmp', 'png', 'jpeg', 'rgb', 'tif', 'tiff', 'gif', 'pdf'} - if os.path.isfile(img_file) and _check_image_file(img_file): - imgs_lists.append(img_file) - elif os.path.isdir(img_file): - for single_file in os.listdir(img_file): - file_path = os.path.join(img_file, single_file) - if os.path.isfile(file_path) and _check_image_file(file_path): - imgs_lists.append(file_path) + if img_file is None or not os.path.exists(img_file): + raise Exception("not found any img file in {}".format(img_file)) + + if os.path.isfile(img_file) and _check_image_file(img_file): + imgs_lists.append(img_file) + elif os.path.isdir(img_file): + for single_file in os.listdir(img_file): + file_path = os.path.join(img_file, single_file) + if os.path.isfile(file_path) and
_check_image_file(file_path): + imgs_lists.append(file_path) + if len(imgs_lists) == 0: raise Exception("not found any img file in {}".format(img_file)) imgs_lists = sorted(imgs_lists) return imgs_lists - def binarize_img(img): if len(img.shape) == 3 and img.shape[2] == 3: gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # conversion to grayscale image diff --git a/ppstructure/README.md b/ppstructure/README.md index e44ba58860..6d426157e6 100644 --- a/ppstructure/README.md +++ b/ppstructure/README.md @@ -50,6 +50,10 @@ PP-StructureV2 supports the independent use or flexible collocation of each modu The figure shows the pipeline of layout analysis + table recognition. The image is first divided into four areas of image, text, title and table by layout analysis, and then OCR detection and recognition is performed on the three areas of image, text and title, and the table is performed table recognition, where the image will also be stored for use. +### 3.1.1 Layout analysis returns single-word coordinates +The following figure shows the result of layout analysis on a single word; please refer to the [doc](./return_word_pos.md). +![show_0_mdf_v2](https://github.com/PaddlePaddle/PaddleOCR/assets/43341135/799450d4-d2c5-4b61-b490-e160dc0f515c) + ### 3.2 Layout recovery The following figure shows the effect of layout recovery based on the results of layout analysis and table recognition in the previous section.
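The single-word coordinates in 3.1.1 come from `cal_ocr_word_box` in `ppstructure/utility.py` (patched later in this diff), which splits a recognized line's box into equal-width character columns and merges each word's columns into one cell. A minimal sketch of that mapping, assuming an axis-aligned line box (the function and parameter names here are illustrative, not the real signature):

```python
def word_boxes(line_x0, line_x1, y0, y1, n_cols, word_cols):
    """Split an axis-aligned text-line box into n_cols equal-width cells
    and return one bounding box per word from its column indices."""
    cell_w = (line_x1 - line_x0) / n_cols
    boxes = []
    for cols in word_cols:  # e.g. [[0, 1, 2], [4, 5]] for two words
        x_start = line_x0 + min(cols) * cell_w
        x_end = line_x0 + (max(cols) + 1) * cell_w
        boxes.append((x_start, y0, x_end, y1))
    return boxes

# A 100px-wide line split into 10 columns, holding the words "abc" and "de":
print(word_boxes(0, 100, 0, 20, 10, [[0, 1, 2], [4, 5]]))
# [(0.0, 0, 30.0, 20), (40.0, 0, 60.0, 20)]
```

The real implementation additionally handles quadrilateral boxes and averages character widths separately for Chinese and Latin text; this only illustrates the column-to-pixel idea behind `--return_word_box=True`.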
diff --git a/ppstructure/README_ch.md b/ppstructure/README_ch.md index 53c251d154..019e84c1a9 100644 --- a/ppstructure/README_ch.md +++ b/ppstructure/README_ch.md @@ -52,12 +52,16 @@ PP-StructureV2支持各个模块独立使用或灵活搭配,如,可以单独 下图展示了版面分析+表格识别的整体流程,图片先有版面分析划分为图像、文本、标题和表格四种区域,然后对图像、文本和标题三种区域进行OCR的检测识别,对表格进行表格识别,其中图像还会被存储下来以便使用。 +### 3.1.1 版面识别返回单字坐标 +下图展示了基于上一节版面分析对文字进行定位的效果, 可参考[文档](./return_word_pos.md)。 +![show_0_mdf_v2](https://github.com/PaddlePaddle/PaddleOCR/assets/43341135/799450d4-d2c5-4b61-b490-e160dc0f515c) + + ### 3.2 版面恢复 下图展示了基于上一节版面分析和表格识别的结果进行版面恢复的效果。 - ### 3.3 关键信息抽取 diff --git a/ppstructure/recovery/README.md b/ppstructure/recovery/README.md index 46a348c8e5..499ef02754 100644 --- a/ppstructure/recovery/README.md +++ b/ppstructure/recovery/README.md @@ -152,7 +152,7 @@ cd PaddleOCR/ppstructure # download model mkdir inference && cd inference # Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it -https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar +wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar xf en_PP-OCRv3_det_infer.tar # Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar # Download the ultra-lightweight English table inch model and unzip it diff --git a/ppstructure/return_word_pos.md b/ppstructure/return_word_pos.md new file mode 100644 index 0000000000..5a42d1c0aa --- /dev/null +++ b/ppstructure/return_word_pos.md @@ -0,0 +1,85 @@ + +# 返回识别位置 +根据横排的文档,识别模型不仅返回识别的内容,还返回每个文字的位置。 + +## 英文文档恢复: +### 先下载推理模型: +```bash +cd PaddleOCR/ppstructure + +## download model +mkdir inference && cd inference +## Download the detection model of the ultra-lightweight English PP-OCRv3 model and unzip it +wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar && tar
xf en_PP-OCRv3_det_infer.tar +## Download the recognition model of the ultra-lightweight English PP-OCRv3 model and unzip it +wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar && tar xf en_PP-OCRv3_rec_infer.tar +## Download the ultra-lightweight English table recognition model and unzip it +wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar +tar xf en_ppstructure_mobile_v2.0_SLANet_infer.tar +## Download the layout model of publaynet dataset and unzip it +wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar +tar xf picodet_lcnet_x1_0_fgd_layout_infer.tar +cd .. +``` +### 然后在/ppstructure/目录下使用下面的指令推理: +```bash +python predict_system.py \ + --image_dir=./docs/table/1.png \ + --det_model_dir=inference/en_PP-OCRv3_det_infer \ + --rec_model_dir=inference/en_PP-OCRv3_rec_infer \ + --rec_char_dict_path=../ppocr/utils/en_dict.txt \ + --table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \ + --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \ + --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \ + --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \ + --vis_font_path=../doc/fonts/simfang.ttf \ + --recovery=True \ + --output=../output/ \ + --return_word_box=True +``` + +### 在../output/structure/1/show_0.jpg下查看推理结果的可视化,如下图所示: +![show_0_mdf_v2](https://github.com/PaddlePaddle/PaddleOCR/assets/43341135/799450d4-d2c5-4b61-b490-e160dc0f515c) + +## 针对中文文档恢复 +### 先下载推理模型 +```bash +cd PaddleOCR/ppstructure + +## download model +cd inference +## Download the detection model of the ultra-lightweight Chinese PP-OCRv3 model and unzip it +wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar xf ch_PP-OCRv3_det_infer.tar +## Download the recognition model of the ultra-lightweight Chinese PP-OCRv3 model and unzip it +wget
https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar xf ch_PP-OCRv3_rec_infer.tar +## Download the ultra-lightweight Chinese table recognition model and unzip it +wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar +tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar +## Download the layout model of CDLA dataset and unzip it +wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar +tar xf picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar +cd .. +``` + +### 上传下面的测试图片 "2.png" 至目录 ./docs/table/ 中 +![2](https://github.com/PaddlePaddle/PaddleOCR/assets/43341135/d0858341-a889-483c-8373-5ecaa57f3b20) + +### 然后在/ppstructure/目录下使用下面的指令推理 +```bash +python predict_system.py \ + --image_dir=./docs/table/2.png \ + --det_model_dir=inference/ch_PP-OCRv3_det_infer \ + --rec_model_dir=inference/ch_PP-OCRv3_rec_infer \ + --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \ + --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \ + --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \ + --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_cdla_infer \ + --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt \ + --vis_font_path=../doc/fonts/chinese_cht.ttf \ + --recovery=True \ + --output=../output/ \ + --return_word_box=True +``` + +### 在../output/structure/2/show_0.jpg下查看推理结果的可视化,如下图所示: +![show_1_mdf_v2](https://github.com/PaddlePaddle/PaddleOCR/assets/43341135/3c200538-f2e6-4d79-847a-4c4587efa9f0) diff --git a/ppstructure/utility.py b/ppstructure/utility.py index 4ab4b88b9b..7a6f36505f 100644 --- a/ppstructure/utility.py +++ b/ppstructure/utility.py @@ -19,7 +19,6 @@ from tools.infer.utility import draw_ocr_box_txt, str2bool, str2int_tuple, init_args as infer_args import math - def init_args(): parser = infer_args() @@ -193,7 +192,6 @@ def draw_structure_result(image, result, font_path): img_layout,
boxes, txts, scores, font_path=font_path, drop_score=0) return im_show - def cal_ocr_word_box(rec_str, box, rec_word_info): ''' Calculate the detection frame for each word based on the results of recognition and detection of ocr''' @@ -229,7 +227,7 @@ def cal_ocr_word_box(rec_str, box, rec_word_info): if len(cn_width_list) != 0: avg_char_width = np.mean(cn_width_list) else: - avg_char_width = (bbox_x_end - bbox_x_start) / len(rec_str) + avg_char_width = (bbox_x_end - bbox_x_start) / len(rec_str) for center_idx in cn_col_list: center_x = (center_idx + 0.5) * cell_width cell_x_start = max(int(center_x - avg_char_width / 2), @@ -242,3 +240,4 @@ def cal_ocr_word_box(rec_str, box, rec_word_info): word_box_list.append(cell) return word_box_content_list, word_box_list + diff --git a/setup.py b/setup.py index 3aa0a1701c..f308fddc5b 100644 --- a/setup.py +++ b/setup.py @@ -43,7 +43,7 @@ def readme(): version=VERSION, install_requires=load_requirements(['requirements.txt', 'ppstructure/recovery/requirements.txt']), license='Apache License 2.0', - description='Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embeded and IoT devices', + description='Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embedded and IoT devices', long_description=readme(), long_description_content_type='text/markdown', url='https://github.com/PaddlePaddle/PaddleOCR', diff --git a/tools/infer/utility.py b/tools/infer/utility.py index b064cbf189..55e752230c 100644 --- a/tools/infer/utility.py +++ b/tools/infer/utility.py @@ -31,15 +31,18 @@ def str2bool(v): return v.lower() in ("true", "yes", "t", "y", "1") + def str2int_tuple(v): return tuple([int(i.strip()) for i in v.split(",")]) + def init_args(): parser = argparse.ArgumentParser() # params for prediction engine parser.add_argument("--use_gpu", type=str2bool,
default=True) parser.add_argument("--use_xpu", type=str2bool, default=False) parser.add_argument("--use_npu", type=str2bool, default=False) + parser.add_argument("--use_mlu", type=str2bool, default=False) parser.add_argument("--ir_optim", type=str2bool, default=True) parser.add_argument("--use_tensorrt", type=str2bool, default=False) parser.add_argument("--min_subgraph_size", type=int, default=15) @@ -257,6 +260,8 @@ def create_predictor(args, mode, logger): elif args.use_npu: config.enable_custom_device("npu") + elif args.use_mlu: + config.enable_custom_device("mlu") elif args.use_xpu: config.enable_xpu(10 * 1024 * 1024) else:
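The new `--use_mlu` flag plugs into the same pattern as the existing device switches: a `str2bool`-typed argparse flag feeding an if/elif chain. A self-contained sketch of that behavior (`pick_device` is a hypothetical stand-in for the device-selection branch in `create_predictor`, not a real PaddleOCR function):

```python
import argparse

def str2bool(v):
    # Matches tools/infer/utility.py: any other string parses as False.
    return v.lower() in ("true", "yes", "t", "y", "1")

def pick_device(args):
    # Hypothetical helper mirroring the if/elif chain in create_predictor.
    if args.use_gpu:
        return "gpu"
    elif args.use_npu:
        return "npu"
    elif args.use_mlu:
        return "mlu"
    return "cpu"

parser = argparse.ArgumentParser()
parser.add_argument("--use_gpu", type=str2bool, default=True)
parser.add_argument("--use_npu", type=str2bool, default=False)
parser.add_argument("--use_mlu", type=str2bool, default=False)

args = parser.parse_args(["--use_gpu", "false", "--use_mlu", "1"])
print(pick_device(args))  # mlu
```

Because `str2bool` accepts any string, misspellings like `--use_gpu=ture` silently parse as `False`; the boolean flags in the real CLI behave the same way.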