[Cherry-pick] Cherry-pick from release/2.6 (#11092)

* Update recognition_en.md (#10059)
  ic15_dict.txt only has 36 digits
* Update ocr_rec.h (#9469)
  It is enough to include preprocess_op.h; we do not need to include ocr_cls.h.
* Add explanatory comment for num_classes (#10073)
  The value assigned to Architecture.Backbone.num_classes in ser_vi_layoutxlm_xfund_zh.yml is also set on Loss.num_classes. Because BIO tagging is used, if the dictionary contains n fields (including other), the number of classes is 2n-1; if it contains n fields (excluding other), the number of classes is 2n+1.
* Update algorithm_overview_en.md (#9747)
  Fix links to super-resolution algorithm docs
* Improve docs `deploy/hubserving/readme.md` and `doc/doc_ch/models_list.md` (#9110)
* Update readme.md
* Update readme.md
* Update readme.md
* Update models_list.md
* trim trailing spaces @ `deploy/hubserving/readme_en.md`
* `s/shell/bash/` @ `deploy/hubserving/readme_en.md`
* Update `deploy/hubserving/readme_en.md` to sync with `deploy/hubserving/readme.md`
* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`
* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`
* Update `doc/doc_en/models_list_en.md` to sync with `doc/doc_ch/models_list_en.md`
* using Grammarly to tweak `deploy/hubserving/readme_en.md`
* using Grammarly to tweak `doc/doc_en/models_list_en.md`
* `ocr_system` module will return with values of field `confidence`
* Update README_CN.md
* Fix the incorrect reference URL for converting images to Base64 in the service-testing section (#8334)
* Update application.md
* [Doc] Fix 404 link. (#10318)
* Update PP-OCRv3_det_train.md
* Update knowledge_distillation.md
* Update config.md
* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file (#10181)
* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file
* refactor get_image_file_list function
* Update customize.md (#10325)
* Update FAQ.md (#10345)
* Update FAQ.md (#10349)
* Don't break overall processing on a bad image (#10216)
* Add preprocessing common to OCR tasks (#10217)
  Add preprocessing to options
* [MLU] add mlu device for infer (#10249)
* Create newfeature.md
* Update newfeature.md
* Remove unused imported module to avoid a module-not-found error at start-up of PyInstaller-packaged binaries (#10502)
* CV suite development special event - text recognition returns per-character recognition coordinates (#10515)
* modification of return word box
* update_implements
* Update rec_postprocess.py
* Update utility.py
* Update README_ch.md
* revert README_ch.md update
* Fixed Layout recovery README file (#10493)
  Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>
* update_doc
* bugfix

---------

Co-authored-by: ChuongLoc <89434232+ChuongLoc@users.noreply.github.com>
Co-authored-by: Wang Xin <xinwang614@gmail.com>
Co-authored-by: tanjh <dtdhinjapan@gmail.com>
Co-authored-by: Louis Maddox <lmmx@users.noreply.github.com>
Co-authored-by: n0099 <n@n0099.net>
Co-authored-by: zhenliang li <37922155+shouyong@users.noreply.github.com>
Co-authored-by: itasli <ilyas.tasli@outlook.fr>
Co-authored-by: UserUnknownFactor <63057995+UserUnknownFactor@users.noreply.github.com>
Co-authored-by: PeiyuLau <135964669+PeiyuLau@users.noreply.github.com>
Co-authored-by: kerneltravel <kjpioo2006@gmail.com>
Co-authored-by: ToddBear <43341135+ToddBear@users.noreply.github.com>
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <59397280+Shubham654@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>
Co-authored-by: andyj <87074272+andyjpaddle@users.noreply.github.com>
PaddlePaddle · Oct 18, 2023 · e3fc639
1 parent ce72835
commit e3fc639
Showing 28 changed files with 719 additions and 409 deletions.
diff --git a/.github/ISSUE_TEMPLATE/newfeature.md b/.github/ISSUE_TEMPLATE/newfeature.md
@@ -0,0 +1,17 @@
+---
+name: New Feature Issue template
+about: Issue template for new features.
+title: ''
+labels: 'Code PR is needed'
+assignees: 'shiyutang'
+
+---
+
+## 背景
+
+经过需求征集https://github.com/PaddlePaddle/PaddleOCR/issues/10334 和每周技术研讨会 https://github.com/PaddlePaddle/PaddleOCR/issues/10223 讨论，我们确定了XXXX任务。
+
+## 解决步骤
+1. 根据开源代码进行网络结构、评估指标转换。代码链接：XXXX
+2. 结合[论文复现指南](https://github.com/PaddlePaddle/models/blob/release%2F2.2/tutorials/article-implementation/ArticleReproduction_CV.md)，进行前反向对齐等操作，达到论文Table.1中的指标。
+3. 参考[PR提交规范](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/code_and_doc.md)提交代码PR到ppocr中。
diff --git a/README_ch.md b/README_ch.md
diff --git a/deploy/cpp_infer/include/ocr_rec.h b/deploy/cpp_infer/include/ocr_rec.h
@@ -17,7 +17,7 @@
 #include "paddle_api.h"
 #include "paddle_inference_api.h"
 
-#include <include/ocr_cls.h>
+#include <include/preprocess_op.h>
 #include <include/utility.h>
 
 namespace PaddleOCR {

diff --git a/deploy/docker/hubserving/README_cn.md b/deploy/docker/hubserving/README_cn.md
@@ -42,7 +42,7 @@ docker logs -f paddle_ocr
 ```
 
 ## 4.测试服务
-a. 计算待识别图片的Base64编码（如果只是测试一下效果，可以通过免费的在线工具实现，如：http://tool.chinaz.com/tools/imgtobase/）
+a. 计算待识别图片的Base64编码（如果只是测试一下效果，可以通过免费的在线工具实现，如：http://tool.chinaz.com/tools/imgtobase/
 b. 发送服务请求（可参见sample_request.txt中的值）
 ```
 curl -H "Content-Type:application/json" -X POST --data "{\"images\": [\"填入图片Base64编码(需要删除'data:image/jpg;base64,'）\"]}" http://localhost:8868/predict/ocr_system
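
Step a in the hunk above (computing the image's Base64 encoding) can also be done locally instead of with an online tool. The following is a minimal sketch using only the Python standard library; the helper name `build_ocr_payload` is hypothetical, and the payload shape simply mirrors the `images` field of the curl request shown above. Output from `base64.b64encode` carries no `data:image/jpg;base64,` prefix, so nothing has to be stripped.

```python
import base64
import json


def build_ocr_payload(image_bytes: bytes) -> str:
    """Return a JSON body with the same shape as the curl request above.

    Hypothetical helper for illustration; not part of PaddleOCR.
    """
    # b64encode returns bytes; decode to str so it can be placed in JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"images": [encoded]})


# Demo on in-memory bytes (the first three bytes of a JPEG file).
# For a real image, read the bytes with open(path, "rb").read().
demo_payload = build_ocr_payload(b"\xff\xd8\xff")
print(demo_payload)
```

The resulting string could be passed as the `--data` argument of the request, assuming the service is running at the address used in the curl command.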

diff --git a/deploy/hubserving/readme.md b/deploy/hubserving/readme.md
diff --git a/deploy/hubserving/readme_en.md b/deploy/hubserving/readme_en.md
diff --git a/deploy/pdserving/README_CN.md b/deploy/pdserving/README_CN.md
@@ -106,13 +106,13 @@ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv3_rec_infer/ \
 检测模型转换完成后，会在当前文件夹多出`ppocr_det_v3_serving` 和`ppocr_det_v3_client`的文件夹，具备如下格式：
 ```
 |- ppocr_det_v3_serving/
-  |- __model__  
+  |- __model__
   |- __params__
-  |- serving_server_conf.prototxt  
+  |- serving_server_conf.prototxt
   |- serving_server_conf.stream.prototxt
 
 |- ppocr_det_v3_client
-  |- serving_client_conf.prototxt  
+  |- serving_client_conf.prototxt
   |- serving_client_conf.stream.prototxt
 
 ```
@@ -232,6 +232,7 @@ cp -rf general_detection_op.cpp Serving/core/general-server/op
     # 启动服务，运行日志保存在log.txt
     python3 -m paddle_serving_server.serve --model ppocr_det_v3_serving ppocr_rec_v3_serving --op GeneralDetectionOp GeneralInferOp --port 8181 &>log.txt &
     ```
+
     成功启动服务后，log.txt中会打印类似如下日志
     ![](./imgs/start_server.png)
 

diff --git a/doc/doc_ch/FAQ.md b/doc/doc_ch/FAQ.md
@@ -188,7 +188,7 @@ A：可以看下训练的尺度和预测的尺度是否相同，如果训练的
 
 #### Q: 如何识别招牌或者广告图中的艺术字？
 
-**A**: 招牌或者广告图中的艺术字是文本识别一个非常有挑战性的难题，因为艺术字中的单字和印刷体相比，变化非常大。如果需要识别的艺术字是在一个词典列表内，可以将改每个词典认为是一个待识别图像模板，通过通用图像检索识别系统解决识别问题。可以尝试使用PaddleClas的图像识别系统。
+**A**: 招牌或者广告图中的艺术字是文本识别一个非常有挑战性的难题，因为艺术字中的单字和印刷体相比，变化非常大。如果需要识别的艺术字是在一个词典列表内，可以将该每个词典认为是一个待识别图像模板，通过通用图像检索识别系统解决识别问题。可以尝试使用PaddleClas的图像识别系统PP-shituV2。
 
 #### Q: 图像正常识别出来的文字是OK的，旋转90度后识别出来的结果就比较差，有什么方法可以优化？
 
@@ -400,7 +400,7 @@ StyleText的用途主要是：提取style_image中的字体、背景等style信
 
 A：无论是文字检测，还是文字识别，骨干网络的选择是预测效果和预测效率的权衡。一般，选择更大规模的骨干网络，例如ResNet101_vd，则检测或识别更准确，但预测耗时相应也会增加。而选择更小规模的骨干网络，例如MobileNetV3_small_x0_35，则预测更快，但检测或识别的准确率会大打折扣。幸运的是不同骨干网络的检测或识别效果与在ImageNet数据集图像1000分类任务效果正相关。飞桨图像分类套件PaddleClas汇总了ResNet_vd、Res2Net、HRNet、MobileNetV3、GhostNet等23种系列的分类网络结构，在上述图像分类任务的top1识别准确率，GPU(V100和T4)和CPU(骁龙855)的预测耗时以及相应的117个预训练模型下载地址。
 
-（1）文字检测骨干网络的替换，主要是确定类似与ResNet的4个stages，以方便集成后续的类似FPN的检测头。此外，对于文字检测问题，使用ImageNet训练的分类预训练模型，可以加速收敛和效果提升。
+（1）文字检测骨干网络的替换，主要是确定类似于ResNet的4个stages，以方便集成后续的类似FPN的检测头。此外，对于文字检测问题，使用ImageNet训练的分类预训练模型，可以加速收敛和效果提升。
 
 （2）文字识别的骨干网络的替换，需要注意网络宽高stride的下降位置。由于文本识别一般宽高比例很大，因此高度下降频率少一些，宽度下降频率多一些。可以参考PaddleOCR中MobileNetV3骨干网络的改动。
 

diff --git a/doc/doc_ch/PPOCRv3_det_train.md b/doc/doc_ch/PPOCRv3_det_train.md
@@ -30,7 +30,7 @@ PP-OCRv3检测训练包括两个步骤：
 
 ### 2.2 训练教师模型
 
-教师模型训练的配置文件是[ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml)。教师模型模型结构的Backbone、Neck、Head分别为Resnet50, LKPAN, DBHead，采用DML的蒸馏方法训练。有关配置文件的详细介绍参考[文档](./knowledge_distillation)。
+教师模型训练的配置文件是[ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml)。教师模型模型结构的Backbone、Neck、Head分别为Resnet50, LKPAN, DBHead，采用DML的蒸馏方法训练。有关配置文件的详细介绍参考[文档](./knowledge_distillation.md)。
 
 
 下载ImageNet预训练模型：

diff --git a/doc/doc_ch/application.md b/doc/doc_ch/application.md
@@ -30,12 +30,12 @@ PaddleOCR场景应用覆盖通用，制造、金融、交通行业的主要OCR
 | 类别           | 亮点                     | 类别         | 亮点                  |
 | -------------- | ------------------------ | ------------ | --------------------- |
 | 表单VQA        | 多模态通用表单结构化提取 | 通用卡证识别 | 通用结构化提取        |
-| 增值税发票     | 尽请期待                 | 身份证识别   | 结构化提取、图像阴影  |
+| 增值税发票     | 敬请期待                 | 身份证识别   | 结构化提取、图像阴影  |
 | 印章检测与识别 | 端到端弯曲文本识别       | 合同比对     | 密集文本检测、NLP串联 |
 
 ## 交通
 
 | 类别              | 亮点                           | 类别       | 亮点     |
 | ----------------- | ------------------------------ | ---------- | -------- |
-| 车牌识别          | 多角度图像、轻量模型、端侧部署 | 快递单识别 | 尽请期待 |
-| 驾驶证/行驶证识别 | 尽请期待                       |            |          |
+| 车牌识别          | 多角度图像、轻量模型、端侧部署 | 快递单识别 | 敬请期待 |
+| 驾驶证/行驶证识别 | 敬请期待                       |            |          |
diff --git a/doc/doc_ch/config.md b/doc/doc_ch/config.md
@@ -223,4 +223,4 @@ PaddleOCR目前已支持80种（除中文外）语种识别，`configs/rec/multi
 | rec_cyrillic_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | 斯拉夫字母  |
 | rec_devanagari_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | 梵文字母  |
 
-更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md#%E8%AF%AD%E7%A7%8D%E7%BC%A9%E5%86%99)
+更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md)
diff --git a/doc/doc_ch/customize.md b/doc/doc_ch/customize.md
@@ -27,4 +27,4 @@ PaddleOCR提供了检测和识别模型的串联工具，可以将训练好的
 ```
 python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/"  --rec_model_dir="./inference/rec/"
 ```
-更多的文本检测、识别串联推理使用方式请参考文档教程中的[基于预测引擎推理](./inference.md)。
+更多的文本检测、识别串联推理使用方式请参考文档教程中的[基于预测引擎推理](./algorithm_inference.md)。
diff --git a/doc/doc_ch/kie.md b/doc/doc_ch/kie.md
@@ -205,7 +205,7 @@ Architecture:
     name: LayoutXLMForSer
     pretrained: True
     mode: vi
-    # 假设字典中包含n个字段（包含other），由于采用BIO标注，则类别数为2n-1
+    # 由于采用BIO标注，假设字典中包含n个字段（包含other）时，则类别数为2n-1; 假设字典中包含n个字段（不含other）时，则类别数为2n+1。否则在train过程会报：IndexError: (OutOfRange) label value should less than the shape of axis dimension 。
     num_classes: &num_classes 7
 
 PostProcess:
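
The class-count rule in the updated comment above can be checked with a short sketch. It assumes BIO tagging assigns one shared `O` tag to the "other" field and a `B-`/`I-` pair to every remaining field; the function names and the sample field names are hypothetical and are not taken from PaddleOCR.

```python
def bio_label_list(fields, other="other"):
    """Expand dictionary fields into BIO tags.

    Assumption: the "other" field maps to a single "O" tag, and every
    remaining field gets a B- tag and an I- tag.
    """
    real_fields = [f for f in fields if f.lower() != other]
    labels = ["O"]
    for name in real_fields:
        labels.append(f"B-{name.upper()}")
        labels.append(f"I-{name.upper()}")
    return labels


def bio_num_classes(fields, other="other"):
    """Number of classes to use for num_classes under the assumption above."""
    return len(bio_label_list(fields, other))


# n = 4 fields including "other": 2n - 1 = 7
with_other = ["other", "question", "answer", "header"]
# n = 3 fields excluding "other": 2n + 1 = 7
without_other = ["question", "answer", "header"]

print(bio_num_classes(with_other), bio_num_classes(without_other))
```

Both dictionaries describe the same tag set, which is why the two formulas in the comment give the same result, matching the `num_classes` value of 7 in the hunk.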

diff --git a/doc/doc_ch/knowledge_distillation.md b/doc/doc_ch/knowledge_distillation.md
@@ -69,7 +69,7 @@ PaddleOCR中集成了知识蒸馏的算法，具体地，有以下几个主要
 
 ```yaml
 Architecture:
-  model_type: &model_type "rec"    # 模型类别，rec、det等，每个子网络的模型类别都与
+  model_type: &model_type "rec"    # 模型类别，rec、det等，每个子网络的模型类别
   name: DistillationModel          # 结构名称，蒸馏任务中，为DistillationModel，用于构建对应的结构
   algorithm: Distillation          # 算法名称
   Models:                          # 模型，包含子网络的配置信息

diff --git a/doc/doc_ch/models_list.md b/doc/doc_ch/models_list.md
@@ -107,6 +107,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
 |en_number_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型，支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| 2.7M | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar) |
 |en_number_mobile_v2.0_rec|原始超轻量模型，支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |
 
+**注意：** 所有英文识别模型的字典文件均为`ppocr/utils/en_dict.txt`
 
 <a name="多语言识别模型"></a>
 ### 2.3 多语言识别模型（更多语言持续更新中...）
@@ -152,3 +153,4 @@ Paddle-Lite 是一个高性能、轻量级、灵活性强且易于扩展的深
 |PP-OCRv2(slim)|蒸馏版超轻量中文OCR移动端模型|4.9M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_opt.nb)|v2.9|
 |V2.0|ppocr_v2.0超轻量中文OCR移动端模型|7.8M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9|
 |V2.0(slim)|ppocr_v2.0超轻量中文OCR移动端模型|3.3M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9|
+