[Cherry-pick] Cherry-pick from release/2.6 (#11092)
* Update recognition_en.md (#10059)

ic15_dict.txt only has 36 characters (digits 0-9 and lowercase letters a-z)

* Update ocr_rec.h (#9469)

It is enough to include preprocess_op.h; there is no need to include ocr_cls.h.

* Expand the num_classes comment (#10073)

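The value assigned to Architecture.Backbone.num_classes in ser_vi_layoutxlm_xfund_zh.yml is also used as Loss.num_classes.
Because BIO tagging is used, a dictionary of n fields that includes `other` gives 2n-1 classes, and a dictionary of n fields that excludes `other` gives 2n+1 classes.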

* Update algorithm_overview_en.md (#9747)

Fix links to super-resolution algorithm docs

* Improve the docs `deploy/hubserving/readme.md` and `doc/doc_ch/models_list.md` (#9110)

* Update readme.md

* Update readme.md

* Update readme.md

* Update models_list.md

* trim trailing spaces @ `deploy/hubserving/readme_en.md`

* `s/shell/bash/` @ `deploy/hubserving/readme_en.md`

* Update `deploy/hubserving/readme_en.md` to sync with `deploy/hubserving/readme.md`

* Update `deploy/hubserving/readme_en.md` to sync with `deploy/hubserving/readme.md`

* Update `deploy/hubserving/readme_en.md` to sync with `deploy/hubserving/readme.md`

* Update `doc/doc_en/models_list_en.md` to sync with `doc/doc_ch/models_list.md`

* using Grammarly to tweak `deploy/hubserving/readme_en.md`

* using Grammarly to tweak `doc/doc_en/models_list_en.md`

* The `ocr_system` module now returns values in the `confidence` field

* Update README_CN.md

* Fix the incorrect link for converting images to Base64 in the test-service instructions. (#8334)

* Update application.md

* [Doc] Fix 404 link.  (#10318)

* Update PP-OCRv3_det_train.md

* Update knowledge_distillation.md

* Update config.md

* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file (#10181)

* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file

* refactor get_image_file_list function

* Update customize.md (#10325)

* Update FAQ.md (#10345)

* Update FAQ.md (#10349)

* Don't break overall processing on a bad image (#10216)

* Add preprocessing common to OCR tasks (#10217)

Add preprocessing to options

* [MLU] add mlu device for infer (#10249)

* Create newfeature.md

* Update newfeature.md

* Remove an unused imported module to avoid a module-not-found error at startup of PyInstaller-packaged binaries. (#10502)

* CV Suite Building Campaign: return per-character coordinates from text recognition (#10515)

* modification of return word box

* update_implements

* Update rec_postprocess.py

* Update utility.py

* Update README_ch.md

* revert README_ch.md update

* Fixed Layout recovery README file (#10493)

Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>

* update_doc

* bugfix
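
The .PDF fix in #10181 comes down to comparing file extensions case-insensitively. Below is a simplified sketch of a `get_image_file_list`-style helper under that assumption; it is not the ppocr implementation, and the extension set `SUPPORTED_EXTS` and the helper `has_supported_ext` are hypothetical.

```python
import os
from typing import List

# Assumed set of accepted extensions, for illustration only.
SUPPORTED_EXTS = {"jpg", "jpeg", "png", "bmp", "rgb", "tif", "tiff", "gif", "pdf"}


def has_supported_ext(path: str) -> bool:
    # Lower-case the extension so "scan.PDF" is treated like "scan.pdf".
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return ext in SUPPORTED_EXTS


def get_image_file_list(img_file: str) -> List[str]:
    """Collect supported files from a single path or a directory (non-recursive)."""
    if not img_file or not os.path.exists(img_file):
        raise FileNotFoundError(f"no image file found at {img_file!r}")
    if os.path.isfile(img_file):
        return [img_file] if has_supported_ext(img_file) else []
    # Sort for a deterministic processing order.
    return [
        os.path.join(img_file, name)
        for name in sorted(os.listdir(img_file))
        if os.path.isfile(os.path.join(img_file, name)) and has_supported_ext(name)
    ]
```

With this check, a directory containing `scan.PDF` and `photo.Png` returns both files instead of silently skipping them.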

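For #10515, one way to recover per-character positions from a CTC recognizer is to remember which output column produced each character during greedy decoding, then map that column back to an x-coordinate, assuming every column covers an equal-width slice of the resized input image. This is a hedged sketch of that idea only; `ctc_greedy_with_columns` and `column_to_x` are hypothetical names, not the functions changed in rec_postprocess.py.

```python
from typing import List, Tuple


def ctc_greedy_with_columns(indices: List[int], blank: int = 0) -> List[Tuple[int, int]]:
    """Collapse a best-path CTC sequence, keeping the column of each emitted char."""
    out = []
    prev = blank
    for col, idx in enumerate(indices):
        # Emit only a non-blank index that differs from the previous column;
        # a blank between two equal indices lets the character repeat.
        if idx != blank and idx != prev:
            out.append((idx, col))
        prev = idx
    return out


def column_to_x(col: int, num_cols: int, img_width: int) -> float:
    """Map a feature-map column to the x centre of its slice in the input image."""
    return (col + 0.5) * img_width / num_cols
```

For example, if a 320-pixel-wide input produced 80 columns, column 1 maps to x = 6.0. A real implementation would also need to undo any resizing or padding applied before recognition.
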
---------

Co-authored-by: ChuongLoc <89434232+ChuongLoc@users.noreply.github.com>
Co-authored-by: Wang Xin <xinwang614@gmail.com>
Co-authored-by: tanjh <dtdhinjapan@gmail.com>
Co-authored-by: Louis Maddox <lmmx@users.noreply.github.com>
Co-authored-by: n0099 <n@n0099.net>
Co-authored-by: zhenliang li <37922155+shouyong@users.noreply.github.com>
Co-authored-by: itasli <ilyas.tasli@outlook.fr>
Co-authored-by: UserUnknownFactor <63057995+UserUnknownFactor@users.noreply.github.com>
Co-authored-by: PeiyuLau <135964669+PeiyuLau@users.noreply.github.com>
Co-authored-by: kerneltravel <kjpioo2006@gmail.com>
Co-authored-by: ToddBear <43341135+ToddBear@users.noreply.github.com>
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <59397280+Shubham654@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>
Co-authored-by: andyj <87074272+andyjpaddle@users.noreply.github.com>
16 people authored Oct 18, 2023
1 parent ce72835 commit e3fc639
Showing 28 changed files with 719 additions and 409 deletions.
17 changes: 17 additions & 0 deletions .github/ISSUE_TEMPLATE/newfeature.md
@@ -0,0 +1,17 @@
---
name: New Feature Issue template
about: Issue template for new features.
title: ''
labels: 'Code PR is needed'
assignees: 'shiyutang'

---

## 背景

经过需求征集https://github.com/PaddlePaddle/PaddleOCR/issues/10334 和每周技术研讨会 https://github.com/PaddlePaddle/PaddleOCR/issues/10223 讨论,我们确定了XXXX任务。

## 解决步骤
1. 根据开源代码进行网络结构、评估指标转换。代码链接:XXXX
2. 结合[论文复现指南](https://github.com/PaddlePaddle/models/blob/release%2F2.2/tutorials/article-implementation/ArticleReproduction_CV.md),进行前反向对齐等操作,达到论文Table.1中的指标。
3. 参考[PR提交规范](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/code_and_doc.md)提交代码PR到ppocr中。
254 changes: 254 additions & 0 deletions README_ch.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion deploy/cpp_infer/include/ocr_rec.h
@@ -17,7 +17,7 @@
#include "paddle_api.h"
#include "paddle_inference_api.h"

#include <include/ocr_cls.h>
#include <include/preprocess_op.h>
#include <include/utility.h>

namespace PaddleOCR {
2 changes: 1 addition & 1 deletion deploy/docker/hubserving/README_cn.md
@@ -42,7 +42,7 @@ docker logs -f paddle_ocr
```

## 4.测试服务
a. 计算待识别图片的Base64编码(如果只是测试一下效果,可以通过免费的在线工具实现,如:http://tool.chinaz.com/tools/imgtobase/
a. 计算待识别图片的Base64编码(如果只是测试一下效果,可以通过免费的在线工具实现,如:http://tool.chinaz.com/tools/imgtobase/
b. 发送服务请求(可参见sample_request.txt中的值)
```
curl -H "Content-Type:application/json" -X POST --data "{\"images\": [\"填入图片Base64编码(需要删除'data:image/jpg;base64,')\"]}" http://localhost:8868/predict/ocr_system
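
The curl request above expects a bare Base64 string, with any `data:image/jpg;base64,` prefix removed. A minimal Python sketch that builds and sends the same JSON body using only the standard library, assuming the service runs at `http://localhost:8868/predict/ocr_system` as in the command and replies with JSON; the helper names are hypothetical.

```python
import base64
import json
import urllib.request

URL = "http://localhost:8868/predict/ocr_system"  # endpoint taken from the curl example


def strip_data_uri(b64: str) -> str:
    # Online converters often return "data:image/jpg;base64,<payload>";
    # the service expects only <payload>.
    return b64.split(",", 1)[1] if b64.startswith("data:") else b64


def build_ocr_request(image_bytes: bytes) -> str:
    # b64encode returns the bare payload, so there is no prefix to strip here.
    payload = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"images": [payload]})


def post_ocr_request(body: str, url: str = URL) -> dict:
    # Same request as: curl -H "Content-Type:application/json" -X POST --data ...
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, `post_ocr_request(build_ocr_request(open("test.jpg", "rb").read()))` sends a local image (`test.jpg` is a placeholder path).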
289 changes: 134 additions & 155 deletions deploy/hubserving/readme.md

Large diffs are not rendered by default.

318 changes: 144 additions & 174 deletions deploy/hubserving/readme_en.md

Large diffs are not rendered by default.

7 changes: 4 additions & 3 deletions deploy/pdserving/README_CN.md
@@ -106,13 +106,13 @@ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv3_rec_infer/ \
检测模型转换完成后,会在当前文件夹多出`ppocr_det_v3_serving``ppocr_det_v3_client`的文件夹,具备如下格式:
```
|- ppocr_det_v3_serving/
|- __model__
|- __model__
|- __params__
|- serving_server_conf.prototxt
|- serving_server_conf.prototxt
|- serving_server_conf.stream.prototxt
|- ppocr_det_v3_client
|- serving_client_conf.prototxt
|- serving_client_conf.prototxt
|- serving_client_conf.stream.prototxt
```
@@ -232,6 +232,7 @@ cp -rf general_detection_op.cpp Serving/core/general-server/op
# 启动服务,运行日志保存在log.txt
python3 -m paddle_serving_server.serve --model ppocr_det_v3_serving ppocr_rec_v3_serving --op GeneralDetectionOp GeneralInferOp --port 8181 &>log.txt &
```
成功启动服务后,log.txt中会打印类似如下日志
![](./imgs/start_server.png)
4 changes: 2 additions & 2 deletions doc/doc_ch/FAQ.md
@@ -188,7 +188,7 @@ A:可以看下训练的尺度和预测的尺度是否相同,如果训练的

#### Q: 如何识别招牌或者广告图中的艺术字?

**A**: 招牌或者广告图中的艺术字是文本识别一个非常有挑战性的难题,因为艺术字中的单字和印刷体相比,变化非常大。如果需要识别的艺术字是在一个词典列表内,可以将改每个词典认为是一个待识别图像模板,通过通用图像检索识别系统解决识别问题。可以尝试使用PaddleClas的图像识别系统
**A**: 招牌或者广告图中的艺术字是文本识别一个非常有挑战性的难题,因为艺术字中的单字和印刷体相比,变化非常大。如果需要识别的艺术字是在一个词典列表内,可以将该每个词典认为是一个待识别图像模板,通过通用图像检索识别系统解决识别问题。可以尝试使用PaddleClas的图像识别系统PP-shituV2

#### Q: 图像正常识别出来的文字是OK的,旋转90度后识别出来的结果就比较差,有什么方法可以优化?

@@ -400,7 +400,7 @@ StyleText的用途主要是:提取style_image中的字体、背景等style信

A:无论是文字检测,还是文字识别,骨干网络的选择是预测效果和预测效率的权衡。一般,选择更大规模的骨干网络,例如ResNet101_vd,则检测或识别更准确,但预测耗时相应也会增加。而选择更小规模的骨干网络,例如MobileNetV3_small_x0_35,则预测更快,但检测或识别的准确率会大打折扣。幸运的是不同骨干网络的检测或识别效果与在ImageNet数据集图像1000分类任务效果正相关。飞桨图像分类套件PaddleClas汇总了ResNet_vd、Res2Net、HRNet、MobileNetV3、GhostNet等23种系列的分类网络结构,在上述图像分类任务的top1识别准确率,GPU(V100和T4)和CPU(骁龙855)的预测耗时以及相应的117个预训练模型下载地址。

(1)文字检测骨干网络的替换,主要是确定类似与ResNet的4个stages,以方便集成后续的类似FPN的检测头。此外,对于文字检测问题,使用ImageNet训练的分类预训练模型,可以加速收敛和效果提升。
(1)文字检测骨干网络的替换,主要是确定类似于ResNet的4个stages,以方便集成后续的类似FPN的检测头。此外,对于文字检测问题,使用ImageNet训练的分类预训练模型,可以加速收敛和效果提升。

(2)文字识别的骨干网络的替换,需要注意网络宽高stride的下降位置。由于文本识别一般宽高比例很大,因此高度下降频率少一些,宽度下降频率多一些。可以参考PaddleOCR中MobileNetV3骨干网络的改动。

2 changes: 1 addition & 1 deletion doc/doc_ch/PPOCRv3_det_train.md
@@ -30,7 +30,7 @@ PP-OCRv3检测训练包括两个步骤:

### 2.2 训练教师模型

教师模型训练的配置文件是[ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml)。教师模型模型结构的Backbone、Neck、Head分别为Resnet50, LKPAN, DBHead,采用DML的蒸馏方法训练。有关配置文件的详细介绍参考[文档](./knowledge_distillation)
教师模型训练的配置文件是[ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml)。教师模型模型结构的Backbone、Neck、Head分别为Resnet50, LKPAN, DBHead,采用DML的蒸馏方法训练。有关配置文件的详细介绍参考[文档](./knowledge_distillation.md)


下载ImageNet预训练模型:
6 changes: 3 additions & 3 deletions doc/doc_ch/application.md
@@ -30,12 +30,12 @@ PaddleOCR场景应用覆盖通用,制造、金融、交通行业的主要OCR
| 类别 | 亮点 | 类别 | 亮点 |
| -------------- | ------------------------ | ------------ | --------------------- |
| 表单VQA | 多模态通用表单结构化提取 | 通用卡证识别 | 通用结构化提取 |
| 增值税发票 | 尽请期待 | 身份证识别 | 结构化提取、图像阴影 |
| 增值税发票 | 敬请期待 | 身份证识别 | 结构化提取、图像阴影 |
| 印章检测与识别 | 端到端弯曲文本识别 | 合同比对 | 密集文本检测、NLP串联 |

## 交通

| 类别 | 亮点 | 类别 | 亮点 |
| ----------------- | ------------------------------ | ---------- | -------- |
| 车牌识别 | 多角度图像、轻量模型、端侧部署 | 快递单识别 | 尽请期待 |
| 驾驶证/行驶证识别 | 尽请期待 | | |
| 车牌识别 | 多角度图像、轻量模型、端侧部署 | 快递单识别 | 敬请期待 |
| 驾驶证/行驶证识别 | 敬请期待 | | |
2 changes: 1 addition & 1 deletion doc/doc_ch/config.md
@@ -223,4 +223,4 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi
| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 斯拉夫字母 |
| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 梵文字母 |
更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md#%E8%AF%AD%E7%A7%8D%E7%BC%A9%E5%86%99)
更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md)
2 changes: 1 addition & 1 deletion doc/doc_ch/customize.md
@@ -27,4 +27,4 @@ PaddleOCR提供了检测和识别模型的串联工具,可以将训练好的
```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/"
```
更多的文本检测、识别串联推理使用方式请参考文档教程中的[基于预测引擎推理](./inference.md)
更多的文本检测、识别串联推理使用方式请参考文档教程中的[基于预测引擎推理](./algorithm_inference.md)
2 changes: 1 addition & 1 deletion doc/doc_ch/kie.md
@@ -205,7 +205,7 @@ Architecture:
name: LayoutXLMForSer
pretrained: True
mode: vi
# 假设字典中包含n个字段(包含other),由于采用BIO标注,则类别数为2n-1
# 由于采用BIO标注,假设字典中包含n个字段(包含other)时,则类别数为2n-1; 假设字典中包含n个字段(不含other)时,则类别数为2n+1。否则在train过程会报:IndexError: (OutOfRange) label value should less than the shape of axis dimension 。
num_classes: &num_classes 7

PostProcess:
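
The kie.md comment above (and the #10073 commit message) gives the class-count rule for BIO tagging. A small sketch of where 2n-1 and 2n+1 come from: every entity field except `other` contributes a B- and an I- tag, and all non-entity tokens share a single O tag. The field names used here are an assumed XFUND-style label set (QUESTION, ANSWER, HEADER, OTHER), which would match the `num_classes: &num_classes 7` in the hunk; the helper names and the label order are hypothetical.

```python
from typing import List


def bio_labels(fields: List[str]) -> List[str]:
    """Expand entity fields into BIO tags; 'other' maps to the shared O tag."""
    labels = ["O"]
    for field in fields:
        if field.lower() == "other":
            continue  # 'other' is covered by O and gets no B-/I- pair
        labels += [f"B-{field}", f"I-{field}"]
    return labels


def ser_num_classes(num_fields: int, includes_other: bool) -> int:
    # n fields including 'other': 2 * (n - 1) + 1 = 2n - 1
    # n fields excluding 'other': 2 * n + 1
    return 2 * num_fields - 1 if includes_other else 2 * num_fields + 1
```

If `num_classes` is smaller than the number of BIO labels actually produced, the label values exceed the loss's class axis, which is the `IndexError: (OutOfRange)` mentioned in the comment.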
2 changes: 1 addition & 1 deletion doc/doc_ch/knowledge_distillation.md
@@ -69,7 +69,7 @@ PaddleOCR中集成了知识蒸馏的算法,具体地,有以下几个主要

```yaml
Architecture:
model_type: &model_type "rec" # 模型类别,rec、det等,每个子网络的模型类别都与
model_type: &model_type "rec" # 模型类别,rec、det等,每个子网络的模型类别
name: DistillationModel # 结构名称,蒸馏任务中,为DistillationModel,用于构建对应的结构
algorithm: Distillation # 算法名称
Models: # 模型,包含子网络的配置信息
2 changes: 2 additions & 0 deletions doc/doc_ch/models_list.md
@@ -107,6 +107,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
|en_number_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| 2.7M | [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar) |
|en_number_mobile_v2.0_rec|原始超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |

**注意:** 所有英文识别模型的字典文件均为`ppocr/utils/en_dict.txt`

<a name="多语言识别模型"></a>
### 2.3 多语言识别模型(更多语言持续更新中...)
@@ -152,3 +153,4 @@ Paddle-Lite 是一个高性能、轻量级、灵活性强且易于扩展的深
|PP-OCRv2(slim)|蒸馏版超轻量中文OCR移动端模型|4.9M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_opt.nb)|v2.9|
|V2.0|ppocr_v2.0超轻量中文OCR移动端模型|7.8M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9|
|V2.0(slim)|ppocr_v2.0超轻量中文OCR移动端模型|3.3M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9|
