From 1f1bd0ca14a4ce035a2d59ed5f021860a4e0bad8 Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Sat, 8 Oct 2022 06:40:49 +0000
Subject: [PATCH 01/52] Improve OCR Readme

---
 examples/vision/ocr/README.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/examples/vision/ocr/README.md b/examples/vision/ocr/README.md
index 4a88654d78..c863f2ed79 100644
--- a/examples/vision/ocr/README.md
+++ b/examples/vision/ocr/README.md
@@ -17,3 +17,24 @@
 | PPOCRv2_mobile |[ch_ppocr_mobile_v2.0_det](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [ch_ppocr_mobile_v2.0_rec](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_rec_infer.tar.gz) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测,比PPOCRv2更加轻量 |
 | PPOCRv2_server |[ch_ppocr_server_v2.0_det](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_server_v2.0_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [ch_ppocr_server_v2.0_rec](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_server_v2.0_rec_infer.tar.gz) |[ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2服务器系列模型, 支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|
+### OCR 模型的处理说明
+
+为了让OCR系列模型在FastDeploy多个推理后端上正确推理,以上表格中的部分模型的输入shape,和PaddleOCR套件提供的模型有差异.
+所以用户在FastDeploy上推理PaddleOCR提供的模型,可能会存在shape上的报错.
+例如,由PaddleOCR套件库提供的英文版PP-OCRv3_det模型,输入的shape是`[?,3,960,960]`, 而FastDeploy提供的此模型输入shape为`[?,3,?,?]`.
+我们推荐用户直接下载FastDeploy提供的模型, 用户也可以参考如下工具仓库,自行修改模型的输入shape.
+ +仓库链接: https://github.com/jiangjiajun/PaddleUtils + +使用示例如下: +``` +#该用例将en_PP-OCRv3_det_infer模型的输入shape, 改为[-1,3,-1,-1], 并将新模型存放至output文件夹下 + +git clone git@github.com:jiangjiajun/PaddleUtils.git +cd paddle +python paddle_infer_shape.py --model_dir en_PP-OCRv3_det_infer/ \ + --model_filename inference.pdmodel \ + --params_filename inference.pdiparams \ + --save_dir output \ + --input_shape_dict="{'x':[-1,3,-1,-1]}" +``` From b23825ab67e9e78ff798a53bfe8a37bf5572be3f Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Sat, 8 Oct 2022 06:41:20 +0000 Subject: [PATCH 02/52] Improve OCR Readme --- examples/vision/ocr/README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/examples/vision/ocr/README.md b/examples/vision/ocr/README.md index c863f2ed79..230dd3cea5 100644 --- a/examples/vision/ocr/README.md +++ b/examples/vision/ocr/README.md @@ -29,7 +29,6 @@ 使用示例如下: ``` #该用例将en_PP-OCRv3_det_infer模型的输入shape, 改为[-1,3,-1,-1], 并将新模型存放至output文件夹下 - git clone git@github.com:jiangjiajun/PaddleUtils.git cd paddle python paddle_infer_shape.py --model_dir en_PP-OCRv3_det_infer/ \ From 22dfedb8d4cd21bf2f91aed38f5040e8628b6269 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Sat, 8 Oct 2022 06:42:34 +0000 Subject: [PATCH 03/52] Improve OCR Readme --- examples/vision/ocr/README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/examples/vision/ocr/README.md b/examples/vision/ocr/README.md index 230dd3cea5..a192aff6d4 100644 --- a/examples/vision/ocr/README.md +++ b/examples/vision/ocr/README.md @@ -20,9 +20,8 @@ ### OCR 模型的处理说明 为了让OCR系列模型在FastDeploy多个推理后端上正确推理,以上表格中的部分模型的输入shape,和PaddleOCR套件提供的模型有差异. -所以用户在FastDeploy上推理PaddleOCR提供的模型,可能会存在shape上的报错. 例如,由PaddleOCR套件库提供的英文版PP-OCRv3_det模型,输入的shape是`[?,3,960,960]`, 而FastDeploy提供的此模型输入shape为`[?,3,?,?]`. -我们推荐用户直接下载FastDeploy提供的模型, 用户也可以参考如下工具仓库,自行修改模型的输入shape. +所以用户在FastDeploy上推理PaddleOCR提供的模型,可能会存在shape上的报错.我们推荐用户直接下载FastDeploy提供的模型, 用户也可以参考如下工具仓库,自行修改模型的输入shape. 
仓库链接: https://github.com/jiangjiajun/PaddleUtils From 755a19d2cb39d58ca7302b4d834a1714d7dd5a4a Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Sat, 8 Oct 2022 06:44:51 +0000 Subject: [PATCH 04/52] Improve OCR Readme --- examples/vision/ocr/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/vision/ocr/README.md b/examples/vision/ocr/README.md index a192aff6d4..202d9ac650 100644 --- a/examples/vision/ocr/README.md +++ b/examples/vision/ocr/README.md @@ -20,7 +20,7 @@ ### OCR 模型的处理说明 为了让OCR系列模型在FastDeploy多个推理后端上正确推理,以上表格中的部分模型的输入shape,和PaddleOCR套件提供的模型有差异. -例如,由PaddleOCR套件库提供的英文版PP-OCRv3_det模型,输入的shape是`[?,3,960,960]`, 而FastDeploy提供的此模型输入shape为`[?,3,?,?]`. +例如,由PaddleOCR套件库提供的英文版PP-OCRv3_det模型,输入的shape是`[-1,3,960,960]`, 而FastDeploy提供的此模型输入shape为`[-1,3,-1,-1]`. 所以用户在FastDeploy上推理PaddleOCR提供的模型,可能会存在shape上的报错.我们推荐用户直接下载FastDeploy提供的模型, 用户也可以参考如下工具仓库,自行修改模型的输入shape. 仓库链接: https://github.com/jiangjiajun/PaddleUtils From 338ed80516e185299c21fb0c07913ef6f38cb57e Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Sat, 8 Oct 2022 09:10:36 +0000 Subject: [PATCH 05/52] Improve OCR Readme --- examples/vision/ocr/README.md | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/examples/vision/ocr/README.md b/examples/vision/ocr/README.md index 202d9ac650..94ac022874 100644 --- a/examples/vision/ocr/README.md +++ b/examples/vision/ocr/README.md @@ -11,19 +11,25 @@ | OCR版本 | 文本框检测 | 方向分类模型 | 文字识别 |字典文件| 说明 | |:----|:----|:----|:----|:----|:--------| -| PPOCRv3[推荐] |[ch_PP-OCRv3_det](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [ch_PP-OCRv3_rec](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv3系列原始超轻量模型,支持中英文、多语种文本检测 | -| PPOCRv3[推荐] 
|[en_PP-OCRv3_det](https://bj.bcebos.com/paddlehub/fastdeploy/en_PP-OCRv3_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [en_PP-OCRv3_rec](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) | [en_dict.txt](https://bj.bcebos.com/paddlehub/fastdeploy/en_dict.txt) | OCRv3系列原始超轻量模型,支持英文与数字识别,除检测模型和识别模型的训练数据与中文模型不同以外,无其他区别 | -| PPOCRv2 |[ch_PP-OCRv2_det](https://bj.bcebos.com/paddlehub/fastdeploy/ch_PP-OCRv2_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [ch_PP-OCRv2_rec](https://bj.bcebos.com/paddlehub/fastdeploy/ch_PP-OCRv2_rec_infer.tar.gz) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测 | -| PPOCRv2_mobile |[ch_ppocr_mobile_v2.0_det](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [ch_ppocr_mobile_v2.0_rec](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_rec_infer.tar.gz) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测,比PPOCRv2更加轻量 | -| PPOCRv2_server |[ch_ppocr_server_v2.0_det](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_server_v2.0_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [ch_ppocr_server_v2.0_rec](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_server_v2.0_rec_infer.tar.gz) |[ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2服务器系列模型, 支持中英文、多语种文本检测,比超轻量模型更大,但效果更好| +| PPOCRv3[推荐] |[ch_PP-OCRv3_det](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) | 
[ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_PP-OCRv3_rec](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv3系列原始超轻量模型,支持中英文、多语种文本检测 | +| PPOCRv3[推荐] |[en_PP-OCRv3_det](https://bj.bcebos.com/paddlehub/fastdeploy/en_PP-OCRv3_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [en_PP-OCRv3_rec](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) | [en_dict.txt](https://bj.bcebos.com/paddlehub/fastdeploy/en_dict.txt) | OCRv3系列原始超轻量模型,支持英文与数字识别,除检测模型和识别模型的训练数据与中文模型不同以外,无其他区别 | +| PPOCRv2 |[ch_PP-OCRv2_det](https://bj.bcebos.com/paddlehub/fastdeploy/ch_PP-OCRv2_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_PP-OCRv2_rec](https://bj.bcebos.com/paddlehub/fastdeploy/ch_PP-OCRv2_rec_infer.tar.gz) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测 | +| PPOCRv2_mobile |[ch_ppocr_mobile_v2.0_det](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_ppocr_mobile_v2.0_rec](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_rec_infer.tar.gz) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测,比PPOCRv2更加轻量 | +| PPOCRv2_server |[ch_ppocr_server_v2.0_det](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_server_v2.0_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | 
[ch_ppocr_server_v2.0_rec](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_server_v2.0_rec_infer.tar.gz) |[ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2服务器系列模型, 支持中英文、多语种文本检测,比超轻量模型更大,但效果更好| ### OCR 模型的处理说明 为了让OCR系列模型在FastDeploy多个推理后端上正确推理,以上表格中的部分模型的输入shape,和PaddleOCR套件提供的模型有差异. 例如,由PaddleOCR套件库提供的英文版PP-OCRv3_det模型,输入的shape是`[-1,3,960,960]`, 而FastDeploy提供的此模型输入shape为`[-1,3,-1,-1]`. -所以用户在FastDeploy上推理PaddleOCR提供的模型,可能会存在shape上的报错.我们推荐用户直接下载FastDeploy提供的模型, 用户也可以参考如下工具仓库,自行修改模型的输入shape. -仓库链接: https://github.com/jiangjiajun/PaddleUtils +**差异存在的原因**: 当我们在ORT和OpenVINO上部署输入shape固定的模型时(指定了高和宽),由于OCR的输入图片尺寸是变化的,会报例如下面所示的错误,导致无法推理: +``` +Failed to Infer: Got invalid dimensions for input: x for the following indices +index: 3 Got: 608 Expected: 960 +``` +**解决办法**:除了直接下载FastDeploy提供的模型外,用户还可以使用如下工具仓库, 修改模型的输入shape. + +**仓库链接**: https://github.com/jiangjiajun/PaddleUtils 使用示例如下: ``` @@ -36,3 +42,16 @@ python paddle_infer_shape.py --model_dir en_PP-OCRv3_det_infer/ \ --save_dir output \ --input_shape_dict="{'x':[-1,3,-1,-1]}" ``` + +#### OCR模型输入shape更改记录 +以下表格记录了FastDeploy修改过的OCR模型的输入`('输入名':[shape])`, 供用户参考. 
+ +| OCR版本 | 模型 | 修改前 | 修改后 | +|:----|:----|:----|:----| +|PPOCRv3 |en_PP-OCRv3_det|'x':[-1,3,960,960]|'x':[-1,3,-1,-1]| +|PPOCRv2 |ch_PP-OCRv2_det|'x':[-1,3,960,960]|'x':[-1,3,-1,-1]| +|PPOCRv2 |ch_PP-OCRv2_rec|'x':[-1,3,32,100]|'x':[-1,3,-1,-1]| +|PPOCRv2_mobile |ch_ppocr_mobile_v2.0_det|'x':[-1,3,640,640]|'x':[-1,3,-1,-1]| +|PPOCRv2_mobile|ch_ppocr_mobile_v2.0_rec|'x':[-1,3,32,100]|'x':[-1,3,-1,-1]| +|PPOCRv2_server|ch_ppocr_server_v2.0_det|'x':[-1,3,640,640]|'x':[-1,3,-1,-1]| +|PPOCRv2_server |ch_ppocr_server_v2.0_rec|'x':[-1,3,32,100]|'x':[-1,3,-1,-1]| From c4119e74d61bae4d2366f9d92b458b62d1aeedd6 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Sun, 9 Oct 2022 02:56:36 +0000 Subject: [PATCH 06/52] Add Initialize function to PP-OCR --- examples/vision/ocr/PPOCRSystemv3/cpp/infer.cc | 5 +++++ fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc | 16 ++++++++++++++++ fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h | 1 + 3 files changed, 22 insertions(+) diff --git a/examples/vision/ocr/PPOCRSystemv3/cpp/infer.cc b/examples/vision/ocr/PPOCRSystemv3/cpp/infer.cc index 4cdd8bdece..a48fb6bc08 100644 --- a/examples/vision/ocr/PPOCRSystemv3/cpp/infer.cc +++ b/examples/vision/ocr/PPOCRSystemv3/cpp/infer.cc @@ -41,6 +41,11 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model // auto ocr_system_v3 = fastdeploy::application::ocrsystem::PPOCRSystemv3(&det_model, &rec_model); auto ocr_system_v3 = fastdeploy::application::ocrsystem::PPOCRSystemv3(&det_model, &cls_model, &rec_model); + if(!ocr_system_v3.Initialized()){ + std::cerr << "Failed to initialize OCR system." 
<< std::endl; + return; + } + auto im = cv::imread(image_file); auto im_bak = im.clone(); diff --git a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc b/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc index bb02ad1052..a06ab1148a 100644 --- a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc +++ b/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc @@ -32,6 +32,22 @@ PPOCRSystemv2::PPOCRSystemv2(fastdeploy::vision::ocr::DBDetector* det_model, recognizer_->rec_image_shape[1] = 32; } +bool PPOCRSystemv2::Initialized() const { + + if ( detector_!=nullptr && !detector_->Initialized()){ + return false; + } + + if ( classifier_!=nullptr && !classifier_->Initialized()){ + return false; + } + + if ( recognizer_!=nullptr && !recognizer_->Initialized()){ + return false; + } + return true; +} + bool PPOCRSystemv2::Detect(cv::Mat* img, fastdeploy::vision::OCRResult* result) { if (!detector_->Predict(img, &(result->boxes))) { diff --git a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h b/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h index 34e416946e..f2a8ccbed8 100644 --- a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h +++ b/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h @@ -39,6 +39,7 @@ class FASTDEPLOY_DECL PPOCRSystemv2 : public FastDeployModel { fastdeploy::vision::ocr::Recognizer* rec_model); virtual bool Predict(cv::Mat* img, fastdeploy::vision::OCRResult* result); + bool Initialized() const override; protected: fastdeploy::vision::ocr::DBDetector* detector_ = nullptr; From 146d7217ffc44715a3f40356c52fabb03a1b804f Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Sun, 9 Oct 2022 03:11:31 +0000 Subject: [PATCH 07/52] Add Initialize function to PP-OCR --- examples/vision/ocr/PP-OCRv2/cpp/infer.cc | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc index 64cacd24ce..bf0ff5f27e 100644 --- a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc +++ b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc @@ -41,6 +41,11 @@ void 
InitAndInfer(const std::string& det_model_dir, const std::string& cls_model // auto ocr_system_v2 = fastdeploy::application::ocrsystem::PPOCRSystemv2(&det_model, &rec_model); auto ocr_system_v2 = fastdeploy::application::ocrsystem::PPOCRSystemv2(&det_model, &cls_model, &rec_model); + if(!ocr_system_v2.Initialized()){ + std::cerr << "Failed to initialize OCR system." << std::endl; + return; + } + auto im = cv::imread(image_file); auto im_bak = im.clone(); From 9dfddfdfb9db9f91a57056048b7cbc4a9738b175 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Sun, 9 Oct 2022 04:50:01 +0000 Subject: [PATCH 08/52] Add Initialize function to PP-OCR --- fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc b/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc index a06ab1148a..728b9f8834 100644 --- a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc +++ b/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc @@ -34,15 +34,15 @@ PPOCRSystemv2::PPOCRSystemv2(fastdeploy::vision::ocr::DBDetector* det_model, bool PPOCRSystemv2::Initialized() const { - if ( detector_!=nullptr && !detector_->Initialized()){ + if (detector_ != nullptr && !detector_->Initialized()){ return false; } - if ( classifier_!=nullptr && !classifier_->Initialized()){ + if (classifier_ != nullptr && !classifier_->Initialized()){ return false; } - if ( recognizer_!=nullptr && !recognizer_->Initialized()){ + if (recognizer_ != nullptr && !recognizer_->Initialized()){ return false; } return true; From 16e7c93b74cd33c43a171a413e991f5d1d90f54c Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Wed, 12 Oct 2022 08:57:04 +0000 Subject: [PATCH 09/52] Make all the model links come from PaddleOCR --- examples/vision/ocr/README.md | 49 ++++------------------------------- 1 file changed, 5 insertions(+), 44 deletions(-) diff --git a/examples/vision/ocr/README.md b/examples/vision/ocr/README.md index 18d4458d68..b908f1a7fd 100644 --- 
a/examples/vision/ocr/README.md +++ b/examples/vision/ocr/README.md @@ -11,47 +11,8 @@ | OCR版本 | 文本框检测 | 方向分类模型 | 文字识别 |字典文件| 说明 | |:----|:----|:----|:----|:----|:--------| -| ch_PP-OCRv3[推荐] |[ch_PP-OCRv3_det](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [ch_PP-OCRv3_rec](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv3系列原始超轻量模型,支持中英文、多语种文本检测 | -| en_PP-OCRv3[推荐] |[en_PP-OCRv3_det](https://bj.bcebos.com/paddlehub/fastdeploy/en_PP-OCRv3_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [en_PP-OCRv3_rec](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) | [en_dict.txt](https://bj.bcebos.com/paddlehub/fastdeploy/en_dict.txt) | OCRv3系列原始超轻量模型,支持英文与数字识别,除检测模型和识别模型的训练数据与中文模型不同以外,无其他区别 | -| ch_PP-OCRv2 |[ch_PP-OCRv2_det](https://bj.bcebos.com/paddlehub/fastdeploy/ch_PP-OCRv2_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [ch_PP-OCRv2_rec](https://bj.bcebos.com/paddlehub/fastdeploy/ch_PP-OCRv2_rec_infer.tar.gz) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测 | -| ch_PP-OCRv2_mobile |[ch_ppocr_mobile_v2.0_det](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [ch_ppocr_mobile_v2.0_rec](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_rec_infer.tar.gz) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测,比PPOCRv2更加轻量 | -| ch_PP-OCRv2_server 
|[ch_ppocr_server_v2.0_det](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_server_v2.0_det_infer.tar.gz) | [ch_ppocr_mobile_v2.0_cls](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz) | [ch_ppocr_server_v2.0_rec](https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_server_v2.0_rec_infer.tar.gz) |[ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2服务器系列模型, 支持中英文、多语种文本检测,比超轻量模型更大,但效果更好| - -### OCR 模型的处理说明 - -为了让OCR系列模型在FastDeploy多个推理后端上正确推理,以上表格中的部分模型的输入shape,和PaddleOCR套件提供的模型有差异. -例如,由PaddleOCR套件库提供的英文版PP-OCRv3_det模型,输入的shape是`[-1,3,960,960]`, 而FastDeploy提供的此模型输入shape为`[-1,3,-1,-1]`. - -**差异存在的原因**: 当我们在ORT和OpenVINO上部署输入shape固定的模型时(指定了高和宽),由于OCR的输入图片尺寸是变化的,会报例如下面所示的错误,导致无法推理: -``` -Failed to Infer: Got invalid dimensions for input: x for the following indices -index: 3 Got: 608 Expected: 960 -``` -**解决办法**:除了直接下载FastDeploy提供的模型外,用户还可以使用如下工具仓库, 修改模型的输入shape. - -**仓库链接**: https://github.com/jiangjiajun/PaddleUtils - -使用示例如下: -``` -#该用例将en_PP-OCRv3_det_infer模型的输入shape, 改为[-1,3,-1,-1], 并将新模型存放至output文件夹下 -git clone git@github.com:jiangjiajun/PaddleUtils.git -cd paddle -python paddle_infer_shape.py --model_dir en_PP-OCRv3_det_infer/ \ - --model_filename inference.pdmodel \ - --params_filename inference.pdiparams \ - --save_dir output \ - --input_shape_dict="{'x':[-1,3,-1,-1]}" -``` - -#### OCR模型输入shape更改记录 -以下表格记录了FastDeploy修改过的OCR模型的输入`('输入名':[shape])`, 供用户参考. 
- -| OCR版本 | 模型 | 修改前 | 修改后 | -|:----|:----|:----|:----| -|PPOCRv3 |en_PP-OCRv3_det|'x':[-1,3,960,960]|'x':[-1,3,-1,-1]| -|PPOCRv2 |ch_PP-OCRv2_det|'x':[-1,3,960,960]|'x':[-1,3,-1,-1]| -|PPOCRv2 |ch_PP-OCRv2_rec|'x':[-1,3,32,100]|'x':[-1,3,-1,-1]| -|PPOCRv2_mobile |ch_ppocr_mobile_v2.0_det|'x':[-1,3,640,640]|'x':[-1,3,-1,-1]| -|PPOCRv2_mobile|ch_ppocr_mobile_v2.0_rec|'x':[-1,3,32,100]|'x':[-1,3,-1,-1]| -|PPOCRv2_server|ch_ppocr_server_v2.0_det|'x':[-1,3,640,640]|'x':[-1,3,-1,-1]| -|PPOCRv2_server |ch_ppocr_server_v2.0_rec|'x':[-1,3,32,100]|'x':[-1,3,-1,-1]| +| ch_PP-OCRv3[推荐] |[ch_PP-OCRv3_det](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_PP-OCRv3_rec](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv3系列原始超轻量模型,支持中英文、多语种文本检测 | +| en_PP-OCRv3[推荐] |[en_PP-OCRv3_det](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [en_PP-OCRv3_rec](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) | [en_dict.txt](https://bj.bcebos.com/paddlehub/fastdeploy/en_dict.txt) | OCRv3系列原始超轻量模型,支持英文与数字识别,除检测模型和识别模型的训练数据与中文模型不同以外,无其他区别 | +| ch_PP-OCRv2 |[ch_PP-OCRv2_det](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls]https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_PP-OCRv2_rec](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测 | +| ch_PP-OCRv2_mobile 
|[ch_ppocr_mobile_v2.0_det](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_ppocr_mobile_v2.0_rec](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测,比PPOCRv2更加轻量 | +| ch_PP-OCRv2_server |[ch_ppocr_server_v2.0_det](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_ppocr_server_v2.0_rec](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) |[ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2服务器系列模型, 支持中英文、多语种文本检测,比超轻量模型更大,但效果更好| From d72d36f0270b1eba55f5cb891eb1a1aca4a44f2d Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Thu, 13 Oct 2022 02:31:35 +0000 Subject: [PATCH 10/52] Improve OCR readme --- examples/vision/ocr/README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/examples/vision/ocr/README.md b/examples/vision/ocr/README.md index b908f1a7fd..cddc02636c 100644 --- a/examples/vision/ocr/README.md +++ b/examples/vision/ocr/README.md @@ -16,3 +16,5 @@ | ch_PP-OCRv2 |[ch_PP-OCRv2_det](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls]https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_PP-OCRv2_rec](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测 | | ch_PP-OCRv2_mobile |[ch_ppocr_mobile_v2.0_det](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) | 
[ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_ppocr_mobile_v2.0_rec](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测,比PPOCRv2更加轻量 | | ch_PP-OCRv2_server |[ch_ppocr_server_v2.0_det](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_ppocr_server_v2.0_rec](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) |[ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2服务器系列模型, 支持中英文、多语种文本检测,比超轻量模型更大,但效果更好| + +以上模型下载链接由PaddleOCR模型库提供, 详见[PP-OCR系列模型列表](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/models_list.md) From c47c31b067417791317de2bd6f46e76849f09ca2 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Thu, 13 Oct 2022 02:32:52 +0000 Subject: [PATCH 11/52] Improve OCR readme --- examples/vision/ocr/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/vision/ocr/README.md b/examples/vision/ocr/README.md index cddc02636c..a4f485e301 100644 --- a/examples/vision/ocr/README.md +++ b/examples/vision/ocr/README.md @@ -13,7 +13,7 @@ |:----|:----|:----|:----|:----|:--------| | ch_PP-OCRv3[推荐] |[ch_PP-OCRv3_det](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_PP-OCRv3_rec](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv3系列原始超轻量模型,支持中英文、多语种文本检测 | | en_PP-OCRv3[推荐] 
|[en_PP-OCRv3_det](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [en_PP-OCRv3_rec](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) | [en_dict.txt](https://bj.bcebos.com/paddlehub/fastdeploy/en_dict.txt) | OCRv3系列原始超轻量模型,支持英文与数字识别,除检测模型和识别模型的训练数据与中文模型不同以外,无其他区别 | -| ch_PP-OCRv2 |[ch_PP-OCRv2_det](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls]https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_PP-OCRv2_rec](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测 | +| ch_PP-OCRv2 |[ch_PP-OCRv2_det](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_PP-OCRv2_rec](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测 | | ch_PP-OCRv2_mobile |[ch_ppocr_mobile_v2.0_det](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_ppocr_mobile_v2.0_rec](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测,比PPOCRv2更加轻量 | | ch_PP-OCRv2_server |[ch_ppocr_server_v2.0_det](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) | 
[ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_ppocr_server_v2.0_rec](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) |[ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2服务器系列模型, 支持中英文、多语种文本检测,比超轻量模型更大,但效果更好| From 7c99c99ad323ddd033742bb652fca58e7759a2c0 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Thu, 13 Oct 2022 02:33:51 +0000 Subject: [PATCH 12/52] Improve OCR readme --- examples/vision/ocr/README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/examples/vision/ocr/README.md b/examples/vision/ocr/README.md index a4f485e301..1815491d3a 100644 --- a/examples/vision/ocr/README.md +++ b/examples/vision/ocr/README.md @@ -8,6 +8,7 @@ 根据不同场景, FastDeploy汇总提供如下OCR任务部署, 用户需同时下载3个模型与字典文件(或2个,分类器可选), 完成OCR整个预测流程 ### OCR 中英文系列模型 +下表中的模型下载链接由PaddleOCR模型库提供, 详见[PP-OCR系列模型列表](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/models_list.md) | OCR版本 | 文本框检测 | 方向分类模型 | 文字识别 |字典文件| 说明 | |:----|:----|:----|:----|:----|:--------| @@ -16,5 +17,3 @@ | ch_PP-OCRv2 |[ch_PP-OCRv2_det](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_PP-OCRv2_rec](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) | [ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测 | | ch_PP-OCRv2_mobile |[ch_ppocr_mobile_v2.0_det](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_ppocr_mobile_v2.0_rec](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) | 
[ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2系列原始超轻量模型,支持中英文、多语种文本检测,比PPOCRv2更加轻量 | | ch_PP-OCRv2_server |[ch_ppocr_server_v2.0_det](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) | [ch_ppocr_mobile_v2.0_cls](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) | [ch_ppocr_server_v2.0_rec](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) |[ppocr_keys_v1.txt](https://bj.bcebos.com/paddlehub/fastdeploy/ppocr_keys_v1.txt) | OCRv2服务器系列模型, 支持中英文、多语种文本检测,比超轻量模型更大,但效果更好| - -以上模型下载链接由PaddleOCR模型库提供, 详见[PP-OCR系列模型列表](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/models_list.md) From b5f808edfad5852ca7632cd94cba896c5b65debd Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Thu, 13 Oct 2022 02:35:58 +0000 Subject: [PATCH 13/52] Improve OCR readme --- examples/vision/ocr/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/vision/ocr/README.md b/examples/vision/ocr/README.md index 1815491d3a..22a72a0bbe 100644 --- a/examples/vision/ocr/README.md +++ b/examples/vision/ocr/README.md @@ -7,7 +7,7 @@ 根据不同场景, FastDeploy汇总提供如下OCR任务部署, 用户需同时下载3个模型与字典文件(或2个,分类器可选), 完成OCR整个预测流程 -### OCR 中英文系列模型 +### PP-OCR 中英文系列模型 下表中的模型下载链接由PaddleOCR模型库提供, 详见[PP-OCR系列模型列表](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/models_list.md) | OCR版本 | 文本框检测 | 方向分类模型 | 文字识别 |字典文件| 说明 | From f037275c2edcf46b8c2e1b24ff290e372766ea68 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Mon, 17 Oct 2022 12:16:44 +0000 Subject: [PATCH 14/52] Add Readme for vision results --- docs/api_docs/python/vision_results.md | 68 ++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) create mode 100644 docs/api_docs/python/vision_results.md diff --git a/docs/api_docs/python/vision_results.md b/docs/api_docs/python/vision_results.md new file mode 100644 index 0000000000..96c7778d3f --- /dev/null +++ 
b/docs/api_docs/python/vision_results.md @@ -0,0 +1,68 @@ +# Description of vision model prediction results + +## ClassifyResult +The code of ClassifyResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the classification label result and confidence the image. + +API: `fastdeploy.vision.ClassifyResult` +The ClassifyResult will return: +- **label_ids**(list of int):Member variables that represent the classification label results of a single image, the number of which is determined by the topk passed in when using the classification model. For example, you can return the label results of the top 5 categories. + +- **scores**(list of float):Member variables that indicate the confidence level of a single image on the corresponding classification result, the number of which is determined by the topk passed in when using the classification model, e.g. the confidence level of a top 5 classification can be returned. + +## SegmentationResult +The code of SegmentationResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the segmentation category predicted for each pixel in the image and the probability of the segmentation category. + +API: `fastdeploy.vision.SegmentationResult` +The SegmentationResult will return: +- **label_ids**(list of int):Member variable indicating the segmentation category for each pixel of a single image +- **score_map**(list of float):Member variable, the predicted probability value of the segmentation category corresponding to label_map (specified when exporting the model `--output_op argmax`) or the probability value normalized by softmax (specified when exporting the model `--output_op softmax` or when exporting the model `--output_op none` and set the model class member attribute `apply_softmax=true` when initializing the model) +- **shape**(list of int):Member variable indicating the shape of the output image, as H*W. 
+ + +## DetectionResult +The code of DetectionResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the target location (detection box), target class and target confidence level detected by the image. + +API: `fastdeploy.vision.DetectionResult` +- **boxes**(list of list(float)):Member variable, represents the coordinates of all target boxes detected by a single image. boxes is a list, each element of which is a list of length 4, representing a box with 4 float values in order of xmin, ymin, xmax, ymax, i.e. the coordinates of the top left and bottom right corners. +- **scores**(list of float):Member variable indicating the confidence of all targets detected by a single image. +- **label_ids**(list of int):Member variable indicating all target categories detected for a single image. +- **masks**:Member variable that represents all instances of mask detected from a single image, with the same number of elements and shape size as boxes. +- **contain_masks**:Member variable indicating whether the detection result contains the instance mask, the result of the instance segmentation model is generally set to True. + +API: `fastdeploy.vision.Mask ` +- **data**:Member variable indicating a detected mask. +- **shape**:Member variable representing the shape of the mask, e.g. (h,w). + +## FaceDetectionResult +The FaceDetectionResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the target frames detected by face detection, face landmarks, target confidence and the number of landmarks per face. + +API: `fastdeploy.vision.FaceDetectionResult` +- **boxes**(list of list(float)):Member variables that represent the coordinates of all target boxes detected by a single image. boxes is a list, each element of which is a list of length 4, representing a box with 4 float values in order of xmin, ymin, xmax, ymax, i.e.
the coordinates of the top left and bottom right corners +- **scores**(list of float):Member variable indicating the confidence of all targets detected by a single image +- **landmarks**(list of list(float)): Member variables that represent the key points of all faces detected by a single image +- **landmarks_per_face**(int):Member variable indicating the number of key points in each face frame + +## FaceRecognitionResult +The FaceRecognitionResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the embedding of the image features by the face recognition model. + +API: `fastdeploy.vision.FaceRecognitionResult` +- **embedding**(list of float):Member variables, which indicate the final extracted features embedding of the face recognition model, can be used to calculate the feature similarity between faces. + +## MattingResult +The MattingResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the value of alpha transparency predicted by the model, the predicted foreground, etc. + +API:`fastdeploy.vision.MattingResult` +- **alpha**(list of float):This is a one-dimensional vector of predicted alpha transparency values in the range `[0.,1.]`, with length `h*w`, h,w being the height and width of the input image.
+- **foreground**(list of float):This is a one-dimensional vector for the predicted foreground, the value domain is `[0.,255.]`, the length is `h*w*c`, h,w is the height and width of the input image, c is generally 3, foreground is not necessarily there, only if the model itself predicts the foreground, this property will be valid +- **contain_foreground**(bool):Indicates whether the predicted outcome includes the foreground +- **shape**(list of int): When `contain_foreground is false, the shape only contains (h,w), when contain_foreground is true, the shape contains (h,w,c), c is generally 3 + +## OCRResult +The OCRResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the text box detected in the image, the text box orientation classification, and the text content recognized inside the text box. + +API:`fastdeploy.vision.OCRResult ` +- **boxes**: Member variable, indicates the coordinates of all target boxes detected in a single image, `boxes.size()` indicates the number of boxes detected in a single image, each box is represented by 8 int values in order of the 4 coordinate points of the box, the order is lower left, lower right, upper right, upper left +- **text**:Member variable indicating the content of the recognized text in multiple text boxes, with the same number of elements as `boxes.size()` +- **rec_scores**:Member variable indicating the confidence level of the text identified in the box, the number of elements is the same as `boxes.size()` +- **cls_scores**:Member variable indicating the confidence level of the classification result of the text box, with the same number of elements as `boxes.size()` +- **cls_scores**:Member variable indicating the orientation category of the text box, the number of elements is the same as `boxes.size(`) From 9997fed53a17097f40aefe4332ec5e92836e714a Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Mon, 17 Oct 2022 12:20:30 +0000 Subject: [PATCH 15/52] Add Readme for vision results --- 
docs/api_docs/python/vision_results.md | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/docs/api_docs/python/vision_results.md b/docs/api_docs/python/vision_results.md index 96c7778d3f..999188ca9e 100644 --- a/docs/api_docs/python/vision_results.md +++ b/docs/api_docs/python/vision_results.md @@ -3,8 +3,7 @@ ## ClassifyResult The code of ClassifyResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the classification label result and confidence the image. -API: `fastdeploy.vision.ClassifyResult` -The ClassifyResult will return: +API: `fastdeploy.vision.ClassifyResult`, The ClassifyResult will return: - **label_ids**(list of int):Member variables that represent the classification label results of a single image, the number of which is determined by the topk passed in when using the classification model. For example, you can return the label results of the top 5 categories. - **scores**(list of float):Member variables that indicate the confidence level of a single image on the corresponding classification result, the number of which is determined by the topk passed in when using the classification model, e.g. the confidence level of a top 5 classification can be returned. @@ -12,8 +11,7 @@ The ClassifyResult will return: ## SegmentationResult The code of SegmentationResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the segmentation category predicted for each pixel in the image and the probability of the segmentation category. 
-API: `fastdeploy.vision.SegmentationResult` -The SegmentationResult will return: +API: `fastdeploy.vision.SegmentationResult`, The SegmentationResult will return: - **label_ids**(list of int):Member variable indicating the segmentation category for each pixel of a single image - **score_map**(list of float):Member variable, the predicted probability value of the segmentation category corresponding to label_map (specified when exporting the model `--output_op argmax`) or the probability value normalized by softmax (specified when exporting the model `--output_op softmax` or when exporting the model `--output_op none` and set the model class member attribute `apply_softmax=true` when initializing the model) - **shape**(list of int):Member variable indicating the shape of the output image, as H*W. @@ -22,21 +20,21 @@ The SegmentationResult will return: ## DetectionResult The code of DetectionResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the target location (detection box), target class and target confidence level detected by the image. -API: `fastdeploy.vision.DetectionResult` +API: `fastdeploy.vision.DetectionResult`, The DetectionResult will return: - **boxes**(list of list(float)):Member variable, represents the coordinates of all target boxes detected by a single image. boxes is a list, each element of which is a list of length 4, representing a box with 4 float values in order of xmin, ymin, xmax, ymax, i.e. the coordinates of the top left and bottom right corners. - **scores**(list of float):Member variable indicating the confidence of all targets detected by a single image. - **label_ids**(list of int):Member variable indicating all target categories detected for a single image. - **masks**:Member variable that represents all instances of mask detected from a single image, with the same number of elements and shape size as boxes.
- **contain_masks**:Member variable indicating whether the detection result contains the instance mask, the result of the instance segmentation model is generally set to True. -API: `fastdeploy.vision.Mask ` +API: `fastdeploy.vision.Mask `, The Mask will return: - **data**:Member variable indicating a detected mask. - **shape**:Member variable representing the shape of the mask, e.g. (h,w). ## FaceDetectionResult The FaceDetectionResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the target frames detected by face detection, face landmarks, target confidence and the number of landmarks per face. -API: `fastdeploy.vision.FaceDetectionResult` +API: `fastdeploy.vision.FaceDetectionResult`, The FaceDetectionResult will return: - **boxes**(list of list(float)):Member variables that represent the coordinates of all target boxes detected by a single image. boxes is a list, each element of which is a list of length 4, representing a box with 4 float values in order of xmin, ymin, xmax, ymax, i.e. the coordinates of the top left and bottom right corners - **scores**(list of float):Member variable indicating the confidence of all targets detected by a single image - **landmarks**(list of list(float)): Member variables that represent the key points of all faces detected by a single image - **landmarks_per_face**(int):Member variable indicating the number of key points in each face frame @@ -45,13 +43,13 @@ ## FaceRecognitionResult The FaceRecognitionResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the embedding of the image features by the face recognition model. -API: `fastdeploy.vision.FaceRecognitionResult` +API: `fastdeploy.vision.FaceRecognitionResult`, The FaceRecognitionResult will return: - **embedding**(list of float):Member variables, which indicate the final extracted features embedding of the face recognition model, can be used to calculate the feature similarity between faces.
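To show how such an embedding is typically used, here is a small self-contained sketch of cosine similarity between two feature vectors. The embedding values are invented for illustration and stand in for the output of a face recognition model:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings; real ones are much longer (e.g. 128 or 512 floats).
face_a = [0.12, -0.35, 0.91]
face_b = [0.12, -0.35, 0.91]
print(cosine_similarity(face_a, face_b))  # close to 1.0 for identical embeddings
```

Higher values mean the two faces are more likely to be the same person; the threshold used in practice depends on the model.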
## MattingResult The MattingResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the value of alpha transparency predicted by the model, the predicted outlook, etc. -API:`fastdeploy.vision.MattingResult` +API:`fastdeploy.vision.MattingResult`, The MattingResult will return: - **alpha**(list of float):This is a one-dimensional vector of predicted alpha transparency values in the range `[0.,1.]`, with length `h*w`, h,w being the height and width of the input image. - **foreground**(list of float):This is a one-dimensional vector for the predicted foreground, the value domain is `[0.,255.]`, the length is `h*w*c`, h,w is the height and width of the input image, c is generally 3, foreground is not necessarily there, only if the model itself predicts the foreground, this property will be valid - **contain_foreground**(bool):Indicates whether the predicted outcome includes the foreground @@ -60,7 +58,7 @@ API:`fastdeploy.vision.MattingResult` ## OCRResult The OCRResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the text box detected in the image, the text box orientation classification, and the text content recognized inside the text box. 
-API:`fastdeploy.vision.OCRResult ` +API:`fastdeploy.vision.OCRResult`, The OCRResult will return: - **boxes**: Member variable, indicates the coordinates of all target boxes detected in a single image, `boxes.size()` indicates the number of boxes detected in a single image, each box is represented by 8 int values in order of the 4 coordinate points of the box, the order is lower left, lower right, upper right, upper left - **text**:Member variable indicating the content of the recognized text in multiple text boxes, with the same number of elements as `boxes.size()` - **rec_scores**:Member variable indicating the confidence level of the text identified in the box, the number of elements is the same as `boxes.size()` From 6fd8784b1970b5ab777a33304804e65907d276af Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Mon, 17 Oct 2022 12:26:37 +0000 Subject: [PATCH 16/52] Add Readme for vision results --- docs/api_docs/python/vision_results.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/api_docs/python/vision_results.md b/docs/api_docs/python/vision_results.md index 999188ca9e..80ca8788d1 100644 --- a/docs/api_docs/python/vision_results.md +++ b/docs/api_docs/python/vision_results.md @@ -1,4 +1,4 @@ -# Description of vision model prediction results +# Description of Vision Results ## ClassifyResult The code of ClassifyResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the classification label result and confidence the image. @@ -59,8 +59,8 @@ API:`fastdeploy.vision.MattingResult`, The MattingResult will return: The OCRResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the text box detected in the image, the text box orientation classification, and the text content recognized inside the text box. 
API:`fastdeploy.vision.OCRResult`, The OCRResult will return: -- **boxes**: Member variable, indicates the coordinates of all target boxes detected in a single image, `boxes.size()` indicates the number of boxes detected in a single image, each box is represented by 8 int values in order of the 4 coordinate points of the box, the order is lower left, lower right, upper right, upper left -- **text**:Member variable indicating the content of the recognized text in multiple text boxes, with the same number of elements as `boxes.size()` -- **rec_scores**:Member variable indicating the confidence level of the text identified in the box, the number of elements is the same as `boxes.size()` -- **cls_scores**:Member variable indicating the confidence level of the classification result of the text box, with the same number of elements as `boxes.size()` -- **cls_scores**:Member variable indicating the orientation category of the text box, the number of elements is the same as `boxes.size(`) +- **boxes**(list of list(int)): Member variable, indicates the coordinates of all target boxes detected in a single image, `boxes.size()` indicates the number of boxes detected in a single image, each box is represented by 8 int values in order of the 4 coordinate points of the box, the order is lower left, lower right, upper right, upper left. 
+- **text**(list of string):Member variable indicating the content of the recognized text in multiple text boxes, with the same number of elements as `boxes.size()` +- **rec_scores**(list of float):Member variable indicating the confidence level of the text identified in the box, the number of elements is the same as `boxes.size()` +- **cls_scores**(list of float):Member variable indicating the confidence level of the classification result of the text box, with the same number of elements as `boxes.size()` +- **cls_labels**(list if int):Member variable indicating the orientation category of the text box, the number of elements is the same as `boxes.size(`) From c34823fdf6e859e186a510e944e148bf8abb1530 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Mon, 17 Oct 2022 12:40:16 +0000 Subject: [PATCH 17/52] Add Readme for vision results --- docs/api_docs/python/vision_results_cn.md | 64 +++++++++++++++++++ ...vision_results.md => vision_results_en.md} | 0 2 files changed, 64 insertions(+) create mode 100644 docs/api_docs/python/vision_results_cn.md rename docs/api_docs/python/{vision_results.md => vision_results_en.md} (100%) diff --git a/docs/api_docs/python/vision_results_cn.md b/docs/api_docs/python/vision_results_cn.md new file mode 100644 index 0000000000..a0d2f0edf3 --- /dev/null +++ b/docs/api_docs/python/vision_results_cn.md @@ -0,0 +1,64 @@ +# 视觉模型预测结果说明 + +## ClassifyResult +ClassifyResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像的分类结果和置信度 + +API:`fastdeploy.vision.ClassifyResult`, 该结果返回: +**label_ids**(list of int): 成员变量,表示单张图片的分类结果,其个数根据在使用分类模型时传入的topk决定,例如可以返回top 5的分类结果 +**scores**(list of float): 成员变量,表示单张图片在相应分类结果上的置信度,其个数根据在使用分类模型时传入的topk决定,例如可以返回top 5的分类置信度 + + +## SegmentationResult +SegmentationResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像中每个像素预测出来的分割类别和分割类别的概率值 + +API:`fastdeploy.vision.SegmentationResul`, 该结果返回: +**label_map**(list of int): 成员变量,表示单张图片每个像素点的分割类别 +**score_map**(list of float): 
成员变量,与label_map一一对应的所预测的分割类别概率值(当导出模型时指定`--output_op argmax`)或者经过softmax归一化化后的概率值(当导出模型时指定`--output_op softmax`或者导出模型时指定`--output_op none`同时模型初始化的时候设置模型类成员属性`apply_softmax=true`) +**shape**(list of int): 成员变量,表示输出图片的shape,为H*W + +## DetectionResult +DetectionResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像检测出来的目标框、目标类别和目标置信度。 + +API:`fastdeploy.vision.DetectionResult` , 该结果返回: +**boxes**(list of list(float)): 成员变量,表示单张图片检测出来的所有目标框坐标。boxes是一个list,其每个元素为一个长度为4的list, 表示为一个框,每个框以4个float数值依次表示xmin, ymin, xmax, ymax, 即左上角和右下角坐标 +**scores**(list of float): 成员变量,表示单张图片检测出来的所有目标置信度 +**label_ids**(list of int): 成员变量,表示单张图片检测出来的所有目标类别 +**masks**: 成员变量,表示单张图片检测出来的所有实例mask,其元素个数及shape大小与boxes一致 +**contain_masks**: 成员变量,表示检测结果中是否包含实例mask,实例分割模型的结果此项一般为True. +fastdeploy.vision.Mask +**data**: 成员变量,表示检测到的一个mask +**shape**: 成员变量,表示mask的shape,如 `(h,w)` + + +## FaceDetectionResult +FaceDetectionResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明人脸检测出来的目标框、人脸landmarks,目标置信度和每张人脸的landmark数量。 +API:`fastdeploy.vision.FaceDetectionResult` , 该结果返回: +**boxes**(list of list(float)): 成员变量,表示单张图片检测出来的所有目标框坐标。boxes是一个list,其每个元素为一个长度为4的list, 表示为一个框,每个框以4个float数值依次表示xmin, ymin, xmax, ymax, 即左上角和右下角坐标 +**scores**(list of float): 成员变量,表示单张图片检测出来的所有目标置信度 +**landmarks**(list of list(float)): 成员变量,表示单张图片检测出来的所有人脸的关键点 +**landmarks_per_face**(int): 成员变量,表示每个人脸框中的关键点的数量 + + +## FaceRecognitionResult +FaceRecognitionResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明人脸识别模型对图像特征的embedding。 + +API:`fastdeploy.vision.FaceRecognitionResult`, 该结果返回: +**embedding**(list of float): 成员变量,表示人脸识别模型最终提取的特征embedding,可以用来计算人脸之间的特征相似度。 + + +## MattingResult +MattingResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明模型预测的alpha透明度的值,预测的前景等。 +API:`fastdeploy.vision.MattingResult`, 该结果返回: +**alpha**(list of float): 是一维向量,为预测的alpha透明度的值,值域为`[0.,1.]`,长度为`h*w`,h,w为输入图像的高和宽 +**foreground(list of float): 是一维向量,为预测的前景,值域为`[0.,255.]`,长度为`h*w*c`,h,w为输入图像的高和宽,c一般为3,`foreground`不是一定有的,只有模型本身预测了前景,这个属性才会有效 
+**contain_foreground**(bool): 表示预测的结果是否包含前景 +**shape**(list of int): 表示输出结果的shape,当`contain_foreground`为false,shape只包含`(h,w)`,当`contain_foreground`为true,shape包含`(h,w,c)`, c一般为3 + +## OCRResult +OCRResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像检测和识别出来的文本框,文本框方向分类,以及文本框内的文本内容 +API:`fastdeploy.vision.OCRResult`, 该结果返回: +**boxes**: 成员变量,表示单张图片检测出来的所有目标框坐标,boxes.size()表示单张图内检测出的框的个数,每个框以8个int数值依次表示框的4个坐标点,顺序为左下,右下,右上,左上 +**text**: 成员变量,表示多个文本框内被识别出来的文本内容,其元素个数与`boxes.size()`一致 +**rec_scores**: 成员变量,表示文本框内识别出来的文本的置信度,其元素个数与`boxes.size()`一致 +**cls_scores**: 成员变量,表示文本框的分类结果的置信度,其元素个数与`boxes.size()`一致 +**cls_labels**: 成员变量,表示文本框的方向分类类别,其元素个数与`boxes.size()`一致 diff --git a/docs/api_docs/python/vision_results.md b/docs/api_docs/python/vision_results_en.md similarity index 100% rename from docs/api_docs/python/vision_results.md rename to docs/api_docs/python/vision_results_en.md From ae11b80dc0f262dbfb0c98120899445f1de4080c Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Mon, 17 Oct 2022 12:42:58 +0000 Subject: [PATCH 18/52] Add Readme for vision results --- docs/api_docs/python/vision_results_cn.md | 56 ++++++++++++----------- 1 file changed, 30 insertions(+), 26 deletions(-) diff --git a/docs/api_docs/python/vision_results_cn.md b/docs/api_docs/python/vision_results_cn.md index a0d2f0edf3..c0d48fb149 100644 --- a/docs/api_docs/python/vision_results_cn.md +++ b/docs/api_docs/python/vision_results_cn.md @@ -4,61 +4,65 @@ ClassifyResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像的分类结果和置信度 API:`fastdeploy.vision.ClassifyResult`, 该结果返回: -**label_ids**(list of int): 成员变量,表示单张图片的分类结果,其个数根据在使用分类模型时传入的topk决定,例如可以返回top 5的分类结果 -**scores**(list of float): 成员变量,表示单张图片在相应分类结果上的置信度,其个数根据在使用分类模型时传入的topk决定,例如可以返回top 5的分类置信度 +- **label_ids**(list of int): 成员变量,表示单张图片的分类结果,其个数根据在使用分类模型时传入的topk决定,例如可以返回top 5的分类结果 +- **scores**(list of float): 成员变量,表示单张图片在相应分类结果上的置信度,其个数根据在使用分类模型时传入的topk决定,例如可以返回top 5的分类置信度 ## SegmentationResult 
SegmentationResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像中每个像素预测出来的分割类别和分割类别的概率值 API:`fastdeploy.vision.SegmentationResul`, 该结果返回: -**label_map**(list of int): 成员变量,表示单张图片每个像素点的分割类别 -**score_map**(list of float): 成员变量,与label_map一一对应的所预测的分割类别概率值(当导出模型时指定`--output_op argmax`)或者经过softmax归一化化后的概率值(当导出模型时指定`--output_op softmax`或者导出模型时指定`--output_op none`同时模型初始化的时候设置模型类成员属性`apply_softmax=true`) -**shape**(list of int): 成员变量,表示输出图片的shape,为H*W +- **label_map**(list of int): 成员变量,表示单张图片每个像素点的分割类别 +- **score_map**(list of float): 成员变量,与label_map一一对应的所预测的分割类别概率值(当导出模型时指定`--output_op argmax`)或者经过softmax归一化化后的概率值(当导出模型时指定`--output_op softmax`或者导出模型时指定`--output_op none`同时模型初始化的时候设置模型类成员属性`apply_softmax=true`) +- **shape**(list of int): 成员变量,表示输出图片的shape,为H*W ## DetectionResult DetectionResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像检测出来的目标框、目标类别和目标置信度。 API:`fastdeploy.vision.DetectionResult` , 该结果返回: -**boxes**(list of list(float)): 成员变量,表示单张图片检测出来的所有目标框坐标。boxes是一个list,其每个元素为一个长度为4的list, 表示为一个框,每个框以4个float数值依次表示xmin, ymin, xmax, ymax, 即左上角和右下角坐标 -**scores**(list of float): 成员变量,表示单张图片检测出来的所有目标置信度 -**label_ids**(list of int): 成员变量,表示单张图片检测出来的所有目标类别 -**masks**: 成员变量,表示单张图片检测出来的所有实例mask,其元素个数及shape大小与boxes一致 -**contain_masks**: 成员变量,表示检测结果中是否包含实例mask,实例分割模型的结果此项一般为True. +- **boxes**(list of list(float)): 成员变量,表示单张图片检测出来的所有目标框坐标。boxes是一个list,其每个元素为一个长度为4的list, 表示为一个框,每个框以4个float数值依次表示xmin, ymin, xmax, ymax, 即左上角和右下角坐标 +- **scores**(list of float): 成员变量,表示单张图片检测出来的所有目标置信度 +- **label_ids**(list of int): 成员变量,表示单张图片检测出来的所有目标类别 +- **masks**: 成员变量,表示单张图片检测出来的所有实例mask,其元素个数及shape大小与boxes一致 +- **contain_masks**: 成员变量,表示检测结果中是否包含实例mask,实例分割模型的结果此项一般为True. 
+ fastdeploy.vision.Mask -**data**: 成员变量,表示检测到的一个mask -**shape**: 成员变量,表示mask的shape,如 `(h,w)` +- **data**: 成员变量,表示检测到的一个mask +- **shape**: 成员变量,表示mask的shape,如 `(h,w)` ## FaceDetectionResult FaceDetectionResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明人脸检测出来的目标框、人脸landmarks,目标置信度和每张人脸的landmark数量。 + API:`fastdeploy.vision.FaceDetectionResult` , 该结果返回: -**boxes**(list of list(float)): 成员变量,表示单张图片检测出来的所有目标框坐标。boxes是一个list,其每个元素为一个长度为4的list, 表示为一个框,每个框以4个float数值依次表示xmin, ymin, xmax, ymax, 即左上角和右下角坐标 -**scores**(list of float): 成员变量,表示单张图片检测出来的所有目标置信度 -**landmarks**(list of list(float)): 成员变量,表示单张图片检测出来的所有人脸的关键点 -**landmarks_per_face**(int): 成员变量,表示每个人脸框中的关键点的数量 +- **boxes**(list of list(float)): 成员变量,表示单张图片检测出来的所有目标框坐标。boxes是一个list,其每个元素为一个长度为4的list, 表示为一个框,每个框以4个float数值依次表示xmin, ymin, xmax, ymax, 即左上角和右下角坐标 +- **scores**(list of float): 成员变量,表示单张图片检测出来的所有目标置信度 +- **landmarks**(list of list(float)): 成员变量,表示单张图片检测出来的所有人脸的关键点 +- **landmarks_per_face**(int): 成员变量,表示每个人脸框中的关键点的数量 ## FaceRecognitionResult FaceRecognitionResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明人脸识别模型对图像特征的embedding。 API:`fastdeploy.vision.FaceRecognitionResult`, 该结果返回: -**embedding**(list of float): 成员变量,表示人脸识别模型最终提取的特征embedding,可以用来计算人脸之间的特征相似度。 +- **embedding**(list of float): 成员变量,表示人脸识别模型最终提取的特征embedding,可以用来计算人脸之间的特征相似度。 ## MattingResult MattingResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明模型预测的alpha透明度的值,预测的前景等。 + API:`fastdeploy.vision.MattingResult`, 该结果返回: -**alpha**(list of float): 是一维向量,为预测的alpha透明度的值,值域为`[0.,1.]`,长度为`h*w`,h,w为输入图像的高和宽 -**foreground(list of float): 是一维向量,为预测的前景,值域为`[0.,255.]`,长度为`h*w*c`,h,w为输入图像的高和宽,c一般为3,`foreground`不是一定有的,只有模型本身预测了前景,这个属性才会有效 -**contain_foreground**(bool): 表示预测的结果是否包含前景 -**shape**(list of int): 表示输出结果的shape,当`contain_foreground`为false,shape只包含`(h,w)`,当`contain_foreground`为true,shape包含`(h,w,c)`, c一般为3 +- **alpha**(list of float): 是一维向量,为预测的alpha透明度的值,值域为`[0.,1.]`,长度为`h*w`,h,w为输入图像的高和宽 +- **foreground**(list of float): 
是一维向量,为预测的前景,值域为`[0.,255.]`,长度为`h*w*c`,h,w为输入图像的高和宽,c一般为3,`foreground`不是一定有的,只有模型本身预测了前景,这个属性才会有效 +- **contain_foreground**(bool): 表示预测的结果是否包含前景 +- **shape**(list of int): 表示输出结果的shape,当`contain_foreground`为false,shape只包含`(h,w)`,当`contain_foreground`为true,shape包含`(h,w,c)`, c一般为3 ## OCRResult OCRResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像检测和识别出来的文本框,文本框方向分类,以及文本框内的文本内容 + API:`fastdeploy.vision.OCRResult`, 该结果返回: -**boxes**: 成员变量,表示单张图片检测出来的所有目标框坐标,boxes.size()表示单张图内检测出的框的个数,每个框以8个int数值依次表示框的4个坐标点,顺序为左下,右下,右上,左上 -**text**: 成员变量,表示多个文本框内被识别出来的文本内容,其元素个数与`boxes.size()`一致 -**rec_scores**: 成员变量,表示文本框内识别出来的文本的置信度,其元素个数与`boxes.size()`一致 -**cls_scores**: 成员变量,表示文本框的分类结果的置信度,其元素个数与`boxes.size()`一致 -**cls_labels**: 成员变量,表示文本框的方向分类类别,其元素个数与`boxes.size()`一致 +- **boxes**(list of list(int)): 成员变量,表示单张图片检测出来的所有目标框坐标,boxes.size()表示单张图内检测出的框的个数,每个框以8个int数值依次表示框的4个坐标点,顺序为左下,右下,右上,左上 +- **text**(list of string): 成员变量,表示多个文本框内被识别出来的文本内容,其元素个数与`boxes.size()`一致 +- **rec_scores**(list of float): 成员变量,表示文本框内识别出来的文本的置信度,其元素个数与`boxes.size()`一致 +- **cls_scores**(list of float): 成员变量,表示文本框的分类结果的置信度,其元素个数与`boxes.size()`一致 +- **cls_labels**(list of int): 成员变量,表示文本框的方向分类类别,其元素个数与`boxes.size()`一致 From cef34150bce9965cba614252dc00c3e612bd4ae1 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Mon, 17 Oct 2022 13:01:37 +0000 Subject: [PATCH 19/52] Add Readme for vision results --- docs/api_docs/python/vision_results_cn.md | 68 +++++++++++------------ docs/api_docs/python/vision_results_en.md | 36 ++++++------ 2 files changed, 52 insertions(+), 52 deletions(-) diff --git a/docs/api_docs/python/vision_results_cn.md b/docs/api_docs/python/vision_results_cn.md index c0d48fb149..cbae4cd99e 100644 --- a/docs/api_docs/python/vision_results_cn.md +++ b/docs/api_docs/python/vision_results_cn.md @@ -1,68 +1,68 @@ # 视觉模型预测结果说明 ## ClassifyResult -ClassifyResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像的分类结果和置信度 +ClassifyResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像的分类结果和置信度. 
API:`fastdeploy.vision.ClassifyResult`, 该结果返回: -- **label_ids**(list of int): 成员变量,表示单张图片的分类结果,其个数根据在使用分类模型时传入的topk决定,例如可以返回top 5的分类结果 -- **scores**(list of float): 成员变量,表示单张图片在相应分类结果上的置信度,其个数根据在使用分类模型时传入的topk决定,例如可以返回top 5的分类置信度 +- **label_ids**(list of int): 成员变量,表示单张图片的分类结果,其个数根据在使用分类模型时传入的`topk`决定,例如可以返回`top5`的分类结果. +- **scores**(list of float): 成员变量,表示单张图片在相应分类结果上的置信度,其个数根据在使用分类模型时传入的`topk`决定,例如可以返回`top5`的分类置信度. ## SegmentationResult -SegmentationResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像中每个像素预测出来的分割类别和分割类别的概率值 +SegmentationResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像中每个像素预测出来的分割类别和分割类别的概率值. -API:`fastdeploy.vision.SegmentationResul`, 该结果返回: -- **label_map**(list of int): 成员变量,表示单张图片每个像素点的分割类别 -- **score_map**(list of float): 成员变量,与label_map一一对应的所预测的分割类别概率值(当导出模型时指定`--output_op argmax`)或者经过softmax归一化化后的概率值(当导出模型时指定`--output_op softmax`或者导出模型时指定`--output_op none`同时模型初始化的时候设置模型类成员属性`apply_softmax=true`) -- **shape**(list of int): 成员变量,表示输出图片的shape,为H*W +API:`fastdeploy.vision.SegmentationResult`, 该结果返回: +- **label_map**(list of int): 成员变量,表示单张图片每个像素点的分割类别. +- **score_map**(list of float): 成员变量,与label_map一一对应的所预测的分割类别概率值(当导出模型时指定`--output_op argmax`)或者经过softmax归一化后的概率值(当导出模型时指定`--output_op softmax`或者导出模型时指定`--output_op none`同时模型初始化的时候设置模型类成员属性`apply_softmax=true`). +- **shape**(list of int): 成员变量,表示输出图片的尺寸,为`H*W`. ## DetectionResult -DetectionResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像检测出来的目标框、目标类别和目标置信度。 +DetectionResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像检测出来的目标框、目标类别和目标置信度. API:`fastdeploy.vision.DetectionResult` , 该结果返回: -- **boxes**(list of list(float)): 成员变量,表示单张图片检测出来的所有目标框坐标。boxes是一个list,其每个元素为一个长度为4的list, 表示为一个框,每个框以4个float数值依次表示xmin, ymin, xmax, ymax, 即左上角和右下角坐标 -- **scores**(list of float): 成员变量,表示单张图片检测出来的所有目标置信度 -- **label_ids**(list of int): 成员变量,表示单张图片检测出来的所有目标类别 -- **masks**: 成员变量,表示单张图片检测出来的所有实例mask,其元素个数及shape大小与boxes一致 -- **contain_masks**: 成员变量,表示检测结果中是否包含实例mask,实例分割模型的结果此项一般为True.
+- **boxes**(list of list(float)): 成员变量,表示单张图片检测出来的所有目标框坐标. boxes是一个list,其每个元素为一个长度为4的list, 表示为一个框,每个框以4个float数值依次表示xmin, ymin, xmax, ymax, 即左上角和右下角坐标. +- **scores**(list of float): 成员变量,表示单张图片检测出来的所有目标置信度. +- **label_ids**(list of int): 成员变量,表示单张图片检测出来的所有目标类别. +- **masks**: 成员变量,表示单张图片检测出来的所有实例mask,其元素个数及shape大小与boxes一致. +- **contain_masks**: 成员变量,表示检测结果中是否包含实例mask,实例分割模型的结果此项一般为`True`. fastdeploy.vision.Mask -- **data**: 成员变量,表示检测到的一个mask -- **shape**: 成员变量,表示mask的shape,如 `(h,w)` +- **data**: 成员变量,表示检测到的一个mask. +- **shape**: 成员变量,表示mask的尺寸,如 `H*W`. ## FaceDetectionResult -FaceDetectionResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明人脸检测出来的目标框、人脸landmarks,目标置信度和每张人脸的landmark数量。 +FaceDetectionResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明人脸检测出来的目标框、人脸landmarks,目标置信度和每张人脸的landmark数量. API:`fastdeploy.vision.FaceDetectionResult` , 该结果返回: -- **boxes**(list of list(float)): 成员变量,表示单张图片检测出来的所有目标框坐标。boxes是一个list,其每个元素为一个长度为4的list, 表示为一个框,每个框以4个float数值依次表示xmin, ymin, xmax, ymax, 即左上角和右下角坐标 -- **scores**(list of float): 成员变量,表示单张图片检测出来的所有目标置信度 -- **landmarks**(list of list(float)): 成员变量,表示单张图片检测出来的所有人脸的关键点 -- **landmarks_per_face**(int): 成员变量,表示每个人脸框中的关键点的数量 +- **boxes**(list of list(float)): 成员变量,表示单张图片检测出来的所有目标框坐标。boxes是一个list,其每个元素为一个长度为4的list, 表示为一个框,每个框以4个float数值依次表示xmin, ymin, xmax, ymax, 即左上角和右下角坐标. +- **scores**(list of float): 成员变量,表示单张图片检测出来的所有目标置信度. +- **landmarks**(list of list(float)): 成员变量,表示单张图片检测出来的所有人脸的关键点. +- **landmarks_per_face**(int): 成员变量,表示每个人脸框中的关键点的数量. ## FaceRecognitionResult -FaceRecognitionResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明人脸识别模型对图像特征的embedding。 +FaceRecognitionResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明人脸识别模型对图像特征的embedding. API:`fastdeploy.vision.FaceRecognitionResult`, 该结果返回: -- **embedding**(list of float): 成员变量,表示人脸识别模型最终提取的特征embedding,可以用来计算人脸之间的特征相似度。 +- **embedding**(list of float): 成员变量,表示人脸识别模型最终提取的特征embedding,可以用来计算人脸之间的特征相似度. 
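下面用一个纯Python的简单示意, 演示上述DetectionResult中`boxes`、`scores`、`label_ids`三个列表按下标一一对应, 并按置信度阈值过滤的常见用法(示例中的数值均为虚构, 并非真实模型输出):

```python
# 虚构的DetectionResult风格字段, 仅用于演示三个列表按下标一一对应
boxes = [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 50.0]]
scores = [0.92, 0.30]
label_ids = [0, 3]

threshold = 0.5
# 过滤低置信度目标: 同一下标的框、置信度、类别属于同一个检测结果
kept = [(box, score, label)
        for box, score, label in zip(boxes, scores, label_ids)
        if score >= threshold]
print(kept)  # [([10.0, 20.0, 110.0, 220.0], 0.92, 0)]
```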
## MattingResult -MattingResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明模型预测的alpha透明度的值,预测的前景等。 +MattingResult 代码定义在`fastdeploy/vision/common/result.h`中,用于表明模型预测的alpha透明度的值,预测的前景等. API:`fastdeploy.vision.MattingResult`, 该结果返回: -- **alpha**(list of float): 是一维向量,为预测的alpha透明度的值,值域为`[0.,1.]`,长度为`h*w`,h,w为输入图像的高和宽 -- **foreground**(list of float): 是一维向量,为预测的前景,值域为`[0.,255.]`,长度为`h*w*c`,h,w为输入图像的高和宽,c一般为3,`foreground`不是一定有的,只有模型本身预测了前景,这个属性才会有效 -- **contain_foreground**(bool): 表示预测的结果是否包含前景 -- **shape**(list of int): 表示输出结果的shape,当`contain_foreground`为false,shape只包含`(h,w)`,当`contain_foreground`为true,shape包含`(h,w,c)`, c一般为3 +- **alpha**(list of float): 是一维向量,为预测的alpha透明度的值,值域为`[0.,1.]`,长度为`H*W`,H,W为输入图像的高和宽. +- **foreground**(list of float): 是一维向量,为预测的前景,值域为`[0.,255.]`,长度为`H*W*C`,H,W为输入图像的高和宽,C一般为3,`foreground`不是一定有的,只有模型本身预测了前景,这个属性才会有效. +- **contain_foreground**(bool): 表示预测的结果是否包含前景. +- **shape**(list of int): 表示输出结果的shape,当`contain_foreground`为`false`,shape只包含`(H,W)`,当`contain_foreground`为true,shape包含`(H,W,C)`, C一般为3. ## OCRResult -OCRResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像检测和识别出来的文本框,文本框方向分类,以及文本框内的文本内容 +OCRResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像检测和识别出来的文本框,文本框方向分类,以及文本框内的文本内容. API:`fastdeploy.vision.OCRResult`, 该结果返回: -- **boxes**(list of list(int)): 成员变量,表示单张图片检测出来的所有目标框坐标,boxes.size()表示单张图内检测出的框的个数,每个框以8个int数值依次表示框的4个坐标点,顺序为左下,右下,右上,左上 -- **text**(list of string): 成员变量,表示多个文本框内被识别出来的文本内容,其元素个数与`boxes.size()`一致 -- **rec_scores**(list of float): 成员变量,表示文本框内识别出来的文本的置信度,其元素个数与`boxes.size()`一致 -- **cls_scores**(list of float): 成员变量,表示文本框的分类结果的置信度,其元素个数与`boxes.size()`一致 -- **cls_labels**(list of int): 成员变量,表示文本框的方向分类类别,其元素个数与`boxes.size()`一致 +- **boxes**(list of list(int)): 成员变量,表示单张图片检测出来的所有目标框坐标,boxes.size()表示单张图内检测出的框的个数,每个框以8个int数值依次表示框的4个坐标点,顺序为左下,右下,右上,左上. +- **text**(list of string): 成员变量,表示多个文本框内被识别出来的文本内容,其元素个数与`boxes.size()`一致. +- **rec_scores**(list of float): 成员变量,表示文本框内识别出来的文本的置信度,其元素个数与`boxes.size()`一致. 
+- **cls_scores**(list of float): 成员变量,表示文本框的分类结果的置信度,其元素个数与`boxes.size()`一致.
+- **cls_labels**(list of int): 成员变量,表示文本框的方向分类类别,其元素个数与`boxes.size()`一致.
diff --git a/docs/api_docs/python/vision_results_en.md b/docs/api_docs/python/vision_results_en.md
index 80ca8788d1..a1561497a6 100644
--- a/docs/api_docs/python/vision_results_en.md
+++ b/docs/api_docs/python/vision_results_en.md
@@ -4,17 +4,17 @@
 The code of ClassifyResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the classification label result and confidence of the image.
 
 API: `fastdeploy.vision.ClassifyResult`, The ClassifyResult will return:
-- **label_ids**(list of int):Member variables that represent the classification label results of a single image, the number of which is determined by the topk passed in when using the classification model. For example, you can return the label results of the top 5 categories.
+- **label_ids**(list of int):Member variables that represent the classification label results of a single image, the number of which is determined by the `topk ` passed in when using the classification model. For example, you can return the label results of the Top 5 categories.
 
-- **scores**(list of float):Member variables that indicate the confidence level of a single image on the corresponding classification result, the number of which is determined by the topk passed in when using the classification model, e.g. the confidence level of a top 5 classification can be returned.
+- **scores**(list of float):Member variables that indicate the confidence level of a single image on the corresponding classification result, the number of which is determined by the `topk ` passed in when using the classification model, e.g. the confidence level of a Top 5 classification can be returned.
## SegmentationResult The code of SegmentationResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the segmentation category predicted for each pixel in the image and the probability of the segmentation category. API: `fastdeploy.vision.SegmentationResult`, The SegmentationResult will return: -- **label_ids**(list of int):Member variable indicating the segmentation category for each pixel of a single image -- **score_map**(list of float):Member variable, the predicted probability value of the segmentation category corresponding to label_map (specified when exporting the model `--output_op argmax`) or the probability value normalized by softmax (specified when exporting the model `--output_op softmax` or when exporting the model `--output_op none` and set the model class member attribute `apply_softmax=true` when initializing the model) -- **shape**(list of int):Member variable indicating the shape of the output image, as H*W. +- **label_ids**(list of int):Member variable indicating the segmentation category for each pixel of a single image. +- **score_map**(list of float):Member variable, the predicted probability value of the segmentation category corresponding to `label_map ` (specified when exporting the model `--output_op argmax`) or the probability value normalized by softmax (specified when exporting the model `--output_op softmax` or when exporting the model `--output_op none` and set the model class member attribute `apply_softmax=true` when initializing the model). +- **shape**(list of int):Member variable indicating the shape of the output image, as `H*W `. ## DetectionResult @@ -29,16 +29,16 @@ API: `fastdeploy.vision.DetectionResult`, The DetectionResult will return: API: `fastdeploy.vision.Mask `, The Mask will return: - **data**:Member variable indicating a detected mask. -- **shape**:Member variable representing the shape of the mask, e.g. (h,w). +- **shape**:Member variable representing the shape of the mask, e.g. 
`(H,W) `. ## FaceDetectionResult The FaceDetectionResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the target frames detected by face detection, face landmarks, target confidence and the number of landmarks per face. API: `fastdeploy.vision.FaceDetectionResult`, The FaceDetectionResult will return: -- **data**(list of list(float)):Member variables that represent the coordinates of all target boxes detected by a single image. boxes is a list, each element of which is a list of length 4, representing a box with 4 float values in order of xmin, ymin, xmax, ymax, i.e. the coordinates of the top left and bottom right corners -- **scores**(list of float):Member variable indicating the confidence of all targets detected by a single image -- **landmarks**(list of list(float)): Member variables that represent the key points of all faces detected by a single image -- **landmarks_per_face**(int):Member variable indicating the number of key points in each face frame +- **data**(list of list(float)):Member variables that represent the coordinates of all target boxes detected by a single image. boxes is a list, each element of which is a list of length 4, representing a box with 4 float values in order of xmin, ymin, xmax, ymax, i.e. the coordinates of the top left and bottom right corners. +- **scores**(list of float):Member variable indicating the confidence of all targets detected by a single image. +- **landmarks**(list of list(float)): Member variables that represent the key points of all faces detected by a single image. +- **landmarks_per_face**(int):Member variable indicating the number of key points in each face frame. ## FaceRecognitionResult The FaceRecognitionResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the embedding of the image features by the face recognition model. 
@@ -50,17 +50,17 @@ API: `fastdeploy.vision.FaceRecognitionResult`, The FaceRecognitionResult will r
 The MattingResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the value of alpha transparency predicted by the model, the predicted outlook, etc.
 
 API:`fastdeploy.vision.MattingResult`, The MattingResult will return:
-- **alpha**(list of float):This is a one-dimensional vector of predicted alpha transparency values in the range `[0.,1.]`, with length `h*w`, h,w being the height and width of the input image.
-- **foreground**(list of float):This is a one-dimensional vector for the predicted foreground, the value domain is `[0.,255.]`, the length is `h*w*c`, h,w is the height and width of the input image, c is generally 3, foreground is not necessarily there, only if the model itself predicts the foreground, this property will be valid
-- **contain_foreground**(bool):Indicates whether the predicted outcome includes the foreground
-- **shape**(list of int): When `contain_foreground is false, the shape only contains (h,w), when contain_foreground is true, the shape contains (h,w,c), c is generally 3
+- **alpha**(list of float):This is a one-dimensional vector of predicted alpha transparency values in the range `[0.,1.]`, with length `H*W`, H,W being the height and width of the input image.
+- **foreground**(list of float):This is a one-dimensional vector for the predicted foreground, the value domain is `[0.,255.]`, the length is `H*W*C`, H,W is the height and width of the input image, C is generally 3, foreground is not necessarily there, only if the model itself predicts the foreground, this property will be valid.
+- **contain_foreground**(bool):Indicates whether the predicted outcome includes the foreground.
+- **shape**(list of int): When `contain_foreground` is false, the shape only contains `(H,W)`, when `contain_foreground` is `true`, the shape contains `(H,W,C)`, C is generally 3.
## OCRResult The OCRResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the text box detected in the image, the text box orientation classification, and the text content recognized inside the text box. API:`fastdeploy.vision.OCRResult`, The OCRResult will return: - **boxes**(list of list(int)): Member variable, indicates the coordinates of all target boxes detected in a single image, `boxes.size()` indicates the number of boxes detected in a single image, each box is represented by 8 int values in order of the 4 coordinate points of the box, the order is lower left, lower right, upper right, upper left. -- **text**(list of string):Member variable indicating the content of the recognized text in multiple text boxes, with the same number of elements as `boxes.size()` -- **rec_scores**(list of float):Member variable indicating the confidence level of the text identified in the box, the number of elements is the same as `boxes.size()` -- **cls_scores**(list of float):Member variable indicating the confidence level of the classification result of the text box, with the same number of elements as `boxes.size()` -- **cls_labels**(list if int):Member variable indicating the orientation category of the text box, the number of elements is the same as `boxes.size(`) +- **text**(list of string):Member variable indicating the content of the recognized text in multiple text boxes, with the same number of elements as `boxes.size()`. +- **rec_scores**(list of float):Member variable indicating the confidence level of the text identified in the box, the number of elements is the same as `boxes.size()`. +- **cls_scores**(list of float):Member variable indicating the confidence level of the classification result of the text box, with the same number of elements as `boxes.size()`. +- **cls_labels**(list of int):Member variable indicating the orientation category of the text box, the number of elements is the same as `boxes.size()`. 
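Since every per-box field in `OCRResult` is aligned with `boxes.size()`, downstream code can simply zip them together. Below is a minimal sketch in plain Python with made-up values; the field names follow the doc above, but note that a real `fastdeploy.vision.OCRResult` exposes them as attributes of the result object rather than dict keys:

```python
# Hypothetical OCRResult-style payload: every per-box list is aligned with boxes.
# Each box is 8 ints: 4 corner points in the order lower-left, lower-right,
# upper-right, upper-left, as documented above.
result = {
    "boxes": [[24, 92, 180, 92, 180, 60, 24, 60],
              [30, 200, 220, 200, 220, 170, 30, 170]],
    "text": ["Hello", "W0r1d"],
    "rec_scores": [0.98, 0.42],
    "cls_labels": [0, 0],
}

def keep_confident(res, thresh=0.9):
    """Pair each recognized string with its box, dropping low-confidence hits."""
    return [(txt, box)
            for box, txt, score in zip(res["boxes"], res["text"], res["rec_scores"])
            if score >= thresh]

print(keep_confident(result))
```

Filtering on `rec_scores` this way is a common post-step when noisy recognitions (like the low-confidence second box here) should not reach downstream consumers.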
From 532f18fda136d4dc8c62920f2b9ea18ed06043a7 Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Mon, 17 Oct 2022 13:04:27 +0000
Subject: [PATCH 20/52] Add Readme for vision results

---
 docs/api_docs/python/vision_results_cn.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/api_docs/python/vision_results_cn.md b/docs/api_docs/python/vision_results_cn.md
index cbae4cd99e..53d4e255d8 100644
--- a/docs/api_docs/python/vision_results_cn.md
+++ b/docs/api_docs/python/vision_results_cn.md
@@ -55,7 +55,7 @@ API:`fastdeploy.vision.MattingResult`, 该结果返回:
 - **alpha**(list of float): 是一维向量,为预测的alpha透明度的值,值域为`[0.,1.]`,长度为`H*W`,H,W为输入图像的高和宽.
 - **foreground**(list of float): 是一维向量,为预测的前景,值域为`[0.,255.]`,长度为`H*W*C`,H,W为输入图像的高和宽,C一般为3,`foreground`不是一定有的,只有模型本身预测了前景,这个属性才会有效.
 - **contain_foreground**(bool): 表示预测的结果是否包含前景.
-- **shape**(list of int): 表示输出结果的shape,当`contain_foreground`为`false`,shape只包含`(H,W)`,当`contain_foreground`为true,shape包含`(H,W,C)`, C一般为3.
+- **shape**(list of int): 表示输出结果的shape,当`contain_foreground`为`false`,shape只包含`(H,W)`,当`contain_foreground`为`true`,shape包含`(H,W,C)`, C一般为3.
 
 ## OCRResult
 OCRResult代码定义在`fastdeploy/vision/common/result.h`中,用于表明图像检测和识别出来的文本框,文本框方向分类,以及文本框内的文本内容.
From 302743c92c63b27dbaf393ffd9d4a0314516f27c Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Mon, 17 Oct 2022 13:06:15 +0000
Subject: [PATCH 21/52] Add Readme for vision results

---
 docs/api_docs/python/vision_results_en.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/api_docs/python/vision_results_en.md b/docs/api_docs/python/vision_results_en.md
index a1561497a6..3fa559c38b 100644
--- a/docs/api_docs/python/vision_results_en.md
+++ b/docs/api_docs/python/vision_results_en.md
@@ -4,17 +4,17 @@
 The code of ClassifyResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the classification label result and confidence of the image.
API: `fastdeploy.vision.ClassifyResult`, The ClassifyResult will return: -- **label_ids**(list of int):Member variables that represent the classification label results of a single image, the number of which is determined by the `topk ` passed in when using the classification model. For example, you can return the label results of the Top 5 categories. +- **label_ids**(list of int):Member variables that represent the classification label results of a single image, the number of which is determined by the `topk` passed in when using the classification model. For example, you can return the label results of the Top 5 categories. -- **scores**(list of float):Member variables that indicate the confidence level of a single image on the corresponding classification result, the number of which is determined by the `topk ` passed in when using the classification model, e.g. the confidence level of a Top 5 classification can be returned. +- **scores**(list of float):Member variables that indicate the confidence level of a single image on the corresponding classification result, the number of which is determined by the `topk` passed in when using the classification model, e.g. the confidence level of a Top 5 classification can be returned. ## SegmentationResult The code of SegmentationResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the segmentation category predicted for each pixel in the image and the probability of the segmentation category. API: `fastdeploy.vision.SegmentationResult`, The SegmentationResult will return: - **label_ids**(list of int):Member variable indicating the segmentation category for each pixel of a single image. 
-- **score_map**(list of float):Member variable, the predicted probability value of the segmentation category corresponding to `label_map ` (specified when exporting the model `--output_op argmax`) or the probability value normalized by softmax (specified when exporting the model `--output_op softmax` or when exporting the model `--output_op none` and set the model class member attribute `apply_softmax=true` when initializing the model). -- **shape**(list of int):Member variable indicating the shape of the output image, as `H*W `. +- **score_map**(list of float):Member variable, the predicted probability value of the segmentation category corresponding to `label_map` (specified when exporting the model `--output_op argmax`) or the probability value normalized by softmax (specified when exporting the model `--output_op softmax` or when exporting the model `--output_op none` and set the model class member attribute `apply_softmax=true` when initializing the model). +- **shape**(list of int):Member variable indicating the shape of the output image, as `H*W`. ## DetectionResult @@ -27,9 +27,9 @@ API: `fastdeploy.vision.DetectionResult`, The DetectionResult will return: - **masks**:Member variable that represents all instances of mask detected from a single image, with the same number of elements and shape size as boxes. - **contain_masks**:Member variable indicating whether the detection result contains the instance mask, the result of the instance segmentation model is generally set to True. -API: `fastdeploy.vision.Mask `, The Mask will return: +API: `fastdeploy.vision.Mask`, The Mask will return: - **data**:Member variable indicating a detected mask. -- **shape**:Member variable representing the shape of the mask, e.g. `(H,W) `. +- **shape**:Member variable representing the shape of the mask, e.g. `(H,W)`. 
 ## FaceDetectionResult
 The FaceDetectionResult code is defined in `fastdeploy/vision/common/result.h` and is used to indicate the target frames detected by face detection, face landmarks, target confidence and the number of landmarks per face.
From b9968f62adaed191ea41376086d78628f4ac1ef7 Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Mon, 17 Oct 2022 13:14:25 +0000
Subject: [PATCH 22/52] Add Readme for vision results

---
 docs/api_docs/python/vision_results_cn.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/api_docs/python/vision_results_cn.md b/docs/api_docs/python/vision_results_cn.md
index 53d4e255d8..586464a067 100644
--- a/docs/api_docs/python/vision_results_cn.md
+++ b/docs/api_docs/python/vision_results_cn.md
@@ -26,7 +26,7 @@ API:`fastdeploy.vision.DetectionResult` , 该结果返回:
 - **masks**: 成员变量,表示单张图片检测出来的所有实例mask,其元素个数及shape大小与boxes一致.
 - **contain_masks**: 成员变量,表示检测结果中是否包含实例mask,实例分割模型的结果此项一般为`True`.
 
-fastdeploy.vision.Mask
+`fastdeploy.vision.Mask` , 该结果返回:
 - **data**: 成员变量,表示检测到的一个mask.
 - **shape**: 成员变量,表示mask的尺寸,如 `H*W`.
From 5be415e6ee1d9a8c6a06c227915ef24fffb9e937 Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Mon, 17 Oct 2022 13:16:57 +0000
Subject: [PATCH 23/52] Add Readme for vision results

---
 docs/api_docs/python/vision_results_en.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/api_docs/python/vision_results_en.md b/docs/api_docs/python/vision_results_en.md
index 3fa559c38b..1e97b2e9dc 100644
--- a/docs/api_docs/python/vision_results_en.md
+++ b/docs/api_docs/python/vision_results_en.md
@@ -25,7 +25,7 @@ API: `fastdeploy.vision.DetectionResult`, The DetectionResult will return:
 - **scores**(list of float):Member variable indicating the confidence of all targets detected by a single image.
 - **label_ids**(list of int):Member variable indicating all target categories detected for a single image.
- **masks**:Member variable that represents all instances of mask detected from a single image, with the same number of elements and shape size as boxes. -- **contain_masks**:Member variable indicating whether the detection result contains the instance mask, the result of the instance segmentation model is generally set to True. +- **contain_masks**:Member variable indicating whether the detection result contains the instance mask, the result of the instance segmentation model is generally set to `True`. API: `fastdeploy.vision.Mask`, The Mask will return: - **data**:Member variable indicating a detected mask. From c718cf27a3cf22d09af4d22bf1dba4c2e792384e Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Tue, 18 Oct 2022 03:16:51 +0000 Subject: [PATCH 24/52] Add check for label file in postprocess of Rec model --- fastdeploy/vision/ocr/ppocr/recognizer.cc | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fastdeploy/vision/ocr/ppocr/recognizer.cc b/fastdeploy/vision/ocr/ppocr/recognizer.cc index f6ba5294a3..51aa26c097 100644 --- a/fastdeploy/vision/ocr/ppocr/recognizer.cc +++ b/fastdeploy/vision/ocr/ppocr/recognizer.cc @@ -158,11 +158,14 @@ bool Recognizer::Postprocess(FDTensor& infer_result, if (argmax_idx > 0 && (!(n > 0 && argmax_idx == last_index))) { score += max_value; count += 1; + if(argmax_idx > label_list.size()){ + FDERROR << "The output index is larger than the size of label_list. Please check the label file!" 
<< std::endl; + return false; + } str_res += label_list[argmax_idx]; } last_index = argmax_idx; } - score /= count; std::get<0>(*rec_result) = str_res; From efb445fd5ae37630b9b1a3627260388d79979a07 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Tue, 18 Oct 2022 06:24:43 +0000 Subject: [PATCH 25/52] Add check for label file in postprocess of Rec model --- fastdeploy/vision/ocr/ppocr/recognizer.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fastdeploy/vision/ocr/ppocr/recognizer.cc b/fastdeploy/vision/ocr/ppocr/recognizer.cc index 51aa26c097..deb188967c 100644 --- a/fastdeploy/vision/ocr/ppocr/recognizer.cc +++ b/fastdeploy/vision/ocr/ppocr/recognizer.cc @@ -159,7 +159,7 @@ bool Recognizer::Postprocess(FDTensor& infer_result, score += max_value; count += 1; if(argmax_idx > label_list.size()){ - FDERROR << "The output index is larger than the size of label_list. Please check the label file!" << std::endl; + FDERROR << "The output index:" << argmax_idx << " is larger than the size of label_list: "<< label_list.size() << ". Please check the label file!" << std::endl; return false; } str_res += label_list[argmax_idx]; From e93472f48fdd40285bfc06d15907f8470a2f5abd Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Tue, 18 Oct 2022 06:25:22 +0000 Subject: [PATCH 26/52] Add check for label file in postprocess of Rec model --- fastdeploy/vision/ocr/ppocr/recognizer.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fastdeploy/vision/ocr/ppocr/recognizer.cc b/fastdeploy/vision/ocr/ppocr/recognizer.cc index deb188967c..b7b9a0a6d2 100644 --- a/fastdeploy/vision/ocr/ppocr/recognizer.cc +++ b/fastdeploy/vision/ocr/ppocr/recognizer.cc @@ -159,7 +159,7 @@ bool Recognizer::Postprocess(FDTensor& infer_result, score += max_value; count += 1; if(argmax_idx > label_list.size()){ - FDERROR << "The output index:" << argmax_idx << " is larger than the size of label_list: "<< label_list.size() << ". Please check the label file!" 
<< std::endl; + FDERROR << "The output index: " << argmax_idx << " is larger than the size of label_list: "<< label_list.size() << ". Please check the label file!" << std::endl; return false; } str_res += label_list[argmax_idx]; From 2bebd5bdb18063cbd902e311072797a17757e722 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Tue, 18 Oct 2022 06:26:15 +0000 Subject: [PATCH 27/52] Add check for label file in postprocess of Rec model --- fastdeploy/vision/ocr/ppocr/recognizer.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fastdeploy/vision/ocr/ppocr/recognizer.cc b/fastdeploy/vision/ocr/ppocr/recognizer.cc index b7b9a0a6d2..9404e83630 100644 --- a/fastdeploy/vision/ocr/ppocr/recognizer.cc +++ b/fastdeploy/vision/ocr/ppocr/recognizer.cc @@ -159,7 +159,8 @@ bool Recognizer::Postprocess(FDTensor& infer_result, score += max_value; count += 1; if(argmax_idx > label_list.size()){ - FDERROR << "The output index: " << argmax_idx << " is larger than the size of label_list: "<< label_list.size() << ". Please check the label file!" << std::endl; + FDERROR << "The output index: " << argmax_idx << " is larger than the size of label_list: " + << label_list.size() << ". Please check the label file!" 
<< std::endl; return false; } str_res += label_list[argmax_idx]; From 75f34a529566fe06b6e86de1ee8d33e3c678cf09 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Tue, 18 Oct 2022 06:26:27 +0000 Subject: [PATCH 28/52] Add check for label file in postprocess of Rec model --- fastdeploy/vision/ocr/ppocr/recognizer.cc | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fastdeploy/vision/ocr/ppocr/recognizer.cc b/fastdeploy/vision/ocr/ppocr/recognizer.cc index 9404e83630..38d6d21352 100644 --- a/fastdeploy/vision/ocr/ppocr/recognizer.cc +++ b/fastdeploy/vision/ocr/ppocr/recognizer.cc @@ -205,3 +205,7 @@ bool Recognizer::Predict(cv::Mat* img, } // namesapce ocr } // namespace vision } // namespace fastdeploy + + + + From 74ff599e181f2755ca467ec0b2ec1a57bc440611 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Tue, 18 Oct 2022 06:27:13 +0000 Subject: [PATCH 29/52] Add check for label file in postprocess of Rec model --- fastdeploy/vision/ocr/ppocr/recognizer.cc | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/fastdeploy/vision/ocr/ppocr/recognizer.cc b/fastdeploy/vision/ocr/ppocr/recognizer.cc index 38d6d21352..fdea6ed9cb 100644 --- a/fastdeploy/vision/ocr/ppocr/recognizer.cc +++ b/fastdeploy/vision/ocr/ppocr/recognizer.cc @@ -204,8 +204,4 @@ bool Recognizer::Predict(cv::Mat* img, } // namesapce ocr } // namespace vision -} // namespace fastdeploy - - - - +} // namespace fastdeploy \ No newline at end of file From d04eaf982b8b71ceb510c8eca5657467c5f6456d Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Tue, 18 Oct 2022 08:56:32 +0000 Subject: [PATCH 30/52] Add comments to create API docs --- docs/api_docs/python/ocr.md | 40 +++++++++++- docs/api_docs/python/requirements.txt | 1 + fastdeploy/vision/ocr/ppocr/classifier.h | 28 +++++++-- fastdeploy/vision/ocr/ppocr/dbdetector.h | 30 ++++++--- fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h | 27 +++++++- fastdeploy/vision/ocr/ppocr/ppocr_system_v3.h | 25 ++++++-- fastdeploy/vision/ocr/ppocr/recognizer.h | 30 
+++++++-- .../fastdeploy/vision/ocr/ppocr/__init__.py | 63 +++++++++++++++---- 8 files changed, 205 insertions(+), 39 deletions(-) diff --git a/docs/api_docs/python/ocr.md b/docs/api_docs/python/ocr.md index 4174694af2..552eabdc5f 100644 --- a/docs/api_docs/python/ocr.md +++ b/docs/api_docs/python/ocr.md @@ -1,3 +1,41 @@ # OCR API -comming soon... +## fastdeploy.vision.ocr.DBDetector + +```{eval-rst} +.. autoclass:: fastdeploy.vision.ocr.DBDetector + :members: + :inherited-members: +``` + +## fastdeploy.vision.ocr.Classifier + +```{eval-rst} +.. autoclass:: fastdeploy.vision.ocr.Classifier + :members: + :inherited-members: +``` + +## fastdeploy.vision.ocr.Recognizer + +```{eval-rst} +.. autoclass:: fastdeploy.vision.ocr.Recognizer + :members: + :inherited-members: +``` + +## fastdeploy.vision.ocr.PPOCRSystemv2 + +```{eval-rst} +.. autoclass:: fastdeploy.vision.ocr.PPOCRSystemv2 + :members: + :inherited-members: +``` + +## fastdeploy.vision.ocr.PPOCRSystemv3 + +```{eval-rst} +.. autoclass:: fastdeploy.vision.ocr.PPOCRSystemv3 + :members: + :inherited-members: +``` diff --git a/docs/api_docs/python/requirements.txt b/docs/api_docs/python/requirements.txt index 73b4a140f6..4f8fa23fed 100644 --- a/docs/api_docs/python/requirements.txt +++ b/docs/api_docs/python/requirements.txt @@ -3,3 +3,4 @@ recommonmark sphinx_markdown_tables sphinx_rtd_theme furo +myst_parser diff --git a/fastdeploy/vision/ocr/ppocr/classifier.h b/fastdeploy/vision/ocr/ppocr/classifier.h index 110ef7f370..f810f98a37 100644 --- a/fastdeploy/vision/ocr/ppocr/classifier.h +++ b/fastdeploy/vision/ocr/ppocr/classifier.h @@ -20,20 +20,36 @@ namespace fastdeploy { namespace vision { +/** \brief All OCR series model APIs are defined inside this namespace + * + */ namespace ocr { - +/*! @brief Classifier object is used to load the classification model provided by PaddleOCR. 
+ */
 class FASTDEPLOY_DECL Classifier : public FastDeployModel {
  public:
   Classifier();
+  /** \brief Set the path of the model file, and the configuration of the runtime
+   *
+   * \param[in] model_file Path of model file, e.g. ./ch_ppocr_mobile_v2.0_cls_infer/model.pdmodel.
+   * \param[in] params_file Path of parameter file, e.g. ./ch_ppocr_mobile_v2.0_cls_infer/model.pdiparams; if the model format is ONNX, this parameter will be ignored.
+   * \param[in] custom_option RuntimeOption for inference, the default will use cpu, and choose the backend defined in `valid_cpu_backends`.
+   * \param[in] model_format Model format of the loaded model, default is Paddle format.
+   */
   Classifier(const std::string& model_file, const std::string& params_file = "",
              const RuntimeOption& custom_option = RuntimeOption(),
             const ModelFormat& model_format = ModelFormat::PADDLE);
-
+  /// Get model's name
   std::string ModelName() const { return "ppocr/ocr_cls"; }
-
+  /** \brief Predict the input image and get the OCR classification model result.
+   *
+   * \param[in] img The input image data, which comes from cv::imread().
+   * \param[in] result The output of the OCR classification model will be written to this structure.
+   * \return true if the prediction succeeded, otherwise false.
+   */
   virtual bool Predict(cv::Mat* img, std::tuple<int, float>* result);
-  // pre & post parameters
+  // Pre & Post parameters
   float cls_thresh;
   std::vector<int> cls_image_shape;
   int cls_batch_num;
@@ -44,9 +60,9 @@ class FASTDEPLOY_DECL Classifier : public FastDeployModel {
  private:
   bool Initialize();
-
+  /// Preprocess the input data, and set the preprocessed results to `outputs`
   bool Preprocess(Mat* img, FDTensor* output);
-
+  /// Postprocess the inference results, and set the final result to `result`
   bool Postprocess(FDTensor& infer_result, std::tuple<int, float>* result);
 };
diff --git a/fastdeploy/vision/ocr/ppocr/dbdetector.h b/fastdeploy/vision/ocr/ppocr/dbdetector.h
index ad80c13296..53bf3aceec 100644
--- a/fastdeploy/vision/ocr/ppocr/dbdetector.h
+++ b/fastdeploy/vision/ocr/ppocr/dbdetector.h
@@ -20,22 +20,38 @@
 namespace fastdeploy {
 namespace vision {
+/** \brief All OCR series model APIs are defined inside this namespace
+ *
+ */
 namespace ocr {
+/*! @brief DBDetector object is used to load the detection model provided by PaddleOCR.
+ */
 class FASTDEPLOY_DECL DBDetector : public FastDeployModel {
  public:
   DBDetector();
-
+  /** \brief Set the path of the model file, and the configuration of the runtime
+   *
+   * \param[in] model_file Path of model file, e.g. ./ch_PP-OCRv3_det_infer/model.pdmodel.
+   * \param[in] params_file Path of parameter file, e.g. ./ch_PP-OCRv3_det_infer/model.pdiparams; if the model format is ONNX, this parameter will be ignored.
+   * \param[in] custom_option RuntimeOption for inference, the default will use cpu, and choose the backend defined in `valid_cpu_backends`.
+   * \param[in] model_format Model format of the loaded model, default is Paddle format.
+   */
   DBDetector(const std::string& model_file, const std::string& params_file = "",
             const RuntimeOption& custom_option = RuntimeOption(),
             const ModelFormat& model_format = ModelFormat::PADDLE);
-
+  /// Get model's name
   std::string ModelName() const { return "ppocr/ocr_det"; }
-
+  /** \brief Predict the input image and get the OCR detection model result.
+   *
+   * \param[in] im The input image data, which comes from cv::imread().
+   * \param[in] boxes_result The output of the OCR detection model will be written to this structure.
+   * \return true if the prediction succeeded, otherwise false.
+   */
   virtual bool Predict(cv::Mat* im,
                        std::vector<std::array<float, 8>>* boxes_result);
-  // pre&post process parameters
+  // Pre & Post process parameters
   int max_side_len;
   float ratio_h{};
@@ -53,14 +69,14 @@ class FASTDEPLOY_DECL DBDetector : public FastDeployModel {
  private:
   bool Initialize();
-
+  /// Preprocess the input data, and set the preprocessed results to `outputs`
   bool Preprocess(Mat* mat, FDTensor* outputs,
                   std::map<std::string, std::array<float, 2>>* im_info);
-
+  /*! @brief Postprocess the inference results, and set the final result to `boxes_result`
+   */
   bool Postprocess(FDTensor& infer_result,
                    std::vector<std::array<float, 8>>* boxes_result,
                    const std::map<std::string, std::array<float, 2>>& im_info);
-
   PostProcessor post_processor_;
 };
diff --git a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h b/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h
index f2a8ccbed8..1b70adb5fd 100644
--- a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h
+++ b/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h
@@ -27,17 +27,38 @@
 namespace fastdeploy {
 namespace application {
+/** \brief OCR system can launch detection model, classification model and recognition model sequentially. All OCR system APIs are defined inside this namespace.
+ *
+ */
 namespace ocrsystem {
-
+/*! @brief PPOCRSystemv2 is used to load PP-OCRv2 series models provided by PaddleOCR.
+ */
 class FASTDEPLOY_DECL PPOCRSystemv2 : public FastDeployModel {
  public:
+  /** \brief Set up the detection model path, classification model path and recognition model path respectively.
+   *
+   * \param[in] det_model Path of detection model, e.g. ./ch_PP-OCRv2_det_infer
+   * \param[in] cls_model Path of classification model, e.g. ./ch_ppocr_mobile_v2.0_cls_infer
+   * \param[in] rec_model Path of recognition model, e.g. ./ch_PP-OCRv2_rec_infer
+   */
   PPOCRSystemv2(fastdeploy::vision::ocr::DBDetector* det_model,
                 fastdeploy::vision::ocr::Classifier* cls_model,
                 fastdeploy::vision::ocr::Recognizer* rec_model);
+  /** \brief The classification model is optional, so this function only sets up the detection model path and recognition model path.
+   *
+   * \param[in] det_model Path of detection model, e.g. ./ch_PP-OCRv2_det_infer
+   * \param[in] rec_model Path of recognition model, e.g. ./ch_PP-OCRv2_rec_infer
+   */
   PPOCRSystemv2(fastdeploy::vision::ocr::DBDetector* det_model,
                 fastdeploy::vision::ocr::Recognizer* rec_model);
+  /** \brief Predict the input image and get the OCR result.
+   *
+   * \param[in] img The input image data, which comes from cv::imread().
+   * \param[in] result The output OCR result will be written to this structure.
+   * \return true if the prediction succeeded, otherwise false.
+   */
   virtual bool Predict(cv::Mat* img, fastdeploy::vision::OCRResult* result);
   bool Initialized() const override;
@@ -45,9 +66,11 @@ class FASTDEPLOY_DECL PPOCRSystemv2 : public FastDeployModel {
   fastdeploy::vision::ocr::DBDetector* detector_ = nullptr;
   fastdeploy::vision::ocr::Classifier* classifier_ = nullptr;
   fastdeploy::vision::ocr::Recognizer* recognizer_ = nullptr;
-
+  /// Launch the detection process in OCR.
   virtual bool Detect(cv::Mat* img, fastdeploy::vision::OCRResult* result);
+  /// Launch the recognition process in OCR.
   virtual bool Recognize(cv::Mat* img, fastdeploy::vision::OCRResult* result);
+  /// Launch the classification process in OCR.
virtual bool Classify(cv::Mat* img, fastdeploy::vision::OCRResult* result); }; diff --git a/fastdeploy/vision/ocr/ppocr/ppocr_system_v3.h b/fastdeploy/vision/ocr/ppocr/ppocr_system_v3.h index d9e2d4584a..c88a0aff20 100644 --- a/fastdeploy/vision/ocr/ppocr/ppocr_system_v3.h +++ b/fastdeploy/vision/ocr/ppocr/ppocr_system_v3.h @@ -18,19 +18,36 @@ namespace fastdeploy { namespace application { +/** \brief OCR system can launch detection model, classification model and recognition model sequentially. All OCR system APIs are defined inside this namespace. + * + */ namespace ocrsystem { - +/*! @brief PPOCRSystemv3 is used to load PP-OCRv3 series models provided by PaddleOCR. + */ class FASTDEPLOY_DECL PPOCRSystemv3 : public PPOCRSystemv2 { public: + /** \brief Set up the detection model path, classification model path and recognition model path respectively. + * + * \param[in] det_model Path of detection model, e.g ./ch_PP-OCRv3_det_infer + * \param[in] cls_model Path of classification model, e.g ./ch_ppocr_mobile_v2.0_cls_infer + * \param[in] rec_model Path of recognition model, e.g ./ch_PP-OCRv3_rec_infer + */ PPOCRSystemv3(fastdeploy::vision::ocr::DBDetector* det_model, fastdeploy::vision::ocr::Classifier* cls_model, - fastdeploy::vision::ocr::Recognizer* rec_model) : PPOCRSystemv2(det_model, cls_model, rec_model) { + fastdeploy::vision::ocr::Recognizer* rec_model) + : PPOCRSystemv2(det_model, cls_model, rec_model) { // The only difference between v2 and v3 recognizer_->rec_image_shape[1] = 48; } - + /** \brief The classification model is optional, so this function sets up only the detection model path and recognition model path.
+ * + * \param[in] det_model Path of detection model, e.g ./ch_PP-OCRv3_det_infer + * \param[in] rec_model Path of recognition model, e.g ./ch_PP-OCRv3_rec_infer + */ PPOCRSystemv3(fastdeploy::vision::ocr::DBDetector* det_model, - fastdeploy::vision::ocr::Recognizer* rec_model) : PPOCRSystemv2(det_model, rec_model) { + fastdeploy::vision::ocr::Recognizer* rec_model) + : PPOCRSystemv2(det_model, rec_model) { + // The only difference between v2 and v3 recognizer_->rec_image_shape[1] = 48; } }; diff --git a/fastdeploy/vision/ocr/ppocr/recognizer.h b/fastdeploy/vision/ocr/ppocr/recognizer.h index ebe99d1e86..3ab6731ba4 100644 --- a/fastdeploy/vision/ocr/ppocr/recognizer.h +++ b/fastdeploy/vision/ocr/ppocr/recognizer.h @@ -20,22 +20,39 @@ namespace fastdeploy { namespace vision { +/** \brief All OCR series model APIs are defined inside this namespace + * + */ namespace ocr { - +/*! @brief Recognizer object is used to load the recognition model provided by PaddleOCR. + */ class FASTDEPLOY_DECL Recognizer : public FastDeployModel { public: Recognizer(); + /** \brief Set path of model file, and the configuration of runtime + * + * \param[in] model_file Path of model file, e.g ./ch_PP-OCRv3_rec_infer/model.pdmodel. + * \param[in] params_file Path of parameter file, e.g ./ch_PP-OCRv3_rec_infer/model.pdiparams, if the model format is ONNX, this parameter will be ignored. + * \param[in] label_path Path of label file used by OCR recognition model. e.g ./ppocr_keys_v1.txt + * \param[in] custom_option RuntimeOption for inference, the default will use cpu, and choose the backend defined in `valid_cpu_backends`. + * \param[in] model_format Model format of the loaded model, default is Paddle format. 
+ */ Recognizer(const std::string& model_file, const std::string& params_file = "", const std::string& label_path = "", const RuntimeOption& custom_option = RuntimeOption(), const ModelFormat& model_format = ModelFormat::PADDLE); - + /// Get model's name std::string ModelName() const { return "ppocr/ocr_rec"; } - + /** \brief Predict the input image and get OCR recognition model result. + * + * \param[in] im The input image data, comes from cv::imread(). + * \param[in] rec_result The output of OCR recognition model result will be written to this structure. + * \return true if the prediction is successful, otherwise false. + */ virtual bool Predict(cv::Mat* img, std::tuple<std::string, float>* rec_result); - // pre & post parameters + // Pre & Post parameters std::vector<std::string> label_list; int rec_batch_num; int rec_img_h; @@ -48,10 +65,11 @@ class FASTDEPLOY_DECL Recognizer : public FastDeployModel { private: bool Initialize(); - + /// Preprocess the input data, and set the preprocessed results to `outputs` bool Preprocess(Mat* img, FDTensor* outputs, const std::vector<int>& rec_image_shape); - + /*! @brief Postprocess the inference results, and set the final result to `rec_result` + */ bool Postprocess(FDTensor& infer_result, std::tuple<std::string, float>* rec_result); }; diff --git a/python/fastdeploy/vision/ocr/ppocr/__init__.py b/python/fastdeploy/vision/ocr/ppocr/__init__.py index 53888ba040..54e1f77ea6 100644 --- a/python/fastdeploy/vision/ocr/ppocr/__init__.py +++ b/python/fastdeploy/vision/ocr/ppocr/__init__.py @@ -24,8 +24,13 @@ def __init__(self, params_file="", runtime_option=None, model_format=ModelFormat.PADDLE): - # 调用基函数进行backend_option的初始化 - # 初始化后的option保存在self._runtime_option + """Load OCR detection model provided by PaddleOCR. + + :param model_file: (str)Path of model file, e.g ./ch_PP-OCRv3_det_infer/model.pdmodel. + :param params_file: (str)Path of parameter file, e.g ./ch_PP-OCRv3_det_infer/model.pdiparams, if the model format is ONNX, this parameter will be ignored.
+ :param runtime_option: (fastdeploy.RuntimeOption)RuntimeOption for inference this model, if it's None, will use the default backend on CPU. + :param model_format: (fastdeploy.ModelFormat)Model format of the loaded model. + """ super(DBDetector, self).__init__(runtime_option) if (len(model_file) == 0): @@ -33,7 +38,6 @@ def __init__(self, else: self._model = C.vision.ocr.DBDetector( model_file, params_file, self._runtime_option, model_format) - # 通过self.initialized判断整个模型的初始化是否成功 assert self.initialized, "DBDetector initialize failed." # 一些跟DBDetector模型有关的属性封装 @@ -81,8 +85,8 @@ def det_db_thresh(self, value): @det_db_box_thresh.setter def det_db_box_thresh(self, value): assert isinstance( - value, - float), "The value to set `det_db_box_thresh` must be type of float." + value, float + ), "The value to set `det_db_box_thresh` must be type of float." self._model.det_db_box_thresh = value @det_db_unclip_ratio.setter @@ -119,8 +123,13 @@ def __init__(self, params_file="", runtime_option=None, model_format=ModelFormat.PADDLE): - # 调用基函数进行backend_option的初始化 - # 初始化后的option保存在self._runtime_option + """Load OCR classification model provided by PaddleOCR. + + :param model_file: (str)Path of model file, e.g ./ch_ppocr_mobile_v2.0_cls_infer/model.pdmodel. + :param params_file: (str)Path of parameter file, e.g ./ch_ppocr_mobile_v2.0_cls_infer/model.pdiparams, if the model format is ONNX, this parameter will be ignored. + :param runtime_option: (fastdeploy.RuntimeOption)RuntimeOption for inference this model, if it's None, will use the default backend on CPU. + :param model_format: (fastdeploy.ModelFormat)Model format of the loaded model. + """ super(Classifier, self).__init__(runtime_option) if (len(model_file) == 0): @@ -128,7 +137,6 @@ def __init__(self, else: self._model = C.vision.ocr.Classifier( model_file, params_file, self._runtime_option, model_format) - # 通过self.initialized判断整个模型的初始化是否成功 assert self.initialized, "Classifier initialize failed."
@property @@ -159,7 +167,8 @@ def cls_image_shape(self, value): @cls_batch_num.setter def cls_batch_num(self, value): assert isinstance( - value, int), "The value to set `cls_batch_num` must be type of int." + value, + int), "The value to set `cls_batch_num` must be type of int." self._model.cls_batch_num = value @@ -170,8 +179,14 @@ def __init__(self, label_path="", runtime_option=None, model_format=ModelFormat.PADDLE): - # 调用基函数进行backend_option的初始化 - # 初始化后的option保存在self._runtime_option + """Load OCR recognition model provided by PaddleOCR. + + :param model_file: (str)Path of model file, e.g ./ch_PP-OCRv3_rec_infer/model.pdmodel. + :param params_file: (str)Path of parameter file, e.g ./ch_PP-OCRv3_rec_infer/model.pdiparams, if the model format is ONNX, this parameter will be ignored. + :param label_path: (str)Path of label file used by OCR recognition model. e.g ./ppocr_keys_v1.txt + :param runtime_option: (fastdeploy.RuntimeOption)RuntimeOption for inference this model, if it's None, will use the default backend on CPU. + :param model_format: (fastdeploy.ModelFormat)Model format of the loaded model. + """ super(Recognizer, self).__init__(runtime_option) if (len(model_file) == 0): @@ -180,7 +195,6 @@ def __init__(self, self._model = C.vision.ocr.Recognizer( model_file, params_file, label_path, self._runtime_option, model_format) - # 通过self.initialized判断整个模型的初始化是否成功 assert self.initialized, "Recognizer initialize failed." @property @@ -210,12 +224,19 @@ def rec_img_w(self, value): @rec_batch_num.setter def rec_batch_num(self, value): assert isinstance( - value, int), "The value to set `rec_batch_num` must be type of int." + value, + int), "The value to set `rec_batch_num` must be type of int."
self._model.rec_batch_num = value class PPOCRSystemv3(FastDeployModel): def __init__(self, det_model=None, cls_model=None, rec_model=None): + """Load detetion, classification and recognition models to construct PP-OCRv3 + + :param det_model: (FastDeployModel) The detection model object created by fastdeploy.vision.ocr.DBDetector. + :param cls_model: (FastDeployModel) The classification model object created by fastdeploy.vision.ocr.Classifier. + :param rec_model: (FastDeployModel) The recognition model object created by fastdeploy.vision.ocr.Recognizer. + """ assert det_model is not None and rec_model is not None, "The det_model and rec_model cannot be None." if cls_model is None: self.system = C.vision.ocr.PPOCRSystemv3(det_model._model, @@ -225,11 +246,22 @@ def __init__(self, det_model=None, cls_model=None, rec_model=None): det_model._model, cls_model._model, rec_model._model) def predict(self, input_image): + """Predict an input image + + :param input_image: (numpy.ndarray)The input image data, 3-D array with layout HWC, BGR format + :return: OCRResult + """ return self.system.predict(input_image) class PPOCRSystemv2(FastDeployModel): def __init__(self, det_model=None, cls_model=None, rec_model=None): + """Load detetion, classification and recognition models to construct PP-OCRv2. + + :param det_model: (FastDeployModel) The detection model object created by fastdeploy.vision.ocr.DBDetector. + :param cls_model: (FastDeployModel) The classification model object created by fastdeploy.vision.ocr.Classifier. + :param rec_model: (FastDeployModel) The recognition model object created by fastdeploy.vision.ocr.Recognizer. + """ assert det_model is not None and rec_model is not None, "The det_model and rec_model cannot be None." 
if cls_model is None: self.system = C.vision.ocr.PPOCRSystemv2(det_model._model, @@ -239,4 +271,9 @@ def __init__(self, det_model=None, cls_model=None, rec_model=None): det_model._model, cls_model._model, rec_model._model) def predict(self, input_image): + """Predict an input image + + :param input_image: (numpy.ndarray)The input image data, 3-D array with layout HWC, BGR format + :return: OCRResult + """ return self.system.predict(input_image) From 72837577b22d53461a69b2b5fcc65b9f6bade074 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Tue, 18 Oct 2022 11:23:02 +0000 Subject: [PATCH 31/52] Improve OCR comments --- fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h | 6 +++--- python/fastdeploy/vision/ocr/ppocr/__init__.py | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h b/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h index 1b70adb5fd..04bf26b9f6 100644 --- a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h +++ b/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h @@ -66,11 +66,11 @@ class FASTDEPLOY_DECL PPOCRSystemv2 : public FastDeployModel { fastdeploy::vision::ocr::DBDetector* detector_ = nullptr; fastdeploy::vision::ocr::Classifier* classifier_ = nullptr; fastdeploy::vision::ocr::Recognizer* recognizer_ = nullptr; - /// Luanch the detection process in OCR. + /// Launch the detection process in OCR. virtual bool Detect(cv::Mat* img, fastdeploy::vision::OCRResult* result); - /// Luanch the recognition process in OCR. + /// Launch the recognition process in OCR. virtual bool Recognize(cv::Mat* img, fastdeploy::vision::OCRResult* result); - /// Luanch the classification process in OCR. + /// Launch the classification process in OCR. 
virtual bool Classify(cv::Mat* img, fastdeploy::vision::OCRResult* result); }; diff --git a/python/fastdeploy/vision/ocr/ppocr/__init__.py b/python/fastdeploy/vision/ocr/ppocr/__init__.py index 54e1f77ea6..412332e3a6 100644 --- a/python/fastdeploy/vision/ocr/ppocr/__init__.py +++ b/python/fastdeploy/vision/ocr/ppocr/__init__.py @@ -231,7 +231,7 @@ def rec_batch_num(self, value): class PPOCRSystemv3(FastDeployModel): def __init__(self, det_model=None, cls_model=None, rec_model=None): - """Load detetion, classification and recognition models to construct PP-OCRv3 + """Construct a pipeline with text detector, direction classifier and text recognizer models. :param det_model: (FastDeployModel) The detection model object created by fastdeploy.vision.ocr.DBDetector. :param cls_model: (FastDeployModel) The classification model object created by fastdeploy.vision.ocr.Classifier. @@ -256,7 +256,7 @@ def predict(self, input_image): class PPOCRSystemv2(FastDeployModel): def __init__(self, det_model=None, cls_model=None, rec_model=None): - """Load detetion, classification and recognition models to construct PP-OCRv2. + """Construct a pipeline with text detector, direction classifier and text recognizer models. :param det_model: (FastDeployModel) The detection model object created by fastdeploy.vision.ocr.DBDetector. :param cls_model: (FastDeployModel) The classification model object created by fastdeploy.vision.ocr.Classifier.
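The docstrings introduced above describe the same pipeline order on the Python side as the C++ headers do: detect boxes, optionally run the direction classifier, then recognize each crop, with `cls_model` allowed to be `None`. As a rough illustration of that control flow — using hypothetical stand-in stage functions, not FastDeploy APIs — the wiring can be sketched in plain Python:

```python
# Illustrative sketch of the PP-OCR pipeline order documented in these
# patches: Detect -> sort boxes -> (optional) Classify -> Recognize.
# Every stage function here is a hypothetical stand-in, not a FastDeploy API.

def run_ppocr(image, detect, recognize, classify=None):
    """Run a PP-OCR style pipeline; `classify` is optional, mirroring how
    cls_model may be None when constructing the v2/v3 pipeline objects."""
    boxes = detect(image)
    # Order boxes roughly into reading order (top-to-bottom, left-to-right).
    boxes = sorted(boxes, key=lambda b: (b[1], b[0]))
    results = []
    for box in boxes:
        crop = (image, box)  # stand-in for cropping the detected region
        if classify is not None:
            label, score = classify(crop)   # cls_result is a (label, score) pair
            if label == 1:                  # upside-down text: flip before recognition
                crop = ("rotated", box)
        text, rec_score = recognize(crop)   # rec_result is a (text, score) pair
        results.append((box, text, rec_score))
    return results

# Toy stages demonstrating the data shapes the doc comments describe.
detect = lambda img: [(10, 40, 90, 60), (12, 5, 80, 25)]
classify = lambda crop: (0, 0.99)
recognize = lambda crop: ("text", 0.95)

out = run_ppocr("img", detect, recognize, classify)
```

Dropping the `classify` argument exercises the detector-plus-recognizer construction path that both `PPOCRSystemv2` overloads expose.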
From 35e2edc6712188bd2d665f2d650bb2c7374c503c Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Tue, 18 Oct 2022 14:09:34 +0000 Subject: [PATCH 32/52] Rename OCR and add comments --- docs/api_docs/python/ocr.md | 8 +++--- examples/vision/ocr/PP-OCRv2/cpp/README.md | 28 +++++++++---------- examples/vision/ocr/PP-OCRv2/cpp/infer.cc | 14 +++++----- examples/vision/ocr/PP-OCRv2/python/README.md | 24 ++++++++-------- examples/vision/ocr/PP-OCRv2/python/infer.py | 4 +-- examples/vision/ocr/PP-OCRv3/cpp/README.md | 20 ++++++------- examples/vision/ocr/PP-OCRv3/cpp/infer.cc | 14 +++++----- examples/vision/ocr/PP-OCRv3/python/README.md | 16 +++++------ examples/vision/ocr/PP-OCRv3/python/infer.py | 4 +-- fastdeploy/vision.h | 4 +-- fastdeploy/vision/ocr/ocr_pybind.cc | 8 +++--- .../{ocrsys_pybind.cc => ppocr_pybind.cc} | 20 ++++++------- .../ppocr/{ppocr_system_v2.cc => ppocr_v2.cc} | 28 +++++++++---------- .../ppocr/{ppocr_system_v2.h => ppocr_v2.h} | 18 +++++++----- .../ppocr/{ppocr_system_v3.h => ppocr_v3.h} | 25 ++++++++++------- fastdeploy/vision/ocr/ppocr/recognizer.cc | 3 -- python/fastdeploy/vision/ocr/__init__.py | 4 +-- .../fastdeploy/vision/ocr/ppocr/__init__.py | 16 +++++------ 18 files changed, 131 insertions(+), 127 deletions(-) rename fastdeploy/vision/ocr/ppocr/{ocrsys_pybind.cc => ppocr_pybind.cc} (79%) rename fastdeploy/vision/ocr/ppocr/{ppocr_system_v2.cc => ppocr_v2.cc} (81%) rename fastdeploy/vision/ocr/ppocr/{ppocr_system_v2.h => ppocr_v2.h} (85%) rename fastdeploy/vision/ocr/ppocr/{ppocr_system_v3.h => ppocr_v3.h} (73%) diff --git a/docs/api_docs/python/ocr.md b/docs/api_docs/python/ocr.md index 552eabdc5f..6229182037 100644 --- a/docs/api_docs/python/ocr.md +++ b/docs/api_docs/python/ocr.md @@ -24,18 +24,18 @@ :inherited-members: ``` -## fastdeploy.vision.ocr.PPOCRSystemv2 +## fastdeploy.vision.ocr.PPOCRv2 ```{eval-rst} -.. autoclass:: fastdeploy.vision.ocr.PPOCRSystemv2 +.. 
autoclass:: fastdeploy.vision.ocr.PPOCRv2 :members: :inherited-members: ``` -## fastdeploy.vision.ocr.PPOCRSystemv3 +## fastdeploy.vision.ocr.PPOCRv3 ```{eval-rst} -.. autoclass:: fastdeploy.vision.ocr.PPOCRSystemv3 +.. autoclass:: fastdeploy.vision.ocr.PPOCRv3 :members: :inherited-members: ``` diff --git a/examples/vision/ocr/PP-OCRv2/cpp/README.md b/examples/vision/ocr/PP-OCRv2/cpp/README.md index f612d66014..6547872564 100644 --- a/examples/vision/ocr/PP-OCRv2/cpp/README.md +++ b/examples/vision/ocr/PP-OCRv2/cpp/README.md @@ -1,6 +1,6 @@ -# PPOCRSystemv2 C++部署示例 +# PPOCRv2 C++部署示例 -本目录下提供`infer.cc`快速完成PPOCRSystemv2在CPU/GPU,以及GPU上通过TensorRT加速部署的示例。 +本目录下提供`infer.cc`快速完成PPOCRv2在CPU/GPU,以及GPU上通过TensorRT加速部署的示例。 在部署前,需确认以下两个步骤 @@ -19,14 +19,14 @@ make -j # 下载模型,图片和字典文件 -wget https://bj.bcebos.com/paddlehub/fastdeploy/ch_PP-OCRv2_det_infer.tar.gz -tar -xvf ch_PP-OCRv2_det_infer.tar.gz +wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar +tar -xvf ch_PP-OCRv2_det_infer.tar -wget https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz -tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar.gz +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar +tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar -wget https://bj.bcebos.com/paddlehub/fastdeploy/ch_PP-OCRv2_rec_infer.tar.gz -tar -xvf ch_PP-OCRv2_rec_infer.tar.gz +wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar +tar -xvf ch_PP-OCRv2_rec_infer.tar wget https://gitee.com/paddlepaddle/PaddleOCR/raw/release/2.6/doc/imgs/12.jpg @@ -48,17 +48,17 @@ wget https://gitee.com/paddlepaddle/PaddleOCR/raw/release/2.6/ppocr/utils/ppocr_ -## PPOCRSystemv2 C++接口 +## PPOCRv2 C++接口 -### PPOCRSystemv2类 +### PPOCRv2类 ``` -fastdeploy::application::ocrsystem::PPOCRSystemv2(fastdeploy::vision::ocr::DBDetector* det_model, +fastdeploy::pipeline::PPOCRv2(fastdeploy::vision::ocr::DBDetector* det_model, fastdeploy::vision::ocr::Classifier* cls_model,
fastdeploy::vision::ocr::Recognizer* rec_model); ``` -PPOCRSystemv2 的初始化,由检测,分类和识别模型串联构成 +PPOCRv2 的初始化,由检测,分类和识别模型串联构成 **参数** @@ -67,10 +67,10 @@ PPOCRSystemv2 的初始化,由检测,分类和识别模型串联构成 > * **Recognizer**(model): OCR中的识别模型 ``` -fastdeploy::application::ocrsystem::PPOCRSystemv2(fastdeploy::vision::ocr::DBDetector* det_model, +fastdeploy::pipeline::PPOCRv2(fastdeploy::vision::ocr::DBDetector* det_model, fastdeploy::vision::ocr::Recognizer* rec_model); ``` -PPOCRSystemv2 的初始化,由检测,识别模型串联构成(无分类器) +PPOCRv2 的初始化,由检测,识别模型串联构成(无分类器) **参数** diff --git a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc index bf0ff5f27e..9d628689b5 100644 --- a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc +++ b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc @@ -37,12 +37,12 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model assert(cls_model.Initialized()); assert(rec_model.Initialized()); - // The classification model is optional, so the OCR system can also be connected in series as follows - // auto ocr_system_v2 = fastdeploy::application::ocrsystem::PPOCRSystemv2(&det_model, &rec_model); - auto ocr_system_v2 = fastdeploy::application::ocrsystem::PPOCRSystemv2(&det_model, &cls_model, &rec_model); + // The classification model is optional, so the PP-OCR can also be connected in series as follows + // auto ppocr_v2 = fastdeploy::pipeline::PPOCRv2(&det_model, &rec_model); + auto ppocr_v2 = fastdeploy::pipeline::PPOCRv2(&det_model, &cls_model, &rec_model); - if(!ocr_system_v2.Initialized()){ - std::cerr << "Failed to initialize OCR system." << std::endl; + if(!ppocr_v2.Initialized()){ + std::cerr << "Failed to initialize PP-OCR." << std::endl; return; } @@ -50,14 +50,14 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model auto im_bak = im.clone(); fastdeploy::vision::OCRResult result; - if (!ocr_system_v2.Predict(&im, &result)) { + if (!ppocr_v2.Predict(&im, &result)) { std::cerr << "Failed to predict." 
<< std::endl; return; } std::cout << result.Str() << std::endl; - auto vis_im = fastdeploy::vision::Visualize::VisOcr(im_bak, result); + auto vis_im = fastdeploy::vision::VisOcr(im_bak, result); cv::imwrite("vis_result.jpg", vis_im); std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl; } diff --git a/examples/vision/ocr/PP-OCRv2/python/README.md b/examples/vision/ocr/PP-OCRv2/python/README.md index ee845f2ca9..c51f8781fd 100644 --- a/examples/vision/ocr/PP-OCRv2/python/README.md +++ b/examples/vision/ocr/PP-OCRv2/python/README.md @@ -1,23 +1,23 @@ -# PPOCRSystemv2 Python部署示例 +# PPOCRv2 Python部署示例 在部署前,需确认以下两个步骤 - 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md) - 2. FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md) -本目录下提供`infer.py`快速完成PPOCRSystemv2在CPU/GPU,以及GPU上通过TensorRT加速部署的示例。执行如下脚本即可完成 +本目录下提供`infer.py`快速完成PPOCRv2在CPU/GPU,以及GPU上通过TensorRT加速部署的示例。执行如下脚本即可完成 ``` # 下载模型,图片和字典文件 -wget https://bj.bcebos.com/paddlehub/fastdeploy/ch_PP-OCRv2_det_infer.tar.gz -tar -xvf ch_PP-OCRv2_det_infer.tar.gz +wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar +tar -xvf ch_PP-OCRv2_det_infer.tar -wget https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz -tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar.gz +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar +tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar -wget https://bj.bcebos.com/paddlehub/fastdeploy/ch_PP-OCRv2_rec_infer.tar.gz -tar -xvf ch_PP-OCRv2_rec_infer.tar.gz +wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar +tar -xvf ch_PP-OCRv2_rec_infer.tar wget https://gitee.com/paddlepaddle/PaddleOCR/raw/release/2.6/doc/imgs/12.jpg @@ -39,12 +39,12 @@ python infer.py --det_model ch_PP-OCRv2_det_infer --cls_model ch_ppocr_mobile_v2 运行完成可视化结果如下图所示 -## PPOCRSystemv2 Python接口 +## 
PPOCRv2 Python接口 ``` -fd.vision.ocr.PPOCRSystemv2(det_model=det_model, cls_model=cls_model, rec_model=rec_model) +fd.vision.ocr.PPOCRv2(det_model=det_model, cls_model=cls_model, rec_model=rec_model) ``` -PPOCRSystemv2的初始化,输入的参数是检测模型,分类模型和识别模型,其中cls_model可选,如无需求,可设置为None +PPOCRv2的初始化,输入的参数是检测模型,分类模型和识别模型,其中cls_model可选,如无需求,可设置为None **参数** @@ -55,7 +55,7 @@ PPOCRSystemv2的初始化,输入的参数是检测模型,分类模型和识别 ### predict函数 > ``` -> result = ocr_system.predict(im) +> result = ppocr_v2.predict(im) > ``` > > 模型预测接口,输入是一张图片 diff --git a/examples/vision/ocr/PP-OCRv2/python/infer.py b/examples/vision/ocr/PP-OCRv2/python/infer.py index 0eaf1bd840..94d55ced97 100644 --- a/examples/vision/ocr/PP-OCRv2/python/infer.py +++ b/examples/vision/ocr/PP-OCRv2/python/infer.py @@ -111,14 +111,14 @@ def build_option(args): runtime_option=runtime_option) # 创建OCR系统,串联3个模型,其中cls_model可选,如无需求,可设置为None -ocr_system = fd.vision.ocr.PPOCRSystemv2( +ppocr_v2 = fd.vision.ocr.PPOCRv2( det_model=det_model, cls_model=cls_model, rec_model=rec_model) # 预测图片准备 im = cv2.imread(args.image) #预测并打印结果 -result = ocr_system.predict(im) +result = ppocr_v2.predict(im) print(result) diff --git a/examples/vision/ocr/PP-OCRv3/cpp/README.md b/examples/vision/ocr/PP-OCRv3/cpp/README.md index 91b0fea0d7..16a6288767 100644 --- a/examples/vision/ocr/PP-OCRv3/cpp/README.md +++ b/examples/vision/ocr/PP-OCRv3/cpp/README.md @@ -1,6 +1,6 @@ -# PPOCRSystemv3 C++部署示例 +# PPOCRv3 C++部署示例 -本目录下提供`infer.cc`快速完成PPOCRSystemv3在CPU/GPU,以及GPU上通过TensorRT加速部署的示例。 +本目录下提供`infer.cc`快速完成PPOCRv3在CPU/GPU,以及GPU上通过TensorRT加速部署的示例。 在部署前,需确认以下两个步骤 @@ -22,8 +22,8 @@ make -j wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar tar -xvf ch_PP-OCRv3_det_infer.tar -wget https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz -tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar.gz +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar +tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar wget 
https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar tar -xvf ch_PP-OCRv3_rec_infer.tar @@ -48,17 +48,17 @@ wget https://gitee.com/paddlepaddle/PaddleOCR/raw/release/2.6/ppocr/utils/ppocr_ -## PPOCRSystemv3 C++接口 +## PPOCRv3 C++接口 -### PPOCRSystemv3类 +### PPOCRv3类 ``` -fastdeploy::application::ocrsystem::PPOCRSystemv3(fastdeploy::vision::ocr::DBDetector* det_model, +fastdeploy::pipeline::PPOCRv3(fastdeploy::vision::ocr::DBDetector* det_model, fastdeploy::vision::ocr::Classifier* cls_model, fastdeploy::vision::ocr::Recognizer* rec_model); ``` -PPOCRSystemv3 的初始化,由检测,分类和识别模型串联构成 +PPOCRv3 的初始化,由检测,分类和识别模型串联构成 **参数** @@ -67,10 +67,10 @@ PPOCRSystemv3 的初始化,由检测,分类和识别模型串联构成 > * **Recognizer**(model): OCR中的识别模型 ``` -fastdeploy::application::ocrsystem::PPOCRSystemv3(fastdeploy::vision::ocr::DBDetector* det_model, +fastdeploy::pipeline::PPOCRv3(fastdeploy::vision::ocr::DBDetector* det_model, fastdeploy::vision::ocr::Recognizer* rec_model); ``` -PPOCRSystemv3 的初始化,由检测,识别模型串联构成(无分类器) +PPOCRv3 的初始化,由检测,识别模型串联构成(无分类器) **参数** diff --git a/examples/vision/ocr/PP-OCRv3/cpp/infer.cc b/examples/vision/ocr/PP-OCRv3/cpp/infer.cc index a48fb6bc08..333dbaa3fa 100644 --- a/examples/vision/ocr/PP-OCRv3/cpp/infer.cc +++ b/examples/vision/ocr/PP-OCRv3/cpp/infer.cc @@ -37,12 +37,12 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model assert(cls_model.Initialized()); assert(rec_model.Initialized()); - // The classification model is optional, so the OCR system can also be connected in series as follows - // auto ocr_system_v3 = fastdeploy::application::ocrsystem::PPOCRSystemv3(&det_model, &rec_model); - auto ocr_system_v3 = fastdeploy::application::ocrsystem::PPOCRSystemv3(&det_model, &cls_model, &rec_model); + // The classification model is optional, so the PP-OCR can also be connected in series as follows + // auto ppocr_v3 = fastdeploy::pipeline::PPOCRv3(&det_model, &rec_model); + auto ppocr_v3 = fastdeploy::pipeline::PPOCRv3(&det_model, 
&cls_model, &rec_model); - if(!ocr_system_v3.Initialized()){ - std::cerr << "Failed to initialize OCR system." << std::endl; + if(!ppocr_v3.Initialized()){ + std::cerr << "Failed to initialize PP-OCR." << std::endl; return; } @@ -50,14 +50,14 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model auto im_bak = im.clone(); fastdeploy::vision::OCRResult result; - if (!ocr_system_v3.Predict(&im, &result)) { + if (!ppocr_v3.Predict(&im, &result)) { std::cerr << "Failed to predict." << std::endl; return; } std::cout << result.Str() << std::endl; - auto vis_im = fastdeploy::vision::Visualize::VisOcr(im_bak, result); + auto vis_im = fastdeploy::vision::VisOcr(im_bak, result); cv::imwrite("vis_result.jpg", vis_im); std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl; } diff --git a/examples/vision/ocr/PP-OCRv3/python/README.md b/examples/vision/ocr/PP-OCRv3/python/README.md index 0c33e28b2f..0fda05e281 100644 --- a/examples/vision/ocr/PP-OCRv3/python/README.md +++ b/examples/vision/ocr/PP-OCRv3/python/README.md @@ -1,11 +1,11 @@ -# PPOCRSystemv3 Python部署示例 +# PPOCRv3 Python部署示例 在部署前,需确认以下两个步骤 - 1. 软硬件环境满足要求,参考[FastDeploy环境要求](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md) - 2. 
FastDeploy Python whl包安装,参考[FastDeploy Python安装](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md) -本目录下提供`infer.py`快速完成PPOCRSystemv3在CPU/GPU,以及GPU上通过TensorRT加速部署的示例。执行如下脚本即可完成 +本目录下提供`infer.py`快速完成PPOCRv3在CPU/GPU,以及GPU上通过TensorRT加速部署的示例。执行如下脚本即可完成 ``` @@ -13,8 +13,8 @@ wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar tar xvf ch_PP-OCRv3_det_infer.tar -wget https://bj.bcebos.com/paddlehub/fastdeploy/ch_ppocr_mobile_v2.0_cls_infer.tar.gz -tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar.gz +wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar +tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar tar xvf ch_PP-OCRv3_rec_infer.tar @@ -38,12 +38,12 @@ python infer.py --det_model ch_PP-OCRv3_det_infer --cls_model ch_ppocr_mobile_v2 运行完成可视化结果如下图所示 -## PPOCRSystemv3 Python接口 +## PPOCRv3 Python接口 ``` -fd.vision.ocr.PPOCRSystemv3(det_model=det_model, cls_model=cls_model, rec_model=rec_model) +fd.vision.ocr.PPOCRv3(det_model=det_model, cls_model=cls_model, rec_model=rec_model) ``` -PPOCRSystemv3的初始化,输入的参数是检测模型,分类模型和识别模型,其中cls_model可选,如无需求,可设置为None +PPOCRv3的初始化,输入的参数是检测模型,分类模型和识别模型,其中cls_model可选,如无需求,可设置为None **参数** @@ -54,7 +54,7 @@ PPOCRSystemv3的初始化,输入的参数是检测模型,分类模型和识别 ### predict函数 > ``` -> result = ocr_system.predict(im) +> result = ppocr_v3.predict(im) > ``` > > 模型预测接口,输入是一张图片 diff --git a/examples/vision/ocr/PP-OCRv3/python/infer.py b/examples/vision/ocr/PP-OCRv3/python/infer.py index 2332a2e36d..1bf7fe0282 100644 --- a/examples/vision/ocr/PP-OCRv3/python/infer.py +++ b/examples/vision/ocr/PP-OCRv3/python/infer.py @@ -111,14 +111,14 @@ def build_option(args): runtime_option=runtime_option) # 创建OCR系统,串联3个模型,其中cls_model可选,如无需求,可设置为None -ocr_system = fd.vision.ocr.PPOCRSystemv3( +ppocr_v3 = fd.vision.ocr.PPOCRv3( det_model=det_model, cls_model=cls_model, rec_model=rec_model) # 预测图片准备 im = cv2.imread(args.image) #预测并打印结果 
-result = ocr_system.predict(im) +result = ppocr_v3.predict(im) print(result) diff --git a/fastdeploy/vision.h b/fastdeploy/vision.h index 8e1358d8ac..d01e2e15c5 100755 --- a/fastdeploy/vision.h +++ b/fastdeploy/vision.h @@ -42,8 +42,8 @@ #include "fastdeploy/vision/matting/ppmatting/ppmatting.h" #include "fastdeploy/vision/ocr/ppocr/classifier.h" #include "fastdeploy/vision/ocr/ppocr/dbdetector.h" -#include "fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h" -#include "fastdeploy/vision/ocr/ppocr/ppocr_system_v3.h" +#include "fastdeploy/vision/ocr/ppocr/ppocr_v2.h" +#include "fastdeploy/vision/ocr/ppocr/ppocr_v3.h" #include "fastdeploy/vision/ocr/ppocr/recognizer.h" #include "fastdeploy/vision/segmentation/ppseg/model.h" #endif diff --git a/fastdeploy/vision/ocr/ocr_pybind.cc b/fastdeploy/vision/ocr/ocr_pybind.cc index f2e25b49c2..b1e2348757 100644 --- a/fastdeploy/vision/ocr/ocr_pybind.cc +++ b/fastdeploy/vision/ocr/ocr_pybind.cc @@ -17,13 +17,13 @@ namespace fastdeploy { void BindPPOCRModel(pybind11::module& m); -void BindPPOCRSystemv3(pybind11::module& m); -void BindPPOCRSystemv2(pybind11::module& m); +void BindPPOCRv3(pybind11::module& m); +void BindPPOCRv2(pybind11::module& m); void BindOcr(pybind11::module& m) { auto ocr_module = m.def_submodule("ocr", "Module to deploy OCR models"); BindPPOCRModel(ocr_module); - BindPPOCRSystemv3(ocr_module); - BindPPOCRSystemv2(ocr_module); + BindPPOCRv3(ocr_module); + BindPPOCRv2(ocr_module); } } // namespace fastdeploy diff --git a/fastdeploy/vision/ocr/ppocr/ocrsys_pybind.cc b/fastdeploy/vision/ocr/ppocr/ppocr_pybind.cc similarity index 79% rename from fastdeploy/vision/ocr/ppocr/ocrsys_pybind.cc rename to fastdeploy/vision/ocr/ppocr/ppocr_pybind.cc index 9c22c8abab..a88ae2fc7f 100644 --- a/fastdeploy/vision/ocr/ppocr/ocrsys_pybind.cc +++ b/fastdeploy/vision/ocr/ppocr/ppocr_pybind.cc @@ -15,17 +15,17 @@ #include "fastdeploy/pybind/main.h" namespace fastdeploy { -void BindPPOCRSystemv3(pybind11::module& m) { - // OCRSys - 
pybind11::class_<application::ocrsystem::PPOCRSystemv3, FastDeployModel>( - m, "PPOCRSystemv3") +void BindPPOCRv3(pybind11::module& m) { + // PPOCRv3 + pybind11::class_<pipeline::PPOCRv3, FastDeployModel>( + m, "PPOCRv3") .def(pybind11::init<fastdeploy::vision::ocr::DBDetector*, fastdeploy::vision::ocr::Classifier*, fastdeploy::vision::ocr::Recognizer*>()) .def(pybind11::init<fastdeploy::vision::ocr::DBDetector*, fastdeploy::vision::ocr::Recognizer*>()) - .def("predict", [](application::ocrsystem::PPOCRSystemv3& self, + .def("predict", [](pipeline::PPOCRv3& self, pybind11::array& data) { auto mat = PyArrayToCvMat(data); vision::OCRResult res; @@ -34,16 +34,16 @@ void BindPPOCRSystemv3(pybind11::module& m) { } -void BindPPOCRSystemv2(pybind11::module& m) { - // OCRSys - pybind11::class_<application::ocrsystem::PPOCRSystemv2, FastDeployModel>( - m, "PPOCRSystemv2") +void BindPPOCRv2(pybind11::module& m) { + // PPOCRv2 + pybind11::class_<pipeline::PPOCRv2, FastDeployModel>( + m, "PPOCRv2") .def(pybind11::init<fastdeploy::vision::ocr::DBDetector*, fastdeploy::vision::ocr::Classifier*, fastdeploy::vision::ocr::Recognizer*>()) .def(pybind11::init<fastdeploy::vision::ocr::DBDetector*, fastdeploy::vision::ocr::Recognizer*>()) - .def("predict", [](application::ocrsystem::PPOCRSystemv2& self, + .def("predict", [](pipeline::PPOCRv2& self, pybind11::array& data) { auto mat = PyArrayToCvMat(data); vision::OCRResult res; diff --git a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc b/fastdeploy/vision/ocr/ppocr/ppocr_v2.cc similarity index 81% rename from fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc rename to fastdeploy/vision/ocr/ppocr/ppocr_v2.cc index 728b9f8834..06cb476029 100644 --- a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.cc +++ b/fastdeploy/vision/ocr/ppocr/ppocr_v2.cc @@ -12,27 +12,26 @@ // See the License for the specific language governing permissions and // limitations under the License. 
-#include "fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h" +#include "fastdeploy/vision/ocr/ppocr/ppocr_v2.h" #include "fastdeploy/utils/perf.h" #include "fastdeploy/vision/ocr/ppocr/utils/ocr_utils.h" namespace fastdeploy { -namespace application { -namespace ocrsystem { -PPOCRSystemv2::PPOCRSystemv2(fastdeploy::vision::ocr::DBDetector* det_model, +namespace pipeline { +PPOCRv2::PPOCRv2(fastdeploy::vision::ocr::DBDetector* det_model, fastdeploy::vision::ocr::Classifier* cls_model, fastdeploy::vision::ocr::Recognizer* rec_model) : detector_(det_model), classifier_(cls_model), recognizer_(rec_model) { recognizer_->rec_image_shape[1] = 32; } -PPOCRSystemv2::PPOCRSystemv2(fastdeploy::vision::ocr::DBDetector* det_model, +PPOCRv2::PPOCRv2(fastdeploy::vision::ocr::DBDetector* det_model, fastdeploy::vision::ocr::Recognizer* rec_model) : detector_(det_model), recognizer_(rec_model) { recognizer_->rec_image_shape[1] = 32; } -bool PPOCRSystemv2::Initialized() const { +bool PPOCRv2::Initialized() const { if (detector_ != nullptr && !detector_->Initialized()){ return false; @@ -48,21 +47,21 @@ bool PPOCRSystemv2::Initialized() const { return true; } -bool PPOCRSystemv2::Detect(cv::Mat* img, +bool PPOCRv2::Detect(cv::Mat* img, fastdeploy::vision::OCRResult* result) { if (!detector_->Predict(img, &(result->boxes))) { - FDERROR << "There's error while detecting image in PPOCRSystem." << std::endl; + FDERROR << "There's error while detecting image in PPOCR." << std::endl; return false; } vision::ocr::SortBoxes(result); return true; } -bool PPOCRSystemv2::Recognize(cv::Mat* img, +bool PPOCRv2::Recognize(cv::Mat* img, fastdeploy::vision::OCRResult* result) { std::tuple rec_result; if (!recognizer_->Predict(img, &rec_result)) { - FDERROR << "There's error while recognizing image in PPOCRSystem." << std::endl; + FDERROR << "There's error while recognizing image in PPOCR." 
<< std::endl; return false; } @@ -71,12 +70,12 @@ bool PPOCRSystemv2::Recognize(cv::Mat* img, return true; } -bool PPOCRSystemv2::Classify(cv::Mat* img, +bool PPOCRv2::Classify(cv::Mat* img, fastdeploy::vision::OCRResult* result) { std::tuple cls_result; if (!classifier_->Predict(img, &cls_result)) { - FDERROR << "There's error while classifying image in PPOCRSystem." << std::endl; + FDERROR << "There's error while classifying image in PPOCR." << std::endl; return false; } @@ -85,7 +84,7 @@ bool PPOCRSystemv2::Classify(cv::Mat* img, return true; } -bool PPOCRSystemv2::Predict(cv::Mat* img, +bool PPOCRv2::Predict(cv::Mat* img, fastdeploy::vision::OCRResult* result) { result->Clear(); if (nullptr != detector_ && !Detect(img, result)) { @@ -120,6 +119,5 @@ bool PPOCRSystemv2::Predict(cv::Mat* img, return true; }; -} // namesapce ocrsystem -} // namespace application +} // namespace pipeline } // namespace fastdeploy diff --git a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h b/fastdeploy/vision/ocr/ppocr/ppocr_v2.h similarity index 85% rename from fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h rename to fastdeploy/vision/ocr/ppocr/ppocr_v2.h index 04bf26b9f6..88d3ee1a31 100644 --- a/fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h +++ b/fastdeploy/vision/ocr/ppocr/ppocr_v2.h @@ -26,14 +26,13 @@ #include "fastdeploy/vision/ocr/ppocr/utils/ocr_postprocess_op.h" namespace fastdeploy { -namespace application { -/** \brief OCR system can launch detection model, classification model and recognition model sequentially. All OCR system APIs are defined inside this namespace. * */ -namespace ocrsystem { -/*! @brief PPOCRSystemv2 is used to load PP-OCRv2 series models provided by PaddleOCR.
*/ -class FASTDEPLOY_DECL PPOCRSystemv2 : public FastDeployModel { +class FASTDEPLOY_DECL PPOCRv2 : public FastDeployModel { public: /** \brief Set up the detection model path, classification model path and recognition model path respectively. * @@ -41,7 +40,7 @@ class FASTDEPLOY_DECL PPOCRSystemv2 : public FastDeployModel { * \param[in] cls_model Path of classification model, e.g ./ch_ppocr_mobile_v2.0_cls_infer * \param[in] rec_model Path of recognition model, e.g ./ch_PP-OCRv2_rec_infer */ - PPOCRSystemv2(fastdeploy::vision::ocr::DBDetector* det_model, + PPOCRv2(fastdeploy::vision::ocr::DBDetector* det_model, fastdeploy::vision::ocr::Classifier* cls_model, fastdeploy::vision::ocr::Recognizer* rec_model); @@ -50,7 +49,7 @@ class FASTDEPLOY_DECL PPOCRSystemv2 : public FastDeployModel { * \param[in] det_model Path of detection model, e.g ./ch_PP-OCRv2_det_infer * \param[in] rec_model Path of recognition model, e.g ./ch_PP-OCRv2_rec_infer */ - PPOCRSystemv2(fastdeploy::vision::ocr::DBDetector* det_model, + PPOCRv2(fastdeploy::vision::ocr::DBDetector* det_model, fastdeploy::vision::ocr::Recognizer* rec_model); /** \brief Predict the input image and get OCR result. 
@@ -74,6 +73,11 @@ class FASTDEPLOY_DECL PPOCRSystemv2 : public FastDeployModel { virtual bool Classify(cv::Mat* img, fastdeploy::vision::OCRResult* result); }; +namespace application { +namespace ocrsystem { + typedef pipeline::PPOCRv2 PPOCRSystemv2; } // namespace ocrsystem } // namespace application + +} // namespace pipeline } // namespace fastdeploy diff --git a/fastdeploy/vision/ocr/ppocr/ppocr_system_v3.h b/fastdeploy/vision/ocr/ppocr/ppocr_v3.h similarity index 73% rename from fastdeploy/vision/ocr/ppocr/ppocr_system_v3.h rename to fastdeploy/vision/ocr/ppocr/ppocr_v3.h index c88a0aff20..e248eca75e 100644 --- a/fastdeploy/vision/ocr/ppocr/ppocr_system_v3.h +++ b/fastdeploy/vision/ocr/ppocr/ppocr_v3.h @@ -14,17 +14,16 @@ #pragma once -#include "fastdeploy/vision/ocr/ppocr/ppocr_system_v2.h" +#include "fastdeploy/vision/ocr/ppocr/ppocr_v2.h" namespace fastdeploy { -namespace application { -/** \brief OCR system can launch detection model, classification model and recognition model sequentially. All OCR system APIs are defined inside this namespace. +/** \brief This pipeline can launch detection model, classification model and recognition model sequentially. All OCR pipeline APIs are defined inside this namespace. * */ -namespace ocrsystem { -/*! @brief PPOCRSystemv3 is used to load PP-OCRv3 series models provided by PaddleOCR. +namespace pipeline { +/*! @brief PPOCRv3 is used to load PP-OCRv3 series models provided by PaddleOCR. */ -class FASTDEPLOY_DECL PPOCRSystemv3 : public PPOCRSystemv2 { +class FASTDEPLOY_DECL PPOCRv3 : public PPOCRv2 { public: /** \brief Set up the detection model path, classification model path and recognition model path respectively. 
* @@ -32,10 +31,10 @@ class FASTDEPLOY_DECL PPOCRSystemv3 : public PPOCRSystemv2 { * \param[in] cls_model Path of classification model, e.g ./ch_ppocr_mobile_v2.0_cls_infer * \param[in] rec_model Path of recognition model, e.g ./ch_PP-OCRv3_rec_infer */ - PPOCRSystemv3(fastdeploy::vision::ocr::DBDetector* det_model, + PPOCRv3(fastdeploy::vision::ocr::DBDetector* det_model, fastdeploy::vision::ocr::Classifier* cls_model, fastdeploy::vision::ocr::Recognizer* rec_model) - : PPOCRSystemv2(det_model, cls_model, rec_model) { + : PPOCRv2(det_model, cls_model, rec_model) { // The only difference between v2 and v3 recognizer_->rec_image_shape[1] = 48; } @@ -44,14 +43,20 @@ class FASTDEPLOY_DECL PPOCRSystemv3 : public PPOCRSystemv2 { * \param[in] det_model Path of detection model, e.g ./ch_PP-OCRv3_det_infer * \param[in] rec_model Path of recognition model, e.g ./ch_PP-OCRv3_rec_infer */ - PPOCRSystemv3(fastdeploy::vision::ocr::DBDetector* det_model, + PPOCRv3(fastdeploy::vision::ocr::DBDetector* det_model, fastdeploy::vision::ocr::Recognizer* rec_model) - : PPOCRSystemv2(det_model, rec_model) { + : PPOCRv2(det_model, rec_model) { // The only difference between v2 and v3 recognizer_->rec_image_shape[1] = 48; } }; +} // namespace pipeline + +namespace application { +namespace ocrsystem { + typedef pipeline::PPOCRv3 PPOCRSystemv3; } // namespace ocrsystem } // namespace application + } // namespace fastdeploy diff --git a/fastdeploy/vision/ocr/ppocr/recognizer.cc b/fastdeploy/vision/ocr/ppocr/recognizer.cc index fdea6ed9cb..e2a93e9a6a 100644 --- a/fastdeploy/vision/ocr/ppocr/recognizer.cc +++ b/fastdeploy/vision/ocr/ppocr/recognizer.cc @@ -56,9 +56,6 @@ Recognizer::Recognizer(const std::string& model_file, runtime_option.model_format = model_format; runtime_option.model_file = model_file; runtime_option.params_file = params_file; - runtime_option.DeletePaddleBackendPass("matmul_transpose_reshape_fuse_pass"); - runtime_option.DeletePaddleBackendPass( - 
"matmul_transpose_reshape_mkldnn_fuse_pass"); initialized = Initialize(); diff --git a/python/fastdeploy/vision/ocr/__init__.py b/python/fastdeploy/vision/ocr/__init__.py index 7ff77734c1..3b2cea05de 100644 --- a/python/fastdeploy/vision/ocr/__init__.py +++ b/python/fastdeploy/vision/ocr/__init__.py @@ -13,8 +13,8 @@ # limitations under the License. from __future__ import absolute_import -from .ppocr import PPOCRSystemv3 -from .ppocr import PPOCRSystemv2 +from .ppocr import PPOCRv3 +from .ppocr import PPOCRv2 from .ppocr import DBDetector from .ppocr import Classifier from .ppocr import Recognizer diff --git a/python/fastdeploy/vision/ocr/ppocr/__init__.py b/python/fastdeploy/vision/ocr/ppocr/__init__.py index 412332e3a6..6668ff3f20 100644 --- a/python/fastdeploy/vision/ocr/ppocr/__init__.py +++ b/python/fastdeploy/vision/ocr/ppocr/__init__.py @@ -229,7 +229,7 @@ def rec_batch_num(self, value): self._model.rec_batch_num = value -class PPOCRSystemv3(FastDeployModel): +class PPOCRv3(FastDeployModel): def __init__(self, det_model=None, cls_model=None, rec_model=None): """Consruct a pipeline with text detector, direction classifier and text recognizer models @@ -239,10 +239,10 @@ def __init__(self, det_model=None, cls_model=None, rec_model=None): """ assert det_model is not None and rec_model is not None, "The det_model and rec_model cannot be None." 
if cls_model is None: - self.system = C.vision.ocr.PPOCRSystemv3(det_model._model, - rec_model._model) + self.system = C.vision.ocr.PPOCRv3(det_model._model, + rec_model._model) else: - self.system = C.vision.ocr.PPOCRSystemv3( + self.system = C.vision.ocr.PPOCRv3( det_model._model, cls_model._model, rec_model._model) def predict(self, input_image): @@ -254,7 +254,7 @@ def predict(self, input_image): return self.system.predict(input_image) -class PPOCRSystemv2(FastDeployModel): +class PPOCRv2(FastDeployModel): def __init__(self, det_model=None, cls_model=None, rec_model=None): """Consruct a pipeline with text detector, direction classifier and text recognizer models @@ -264,10 +264,10 @@ def __init__(self, det_model=None, cls_model=None, rec_model=None): """ assert det_model is not None and rec_model is not None, "The det_model and rec_model cannot be None." if cls_model is None: - self.system = C.vision.ocr.PPOCRSystemv2(det_model._model, - rec_model._model) + self.system = C.vision.ocr.PPOCRv2(det_model._model, + rec_model._model) else: - self.system = C.vision.ocr.PPOCRSystemv2( + self.system = C.vision.ocr.PPOCRv2( det_model._model, cls_model._model, rec_model._model) def predict(self, input_image): From 950ec46b45ddb61207a9f0910c7abac220fb59aa Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Wed, 19 Oct 2022 03:15:47 +0000 Subject: [PATCH 33/52] Make sure previous python example works --- examples/vision/ocr/PP-OCRv2/python/infer.py | 2 +- examples/vision/ocr/PP-OCRv3/python/infer.py | 2 +- python/fastdeploy/vision/ocr/__init__.py | 2 ++ .../fastdeploy/vision/ocr/ppocr/__init__.py | 20 +++++++++++++++++++ 4 files changed, 24 insertions(+), 2 deletions(-) diff --git a/examples/vision/ocr/PP-OCRv2/python/infer.py b/examples/vision/ocr/PP-OCRv2/python/infer.py index 94d55ced97..984ede8e71 100644 --- a/examples/vision/ocr/PP-OCRv2/python/infer.py +++ b/examples/vision/ocr/PP-OCRv2/python/infer.py @@ -110,7 +110,7 @@ def build_option(args): rec_label_file, 
runtime_option=runtime_option) -# 创建OCR系统,串联3个模型,其中cls_model可选,如无需求,可设置为None +# 创建PP-OCR,串联3个模型,其中cls_model可选,如无需求,可设置为None ppocr_v2 = fd.vision.ocr.PPOCRv2( det_model=det_model, cls_model=cls_model, rec_model=rec_model) diff --git a/examples/vision/ocr/PP-OCRv3/python/infer.py b/examples/vision/ocr/PP-OCRv3/python/infer.py index 1bf7fe0282..46df9c5078 100644 --- a/examples/vision/ocr/PP-OCRv3/python/infer.py +++ b/examples/vision/ocr/PP-OCRv3/python/infer.py @@ -110,7 +110,7 @@ def build_option(args): rec_label_file, runtime_option=runtime_option) -# 创建OCR系统,串联3个模型,其中cls_model可选,如无需求,可设置为None +# 创建PP-OCR,串联3个模型,其中cls_model可选,如无需求,可设置为None ppocr_v3 = fd.vision.ocr.PPOCRv3( det_model=det_model, cls_model=cls_model, rec_model=rec_model) diff --git a/python/fastdeploy/vision/ocr/__init__.py b/python/fastdeploy/vision/ocr/__init__.py index 3b2cea05de..98e210d3b0 100644 --- a/python/fastdeploy/vision/ocr/__init__.py +++ b/python/fastdeploy/vision/ocr/__init__.py @@ -15,6 +15,8 @@ from .ppocr import PPOCRv3 from .ppocr import PPOCRv2 +from .ppocr import PPOCRSystemv3 +from .ppocr import PPOCRSystemv2 from .ppocr import DBDetector from .ppocr import Classifier from .ppocr import Recognizer diff --git a/python/fastdeploy/vision/ocr/ppocr/__init__.py b/python/fastdeploy/vision/ocr/ppocr/__init__.py index 6668ff3f20..106f0b6789 100644 --- a/python/fastdeploy/vision/ocr/ppocr/__init__.py +++ b/python/fastdeploy/vision/ocr/ppocr/__init__.py @@ -254,6 +254,16 @@ def predict(self, input_image): return self.system.predict(input_image) +class PPOCRSystemv3(PPOCRv3): + def __init__(self, det_model=None, cls_model=None, rec_model=None): + print("DEPRECATED: fd.vision.ocr.PPOCRSystemv3 is deprecated, " + "please use fd.vision.ocr.PPOCRv3 instead.") + super(PPOCRSystemv3, self).__init__(det_model, cls_model, rec_model) + + def predict(self, input_image): + return super(PPOCRSystemv3, self).predict(input_image) + + class PPOCRv2(FastDeployModel): def __init__(self, det_model=None, 
cls_model=None, rec_model=None): """Consruct a pipeline with text detector, direction classifier and text recognizer models @@ -277,3 +287,13 @@ def predict(self, input_image): :return: OCRResult """ return self.system.predict(input_image) + + +class PPOCRSystemv2(PPOCRv2): + def __init__(self, det_model=None, cls_model=None, rec_model=None): + print("DEPRECATED: fd.vision.ocr.PPOCRSystemv2 is deprecated, " + "please use fd.vision.ocr.PPOCRv2 instead.") + super(PPOCRSystemv2, self).__init__(det_model, cls_model, rec_model) + + def predict(self, input_image): + return super(PPOCRSystemv2, self).predict(input_image) From aabc074bf5b8f19810a7af47e1bf68a4cffe43b5 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Wed, 19 Oct 2022 06:10:15 +0000 Subject: [PATCH 34/52] Make sure previous python example works --- python/fastdeploy/vision/ocr/ppocr/__init__.py | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/python/fastdeploy/vision/ocr/ppocr/__init__.py b/python/fastdeploy/vision/ocr/ppocr/__init__.py index 106f0b6789..e361a3a8ac 100644 --- a/python/fastdeploy/vision/ocr/ppocr/__init__.py +++ b/python/fastdeploy/vision/ocr/ppocr/__init__.py @@ -256,8 +256,9 @@ def predict(self, input_image): class PPOCRSystemv3(PPOCRv3): def __init__(self, det_model=None, cls_model=None, rec_model=None): - print("DEPRECATED: fd.vision.ocr.PPOCRSystemv3 is deprecated, " - "please use fd.vision.ocr.PPOCRv3 instead.") + logging.warning( + "DEPRECATED: fd.vision.ocr.PPOCRSystemv3 is deprecated, " + "please use fd.vision.ocr.PPOCRv3 instead.") super(PPOCRSystemv3, self).__init__(det_model, cls_model, rec_model) def predict(self, input_image): @@ -291,8 +292,9 @@ def predict(self, input_image): class PPOCRSystemv2(PPOCRv2): def __init__(self, det_model=None, cls_model=None, rec_model=None): - print("DEPRECATED: fd.vision.ocr.PPOCRSystemv2 is deprecated, " - "please use fd.vision.ocr.PPOCRv2 instead.") + logging.warning( + "DEPRECATED: fd.vision.ocr.PPOCRSystemv2 is 
deprecated, " + "please use fd.vision.ocr.PPOCRv2 instead.") super(PPOCRSystemv2, self).__init__(det_model, cls_model, rec_model) def predict(self, input_image): From 6efbbf4c486f4903d25e5ebcafe99f73b6f80bba Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Wed, 2 Nov 2022 09:59:14 +0000 Subject: [PATCH 35/52] Fix Rec model bug --- fastdeploy/vision/ocr/ppocr/recognizer.cc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fastdeploy/vision/ocr/ppocr/recognizer.cc b/fastdeploy/vision/ocr/ppocr/recognizer.cc index 4ca52df12d..da03109734 100644 --- a/fastdeploy/vision/ocr/ppocr/recognizer.cc +++ b/fastdeploy/vision/ocr/ppocr/recognizer.cc @@ -165,7 +165,9 @@ bool Recognizer::Postprocess(FDTensor& infer_result, last_index = argmax_idx; } score /= count; - + if (std::isnan(score)) { + continue; + } std::get<0>(*rec_result) = str_res; std::get<1>(*rec_result) = score; From f6b24ca967bc71ed6e8dde7a7f43f9b8dd3056f8 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Wed, 2 Nov 2022 11:30:59 +0000 Subject: [PATCH 36/52] Fix Rec model bug --- fastdeploy/vision/ocr/ppocr/recognizer.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fastdeploy/vision/ocr/ppocr/recognizer.cc b/fastdeploy/vision/ocr/ppocr/recognizer.cc index da03109734..284e5ba45f 100644 --- a/fastdeploy/vision/ocr/ppocr/recognizer.cc +++ b/fastdeploy/vision/ocr/ppocr/recognizer.cc @@ -166,7 +166,7 @@ bool Recognizer::Postprocess(FDTensor& infer_result, } score /= count; if (std::isnan(score)) { - continue; + score = 0.f; } std::get<0>(*rec_result) = str_res; std::get<1>(*rec_result) = score; From db51e742325c882080c3f771c05d70c23426a1b1 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Wed, 2 Nov 2022 13:41:38 +0000 Subject: [PATCH 37/52] Fix rec model bug --- fastdeploy/vision/ocr/ppocr/recognizer.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fastdeploy/vision/ocr/ppocr/recognizer.cc b/fastdeploy/vision/ocr/ppocr/recognizer.cc index 284e5ba45f..f0564ce339 
100644 --- a/fastdeploy/vision/ocr/ppocr/recognizer.cc +++ b/fastdeploy/vision/ocr/ppocr/recognizer.cc @@ -164,8 +164,8 @@ bool Recognizer::Postprocess(FDTensor& infer_result, } last_index = argmax_idx; } - score /= count; - if (std::isnan(score)) { + score /= (count + 1e-6); + if (count == 0 || std::isnan(score)) { score = 0.f; } std::get<0>(*rec_result) = str_res; From ce7affe30cf41c60879f5e38f0032acb7d02d473 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Tue, 8 Nov 2022 08:11:43 +0000 Subject: [PATCH 38/52] Add SetTrtMaxBatchSize function for TensorRT --- fastdeploy/backends/paddle/paddle_backend.cc | 5 +++-- fastdeploy/runtime.cc | 3 +++ fastdeploy/runtime.h | 5 ++++- 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/fastdeploy/backends/paddle/paddle_backend.cc b/fastdeploy/backends/paddle/paddle_backend.cc index 70d8305c51..c1ecacee2f 100644 --- a/fastdeploy/backends/paddle/paddle_backend.cc +++ b/fastdeploy/backends/paddle/paddle_backend.cc @@ -36,7 +36,7 @@ void PaddleBackend::BuildOption(const PaddleBackendOption& option) { FDWARNING << "Detect that tensorrt cache file has been set to " << option.trt_option.serialize_file << ", but while enable paddle2trt, please notice that the cache file will save to the directory where paddle model saved." << std::endl; use_static = true; } - config_.EnableTensorRtEngine(option.trt_option.max_workspace_size, 32, 3, precision, use_static); + config_.EnableTensorRtEngine(option.trt_option.max_workspace_size, option.trt_option.max_batch_size, 3, precision, use_static); SetTRTDynamicShapeToConfig(option); #else FDWARNING << "The FastDeploy is not compiled with TensorRT backend, so will fallback to GPU with Paddle Inference Backend." 
<< std::endl; @@ -112,8 +112,9 @@ bool PaddleBackend::InitFromPaddle(const std::string& model_file, FDWARNING << "Detect that tensorrt cache file has been set to " << option.trt_option.serialize_file << ", but while enable paddle2trt, please notice that the cache file will save to the directory where paddle model saved." << std::endl; use_static = true; } - config_.EnableTensorRtEngine(option.trt_option.max_workspace_size, 32, 3, paddle_infer::PrecisionType::kInt8, use_static, false); + config_.EnableTensorRtEngine(option.trt_option.max_workspace_size, option.trt_option.max_batch_size, 3, paddle_infer::PrecisionType::kInt8, use_static, false); SetTRTDynamicShapeToConfig(option); + #endif } } diff --git a/fastdeploy/runtime.cc b/fastdeploy/runtime.cc index 94ea9de0b0..4dd1bac59d 100755 --- a/fastdeploy/runtime.cc +++ b/fastdeploy/runtime.cc @@ -371,6 +371,9 @@ void RuntimeOption::SetTrtInputShape(const std::string& input_name, void RuntimeOption::SetTrtMaxWorkspaceSize(size_t max_workspace_size) { trt_max_workspace_size = max_workspace_size; } +void RuntimeOption::SetTrtMaxBatchSize(size_t max_batch_size){ + trt_max_batch_size = max_batch_size; +} void RuntimeOption::EnableTrtFP16() { trt_enable_fp16 = true; } diff --git a/fastdeploy/runtime.h b/fastdeploy/runtime.h index 7ab6f1fb25..8330f412e8 100644 --- a/fastdeploy/runtime.h +++ b/fastdeploy/runtime.h @@ -200,6 +200,9 @@ struct FASTDEPLOY_DECL RuntimeOption { /// Set max_workspace_size for TensorRT, default 1<<30 void SetTrtMaxWorkspaceSize(size_t trt_max_workspace_size); + /// Set max_batch_size for TensorRT, default 32 + void SetTrtMaxBatchSize(size_t max_batch_size); + /** * @brief Enable FP16 inference while using TensorRT backend. 
Notice: not all the GPU device support FP16, on those device doesn't support FP16, FastDeploy will fallback to FP32 automaticly */ @@ -339,7 +342,7 @@ struct FASTDEPLOY_DECL RuntimeOption { std::string model_file = ""; // Path of model file std::string params_file = ""; // Path of parameters file, can be empty // format of input model - ModelFormat model_format = ModelFormat::AUTOREC; + ModelFormat model_format = ModelFormat::AUTOREC; }; /*! @brief Runtime object used to inference the loaded model on different devices From 9dc9ac683fa2d3e6f703a6fa10b88992a7e4744e Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Tue, 8 Nov 2022 11:48:17 +0000 Subject: [PATCH 39/52] Add SetTrtMaxBatchSize Pybind --- fastdeploy/pybind/runtime.cc | 1 + 1 file changed, 1 insertion(+) diff --git a/fastdeploy/pybind/runtime.cc b/fastdeploy/pybind/runtime.cc index 11cf9bf4ed..759c555309 100644 --- a/fastdeploy/pybind/runtime.cc +++ b/fastdeploy/pybind/runtime.cc @@ -42,6 +42,7 @@ void BindRuntime(pybind11::module& m) { .def("set_lite_power_mode", &RuntimeOption::SetLitePowerMode) .def("set_trt_input_shape", &RuntimeOption::SetTrtInputShape) .def("set_trt_max_workspace_size", &RuntimeOption::SetTrtMaxWorkspaceSize) + .def("set_trt_max_batch_size", &RuntimeOption::SetTrtMaxBatchSize) .def("enable_paddle_to_trt", &RuntimeOption::EnablePaddleToTrt) .def("enable_trt_fp16", &RuntimeOption::EnableTrtFP16) .def("disable_trt_fp16", &RuntimeOption::DisableTrtFP16) From 17370996d6ef1b2c3e78e9ed755c9f94829dddde Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Wed, 9 Nov 2022 01:50:17 +0000 Subject: [PATCH 40/52] Add set_trt_max_batch_size python function --- python/fastdeploy/runtime.py | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/python/fastdeploy/runtime.py b/python/fastdeploy/runtime.py index e8a6058a46..4d0311d4b9 100755 --- a/python/fastdeploy/runtime.py +++ b/python/fastdeploy/runtime.py @@ -18,6 +18,7 @@ from . import c_lib_wrap as C from . 
import rknpu2 + class Runtime: """FastDeploy Runtime object. """ @@ -207,10 +208,12 @@ def use_cpu(self): """ return self._option.use_cpu() - def use_rknpu2(self,rknpu2_name=rknpu2.CpuName.RK3588,rknpu2_core=rknpu2.CoreMask.RKNN_NPU_CORE_0): + def use_rknpu2(self, + rknpu2_name=rknpu2.CpuName.RK3588, + rknpu2_core=rknpu2.CoreMask.RKNN_NPU_CORE_0): """Inference with CPU """ - return self._option.use_rknpu2(rknpu2_name,rknpu2_core) + return self._option.use_rknpu2(rknpu2_name, rknpu2_core) def set_cpu_thread_num(self, thread_num=-1): """Set number of threads if inference with CPU @@ -344,6 +347,11 @@ def set_trt_max_workspace_size(self, trt_max_workspace_size): """ return self._option.set_trt_max_workspace_size(trt_max_workspace_size) + def set_trt_max_batch_size(self, trt_max_batch_size): + """Set max batch size while using TensorRT backend. + """ + return self._option.set_trt_max_batch_size(trt_max_batch_size) + def enable_paddle_trt_collect_shape(self): return self._option.enable_paddle_trt_collect_shape() From b2daadc343b75eed519c0ec80fcf082f6e595aaa Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Fri, 11 Nov 2022 05:16:21 +0000 Subject: [PATCH 41/52] Set TRT dynamic shape in PPOCR examples --- examples/vision/ocr/PP-OCRv2/cpp/infer.cc | 22 ++++++++++++--- examples/vision/ocr/PP-OCRv2/python/infer.py | 28 +++++++++++++------- examples/vision/ocr/PP-OCRv3/cpp/infer.cc | 22 ++++++++++++--- examples/vision/ocr/PP-OCRv3/python/infer.py | 26 +++++++++++++----- 4 files changed, 77 insertions(+), 21 deletions(-) diff --git a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc index 9d628689b5..7ec873a882 100644 --- a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc +++ b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc @@ -29,9 +29,25 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model auto rec_model_file = rec_model_dir + sep + "inference.pdmodel"; auto rec_params_file = rec_model_dir + sep + "inference.pdiparams"; - 
auto det_model = fastdeploy::vision::ocr::DBDetector(det_model_file, det_params_file, option); - auto cls_model = fastdeploy::vision::ocr::Classifier(cls_model_file, cls_params_file, option); - auto rec_model = fastdeploy::vision::ocr::Recognizer(rec_model_file, rec_params_file, rec_label_file, option); + auto det_option = option; + auto cls_option = option; + auto rec_option = option; + + // If use TRT backend, the dynamic shape will be set as follow. + det_option.SetTrtInputShape("x", {1, 3, 50, 50}, {1, 3, 640, 640}, + {1, 3, 1536, 1536}); + cls_option.SetTrtInputShape("x", {1, 3, 48, 10},{1, 3, 48, 320},{1, 3, 48, 1024}); + rec_option.SetTrtInputShape("x", {1, 3, 32, 10}, {1, 3, 32, 320}, + {1, 3, 32, 2304}); + + // Users could save TRT cache file to diskas follow. + // det_option.SetTrtCacheFile(det_model_dir + sep + "det_trt_cache.trt"); + // cls_option.SetTrtCacheFile(cls_model_dir + sep + "cls_trt_cache.trt"); + // rec_option.SetTrtCacheFile(rec_model_dir + sep + "rec_trt_cache.trt"); + + auto det_model = fastdeploy::vision::ocr::DBDetector(det_model_file, det_params_file, det_option); + auto cls_model = fastdeploy::vision::ocr::Classifier(cls_model_file, cls_params_file, cls_option); + auto rec_model = fastdeploy::vision::ocr::Recognizer(rec_model_file, rec_params_file, rec_label_file, rec_option); assert(det_model.Initialized()); assert(cls_model.Initialized()); diff --git a/examples/vision/ocr/PP-OCRv2/python/infer.py b/examples/vision/ocr/PP-OCRv2/python/infer.py index 984ede8e71..1bb94eb7e6 100644 --- a/examples/vision/ocr/PP-OCRv2/python/infer.py +++ b/examples/vision/ocr/PP-OCRv2/python/infer.py @@ -96,19 +96,29 @@ def build_option(args): rec_params_file = os.path.join(args.rec_model, "inference.pdiparams") rec_label_file = args.rec_label_file -# 对于三个模型,均采用同样的部署配置 -# 用户也可根据自行需求分别配置 -runtime_option = build_option(args) +det_option = runtime_option +cls_option = runtime_option +rec_option = runtime_option + +# 当使用TRT时,分别给三个Runtime设置动态shape 
+det_option.set_trt_input_shape("x", [1, 3, 50, 50], [1, 3, 640, 640], + [1, 3, 1536, 1536]) +cls_option.set_trt_input_shape("x", [1, 3, 48, 10], [1, 3, 48, 320], + [1, 3, 48, 1024]) +rec_option.set_trt_input_shape("x", [1, 3, 32, 10], [1, 3, 32, 320], + [1, 3, 32, 2304]) + +# 用户可以把TRT引擎文件保存至本地 +# det_option.set_trt_cache_file(args.det_model + "/det_trt_cache.trt") +# cls_option.set_trt_cache_file(args.cls_model + "/cls_trt_cache.trt") +# rec_option.set_trt_cache_file(args.rec_model + "/rec_trt_cache.trt") det_model = fd.vision.ocr.DBDetector( - det_model_file, det_params_file, runtime_option=runtime_option) + det_model_file, det_params_file, runtime_option=det_option) cls_model = fd.vision.ocr.Classifier( - cls_model_file, cls_params_file, runtime_option=runtime_option) + cls_model_file, cls_params_file, runtime_option=cls_option) rec_model = fd.vision.ocr.Recognizer( - rec_model_file, - rec_params_file, - rec_label_file, - runtime_option=runtime_option) + rec_model_file, rec_params_file, rec_label_file, runtime_option=rec_option) # 创建PP-OCR,串联3个模型,其中cls_model可选,如无需求,可设置为None ppocr_v2 = fd.vision.ocr.PPOCRv2( diff --git a/examples/vision/ocr/PP-OCRv3/cpp/infer.cc b/examples/vision/ocr/PP-OCRv3/cpp/infer.cc index 333dbaa3fa..fcf15e87da 100644 --- a/examples/vision/ocr/PP-OCRv3/cpp/infer.cc +++ b/examples/vision/ocr/PP-OCRv3/cpp/infer.cc @@ -29,9 +29,25 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model auto rec_model_file = rec_model_dir + sep + "inference.pdmodel"; auto rec_params_file = rec_model_dir + sep + "inference.pdiparams"; - auto det_model = fastdeploy::vision::ocr::DBDetector(det_model_file, det_params_file, option); - auto cls_model = fastdeploy::vision::ocr::Classifier(cls_model_file, cls_params_file, option); - auto rec_model = fastdeploy::vision::ocr::Recognizer(rec_model_file, rec_params_file, rec_label_file, option); + auto det_option = option; + auto cls_option = option; + auto rec_option = option; + + // If use 
TRT backend, the dynamic shape will be set as follow. + det_option.SetTrtInputShape("x", {1, 3, 50, 50}, {1, 3, 640, 640}, + {1, 3, 1536, 1536}); + cls_option.SetTrtInputShape("x", {1, 3, 48, 10},{1, 3, 48, 320},{1, 3, 48, 1024}); + rec_option.SetTrtInputShape("x", {1, 3, 48, 10}, {1, 3, 48, 320}, + {1, 3, 48, 2304}); + + // Users could save TRT cache file to disk as follow. + // det_option.SetTrtCacheFile(det_model_dir + sep + "det_trt_cache.trt"); + // cls_option.SetTrtCacheFile(cls_model_dir + sep + "cls_trt_cache.trt"); + // rec_option.SetTrtCacheFile(rec_model_dir + sep + "rec_trt_cache.trt"); + + auto det_model = fastdeploy::vision::ocr::DBDetector(det_model_file, det_params_file, det_option); + auto cls_model = fastdeploy::vision::ocr::Classifier(cls_model_file, cls_params_file, cls_option); + auto rec_model = fastdeploy::vision::ocr::Recognizer(rec_model_file, rec_params_file, rec_label_file, rec_option); assert(det_model.Initialized()); assert(cls_model.Initialized()); diff --git a/examples/vision/ocr/PP-OCRv3/python/infer.py b/examples/vision/ocr/PP-OCRv3/python/infer.py index 46df9c5078..43b9b630ca 100644 --- a/examples/vision/ocr/PP-OCRv3/python/infer.py +++ b/examples/vision/ocr/PP-OCRv3/python/infer.py @@ -100,15 +100,29 @@ def build_option(args): # 用户也可根据自行需求分别配置 runtime_option = build_option(args) +det_option = runtime_option +cls_option = runtime_option +rec_option = runtime_option + +# 当使用TRT时,分别给三个Runtime设置动态shape +det_option.set_trt_input_shape("x", [1, 3, 50, 50], [1, 3, 640, 640], + [1, 3, 1536, 1536]) +cls_option.set_trt_input_shape("x", [1, 3, 48, 10], [1, 3, 48, 320], + [1, 3, 48, 1024]) +rec_option.set_trt_input_shape("x", [1, 3, 48, 10], [1, 3, 48, 320], + [1, 3, 48, 2304]) + +# 用户可以把TRT引擎文件保存至本地 +# det_option.set_trt_cache_file(args.det_model + "/det_trt_cache.trt") +# cls_option.set_trt_cache_file(args.cls_model + "/cls_trt_cache.trt") +# rec_option.set_trt_cache_file(args.rec_model + "/rec_trt_cache.trt") + det_model = 
fd.vision.ocr.DBDetector( - det_model_file, det_params_file, runtime_option=runtime_option) + det_model_file, det_params_file, runtime_option=det_option) cls_model = fd.vision.ocr.Classifier( - cls_model_file, cls_params_file, runtime_option=runtime_option) + cls_model_file, cls_params_file, runtime_option=cls_option) rec_model = fd.vision.ocr.Recognizer( - rec_model_file, - rec_params_file, - rec_label_file, - runtime_option=runtime_option) + rec_model_file, rec_params_file, rec_label_file, runtime_option=rec_option) # 创建PP-OCR,串联3个模型,其中cls_model可选,如无需求,可设置为None ppocr_v3 = fd.vision.ocr.PPOCRv3( From ad7d3a35b6b1a638dd8fe620642468ecb80a8d57 Mon Sep 17 00:00:00 2001 From: yunyaoXYY Date: Fri, 11 Nov 2022 06:00:47 +0000 Subject: [PATCH 42/52] Set TRT dynamic shape in PPOCR examples --- examples/vision/ocr/PP-OCRv2/cpp/infer.cc | 2 +- examples/vision/ocr/PP-OCRv3/cpp/infer.cc | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc index 7ec873a882..4b9f4d58c0 100644 --- a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc +++ b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc @@ -36,7 +36,7 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model // If use TRT backend, the dynamic shape will be set as follow. 
   det_option.SetTrtInputShape("x", {1, 3, 50, 50}, {1, 3, 640, 640},
                               {1, 3, 1536, 1536});
-  cls_option.SetTrtInputShape("x", {1, 3, 48, 10},{1, 3, 48, 320},{1, 3, 48, 1024});
+  cls_option.SetTrtInputShape("x", {1, 3, 48, 10}, {1, 3, 48, 320}, {1, 3, 48, 1024});
   rec_option.SetTrtInputShape("x", {1, 3, 32, 10}, {1, 3, 32, 320},
                               {1, 3, 32, 2304});
diff --git a/examples/vision/ocr/PP-OCRv3/cpp/infer.cc b/examples/vision/ocr/PP-OCRv3/cpp/infer.cc
index fcf15e87da..da950872b5 100644
--- a/examples/vision/ocr/PP-OCRv3/cpp/infer.cc
+++ b/examples/vision/ocr/PP-OCRv3/cpp/infer.cc
@@ -36,7 +36,7 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model
   // If use TRT backend, the dynamic shape will be set as follow.
   det_option.SetTrtInputShape("x", {1, 3, 50, 50}, {1, 3, 640, 640},
                               {1, 3, 1536, 1536});
-  cls_option.SetTrtInputShape("x", {1, 3, 48, 10},{1, 3, 48, 320},{1, 3, 48, 1024});
+  cls_option.SetTrtInputShape("x", {1, 3, 48, 10}, {1, 3, 48, 320}, {1, 3, 48, 1024});
   rec_option.SetTrtInputShape("x", {1, 3, 48, 10}, {1, 3, 48, 320},
                               {1, 3, 48, 2304});

From 266d5d620698d2e77d00483dda25ac79674fd92d Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Fri, 11 Nov 2022 06:02:32 +0000
Subject: [PATCH 43/52] Set TRT dynamic shape in PPOCR examples

---
 examples/vision/ocr/PP-OCRv2/cpp/infer.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc
index 4b9f4d58c0..2537c12bb9 100644
--- a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc
+++ b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc
@@ -40,7 +40,7 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model
   rec_option.SetTrtInputShape("x", {1, 3, 32, 10}, {1, 3, 32, 320},
                               {1, 3, 32, 2304});
 
-  // Users could save TRT cache file to diskas follow.
+  // Users could save TRT cache file to disk as follow.
   // det_option.SetTrtCacheFile(det_model_dir + sep + "det_trt_cache.trt");
   // cls_option.SetTrtCacheFile(cls_model_dir + sep + "cls_trt_cache.trt");
   // rec_option.SetTrtCacheFile(rec_model_dir + sep + "rec_trt_cache.trt");

From 7a447e11cb8fa194b6e998c1ee14f747e78e968f Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Mon, 21 Nov 2022 07:14:17 +0000
Subject: [PATCH 44/52] Fix PPOCRv2 python example

---
 examples/vision/ocr/PP-OCRv2/python/infer.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/examples/vision/ocr/PP-OCRv2/python/infer.py b/examples/vision/ocr/PP-OCRv2/python/infer.py
index 1bb94eb7e6..f3274c9e46 100644
--- a/examples/vision/ocr/PP-OCRv2/python/infer.py
+++ b/examples/vision/ocr/PP-OCRv2/python/infer.py
@@ -96,6 +96,10 @@ def build_option(args):
 rec_params_file = os.path.join(args.rec_model, "inference.pdiparams")
 rec_label_file = args.rec_label_file
 
+# 对于三个模型,均采用同样的部署配置
+# 用户也可根据自行需求分别配置
+runtime_option = build_option(args)
+
 det_option = runtime_option
 cls_option = runtime_option
 rec_option = runtime_option

From 1da1d074facab7c6c57c3ea97ee825e8765089c4 Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Tue, 22 Nov 2022 15:41:44 +0000
Subject: [PATCH 45/52] Fix PPOCR dynamic input shape bug

---
 examples/vision/ocr/PP-OCRv2/cpp/infer.cc    | 16 ++++++---
 examples/vision/ocr/PP-OCRv2/python/infer.py | 37 +++++++++++--------
 examples/vision/ocr/PP-OCRv3/cpp/infer.cc    | 18 ++++++----
 examples/vision/ocr/PP-OCRv3/python/infer.py | 38 ++++++++++++--------
 4 files changed, 70 insertions(+), 39 deletions(-)

diff --git a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc
index 2537c12bb9..f05015d86d 100644
--- a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc
+++ b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc
@@ -34,11 +34,12 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model
   auto rec_option = option;
 
   // If use TRT backend, the dynamic shape will be set as follow.
-  det_option.SetTrtInputShape("x", {1, 3, 50, 50}, {1, 3, 640, 640},
-                              {1, 3, 1536, 1536});
-  cls_option.SetTrtInputShape("x", {1, 3, 48, 10}, {1, 3, 48, 320}, {1, 3, 48, 1024});
-  rec_option.SetTrtInputShape("x", {1, 3, 32, 10}, {1, 3, 32, 320},
-                              {1, 3, 32, 2304});
+  // We recommend that users set the length and height of the detection model to a multiple of 32.
+  det_option.SetTrtInputShape("x", {1, 3, 64,64}, {1, 3, 640, 640},
+                              {1, 3, 960, 960});
+  cls_option.SetTrtInputShape("x", {1, 3, 48, 10}, {10, 3, 48, 320}, {64, 3, 48, 1024});
+  rec_option.SetTrtInputShape("x", {1, 3, 32, 10}, {10, 3, 32, 320},
+                              {64, 3, 32, 2304});
 
   // Users could save TRT cache file to disk as follow.
   // det_option.SetTrtCacheFile(det_model_dir + sep + "det_trt_cache.trt");
@@ -103,6 +104,11 @@ int main(int argc, char* argv[]) {
   } else if (flag == 2) {
     option.UseGpu();
     option.UseTrtBackend();
+  } else if (flag == 3) {
+    option.UseGpu();
+    option.UseTrtBackend();
+    option.EnablePaddleTrtCollectShape();
+    option.EnablePaddleToTrt();
   }
 
   std::string det_model_dir = argv[1];
diff --git a/examples/vision/ocr/PP-OCRv2/python/infer.py b/examples/vision/ocr/PP-OCRv2/python/infer.py
index f3274c9e46..67140ec71a 100644
--- a/examples/vision/ocr/PP-OCRv2/python/infer.py
+++ b/examples/vision/ocr/PP-OCRv2/python/infer.py
@@ -72,6 +72,12 @@ def build_option(args):
         assert args.device.lower(
         ) == "gpu", "TensorRT backend require inference on device GPU."
         option.use_trt_backend()
+    elif args.backend.lower() == "pptrt":
+        assert args.device.lower(
+        ) == "gpu", "Paddle-TensorRT backend require inference on device GPU."
+        option.use_trt_backend()
+        option.enable_paddle_trt_collect_shape()
+        option.enable_paddle_to_trt()
     elif args.backend.lower() == "ort":
         option.use_ort_backend()
     elif args.backend.lower() == "paddle":
@@ -100,27 +106,30 @@ def build_option(args):
 # 用户也可根据自行需求分别配置
 runtime_option = build_option(args)
 
+# 当使用TRT时,分别给三个模型的runtime设置动态shape,并完成模型的创建.
+# 注意: 需要在检测模型创建完成后,再设置分类模型的动态输入并创建分类模型, 识别模型同理.
+# 如果用户想要自己改动检测模型的输入shape, 我们建议用户把检测模型的长和高设置为32的倍数.
 det_option = runtime_option
-cls_option = runtime_option
-rec_option = runtime_option
-
-# 当使用TRT时,分别给三个Runtime设置动态shape
-det_option.set_trt_input_shape("x", [1, 3, 50, 50], [1, 3, 640, 640],
-                               [1, 3, 1536, 1536])
-cls_option.set_trt_input_shape("x", [1, 3, 48, 10], [1, 3, 48, 320],
-                               [1, 3, 48, 1024])
-rec_option.set_trt_input_shape("x", [1, 3, 32, 10], [1, 3, 32, 320],
-                               [1, 3, 32, 2304])
-
+det_option.set_trt_input_shape("x", [1, 3, 64, 64], [1, 3, 640, 640],
+                               [1, 3, 960, 960])
 # 用户可以把TRT引擎文件保存至本地
 # det_option.set_trt_cache_file(args.det_model + "/det_trt_cache.trt")
 det_model = fd.vision.ocr.DBDetector(
     det_model_file, det_params_file, runtime_option=det_option)
+
+cls_option = runtime_option
+cls_option.set_trt_input_shape("x", [1, 3, 48, 10], [10, 3, 48, 320],
+                               [64, 3, 48, 1024])
+# 用户可以把TRT引擎文件保存至本地
+# cls_option.set_trt_cache_file(args.cls_model + "/cls_trt_cache.trt")
 cls_model = fd.vision.ocr.Classifier(
     cls_model_file, cls_params_file, runtime_option=cls_option)
+
+rec_option = runtime_option
+rec_option.set_trt_input_shape("x", [1, 3, 32, 10], [10, 3, 32, 320],
+                               [64, 3, 32, 2304])
+# 用户可以把TRT引擎文件保存至本地
+# rec_option.set_trt_cache_file(args.rec_model + "/rec_trt_cache.trt")
 rec_model = fd.vision.ocr.Recognizer(
     rec_model_file, rec_params_file, rec_label_file, runtime_option=rec_option)
diff --git a/examples/vision/ocr/PP-OCRv3/cpp/infer.cc b/examples/vision/ocr/PP-OCRv3/cpp/infer.cc
index da950872b5..911b311e3c 100644
--- a/examples/vision/ocr/PP-OCRv3/cpp/infer.cc
+++ b/examples/vision/ocr/PP-OCRv3/cpp/infer.cc
@@ -33,12 +33,13 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model
   auto cls_option = option;
   auto rec_option = option;
 
-  // If use TRT backend, the dynamic shape will be set as follow.
-  det_option.SetTrtInputShape("x", {1, 3, 50, 50}, {1, 3, 640, 640},
-                              {1, 3, 1536, 1536});
-  cls_option.SetTrtInputShape("x", {1, 3, 48, 10}, {1, 3, 48, 320}, {1, 3, 48, 1024});
-  rec_option.SetTrtInputShape("x", {1, 3, 48, 10}, {1, 3, 48, 320},
-                              {1, 3, 48, 2304});
+  // If use TRT backend, the dynamic shape will be set as follow.
+  // We recommend that users set the length and height of the detection model to a multiple of 32.
+  det_option.SetTrtInputShape("x", {1, 3, 64,64}, {1, 3, 640, 640},
+                              {1, 3, 960, 960});
+  cls_option.SetTrtInputShape("x", {1, 3, 48, 10}, {10, 3, 48, 320}, {64, 3, 48, 1024});
+  rec_option.SetTrtInputShape("x", {1, 3, 48, 10}, {10, 3, 48, 320},
+                              {64, 3, 48, 2304});
 
   // Users could save TRT cache file to disk as follow.
   // det_option.SetTrtCacheFile(det_model_dir + sep + "det_trt_cache.trt");
@@ -103,6 +104,11 @@ int main(int argc, char* argv[]) {
   } else if (flag == 2) {
     option.UseGpu();
     option.UseTrtBackend();
+  } else if (flag == 3) {
+    option.UseGpu();
+    option.UseTrtBackend();
+    option.EnablePaddleTrtCollectShape();
+    option.EnablePaddleToTrt();
   }
 
   std::string det_model_dir = argv[1];
diff --git a/examples/vision/ocr/PP-OCRv3/python/infer.py b/examples/vision/ocr/PP-OCRv3/python/infer.py
index 43b9b630ca..e6e8dbd61a 100644
--- a/examples/vision/ocr/PP-OCRv3/python/infer.py
+++ b/examples/vision/ocr/PP-OCRv3/python/infer.py
@@ -15,6 +15,7 @@
 import fastdeploy as fd
 import cv2
 import os
+import copy
 
 
 def parse_arguments():
@@ -72,6 +73,12 @@ def build_option(args):
         assert args.device.lower(
         ) == "gpu", "TensorRT backend require inference on device GPU."
         option.use_trt_backend()
+    elif args.backend.lower() == "pptrt":
+        assert args.device.lower(
+        ) == "gpu", "Paddle-TensorRT backend require inference on device GPU."
+        option.use_trt_backend()
+        option.enable_paddle_trt_collect_shape()
+        option.enable_paddle_to_trt()
     elif args.backend.lower() == "ort":
         option.use_ort_backend()
     elif args.backend.lower() == "paddle":
@@ -100,27 +107,30 @@ def build_option(args):
 # 用户也可根据自行需求分别配置
 runtime_option = build_option(args)
 
+# 当使用TRT时,分别给三个模型的runtime设置动态shape,并完成模型的创建.
+# 注意: 需要在检测模型创建完成后,再设置分类模型的动态输入并创建分类模型, 识别模型同理.
+# 如果用户想要自己改动检测模型的输入shape, 我们建议用户把检测模型的长和高设置为32的倍数.
 det_option = runtime_option
-cls_option = runtime_option
-rec_option = runtime_option
-
-# 当使用TRT时,分别给三个Runtime设置动态shape
-det_option.set_trt_input_shape("x", [1, 3, 50, 50], [1, 3, 640, 640],
-                               [1, 3, 1536, 1536])
-cls_option.set_trt_input_shape("x", [1, 3, 48, 10], [1, 3, 48, 320],
-                               [1, 3, 48, 1024])
-rec_option.set_trt_input_shape("x", [1, 3, 48, 10], [1, 3, 48, 320],
-                               [1, 3, 48, 2304])
-
+det_option.set_trt_input_shape("x", [1, 3, 64, 64], [1, 3, 640, 640],
+                               [1, 3, 960, 960])
 # 用户可以把TRT引擎文件保存至本地
 # det_option.set_trt_cache_file(args.det_model + "/det_trt_cache.trt")
 det_model = fd.vision.ocr.DBDetector(
     det_model_file, det_params_file, runtime_option=det_option)
+
+cls_option = runtime_option
+cls_option.set_trt_input_shape("x", [1, 3, 48, 10], [10, 3, 48, 320],
+                               [64, 3, 48, 1024])
+# 用户可以把TRT引擎文件保存至本地
+# cls_option.set_trt_cache_file(args.cls_model + "/cls_trt_cache.trt")
 cls_model = fd.vision.ocr.Classifier(
     cls_model_file, cls_params_file, runtime_option=cls_option)
+
+rec_option = runtime_option
+rec_option.set_trt_input_shape("x", [1, 3, 48, 10], [10, 3, 48, 320],
+                               [64, 3, 48, 2304])
+# 用户可以把TRT引擎文件保存至本地
+# rec_option.set_trt_cache_file(args.rec_model + "/rec_trt_cache.trt")
 rec_model = fd.vision.ocr.Recognizer(
     rec_model_file, rec_params_file, rec_label_file, runtime_option=rec_option)

From 06eedd6226ec05e03bbcee1aff5be76f48c28098 Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Wed, 23 Nov 2022 01:17:27 +0000
Subject: [PATCH 46/52] Remove useless code

---
 examples/vision/ocr/PP-OCRv3/python/infer.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/examples/vision/ocr/PP-OCRv3/python/infer.py b/examples/vision/ocr/PP-OCRv3/python/infer.py
index e6e8dbd61a..8baa97ea97 100644
--- a/examples/vision/ocr/PP-OCRv3/python/infer.py
+++ b/examples/vision/ocr/PP-OCRv3/python/infer.py
@@ -15,7 +15,6 @@
 import fastdeploy as fd
 import cv2
 import os
-import copy
 
 
 def parse_arguments():

From d57b18f3a0b948f99e4933e8ed5b63aa56e1f095 Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Thu, 24 Nov 2022 13:15:39 +0000
Subject: [PATCH 47/52] Fix PPOCR bug

---
 fastdeploy/vision/ocr/ppocr/ppocr_v2.cc | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fastdeploy/vision/ocr/ppocr/ppocr_v2.cc b/fastdeploy/vision/ocr/ppocr/ppocr_v2.cc
index e6e89299b6..daa40692d4 100755
--- a/fastdeploy/vision/ocr/ppocr/ppocr_v2.cc
+++ b/fastdeploy/vision/ocr/ppocr/ppocr_v2.cc
@@ -93,17 +93,19 @@ bool PPOCRv2::BatchPredict(const std::vector<cv::Mat>& images, std::vector
   std::vector<std::string>* text_ptr = &ocr_result.text;
   std::vector<float>* rec_scores_ptr = &ocr_result.rec_scores;
 
-  if (!classifier_->BatchPredict(image_list, cls_labels_ptr, cls_scores_ptr)) {
-    FDERROR << "There's error while recognizing image in PPOCR." << std::endl;
-    return false;
-  }else{
-    for (size_t i_img = 0; i_img < image_list.size(); ++i_img) {
-      if(cls_labels_ptr->at(i_img) % 2 == 1 && cls_scores_ptr->at(i_img) > classifier_->postprocessor_.cls_thresh_) {
-        cv::rotate(image_list[i_img], image_list[i_img], 1);
+  if (nullptr != classifier_){
+    if (!classifier_->BatchPredict(image_list, cls_labels_ptr, cls_scores_ptr)) {
+      FDERROR << "There's error while recognizing image in PPOCR." << std::endl;
+      return false;
+    }else{
+      for (size_t i_img = 0; i_img < image_list.size(); ++i_img) {
+        if(cls_labels_ptr->at(i_img) % 2 == 1 && cls_scores_ptr->at(i_img) > classifier_->postprocessor_.cls_thresh_) {
+          cv::rotate(image_list[i_img], image_list[i_img], 1);
+        }
       }
     }
   }
-
+  
   if (!recognizer_->BatchPredict(image_list, text_ptr, rec_scores_ptr)) {
     FDERROR << "There's error while recognizing image in PPOCR." << std::endl;
     return false;

From e2f97102c8e81bff1fc0dd21391221191a6bfe22 Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Tue, 29 Nov 2022 01:53:59 +0000
Subject: [PATCH 48/52] Remove useless comments in PaddleSeg example

---
 .../paddleseg/quantize/cpp/README.md |  2 +-
 .../paddleseg/quantize/cpp/infer.cc  | 24 -------------------
 2 files changed, 1 insertion(+), 25 deletions(-)

diff --git a/examples/vision/segmentation/paddleseg/quantize/cpp/README.md b/examples/vision/segmentation/paddleseg/quantize/cpp/README.md
index 34b98790f4..0fbaac173d 100644
--- a/examples/vision/segmentation/paddleseg/quantize/cpp/README.md
+++ b/examples/vision/segmentation/paddleseg/quantize/cpp/README.md
@@ -1,5 +1,5 @@
 # PaddleSeg 量化模型 C++部署示例
-本目录下提供的`infer.cc`,可以帮助用户快速完成PaddleSeg量化模型在CPU/GPU上的部署推理加速.
+本目录下提供的`infer.cc`,可以帮助用户快速完成PaddleSeg量化模型在CPU上的部署推理加速.
 
 ## 部署准备
 ### FastDeploy环境准备
diff --git a/examples/vision/segmentation/paddleseg/quantize/cpp/infer.cc b/examples/vision/segmentation/paddleseg/quantize/cpp/infer.cc
index 2f3d53e794..2611e2456a 100644
--- a/examples/vision/segmentation/paddleseg/quantize/cpp/infer.cc
+++ b/examples/vision/segmentation/paddleseg/quantize/cpp/infer.cc
@@ -43,28 +43,6 @@ void InitAndInfer(const std::string& model_dir, const std::string& image_file,
 }
 
-// int main(int argc, char* argv[]) {
-//   if (argc < 3) {
-//     std::cout
-//         << "Usage: infer_demo path/to/model_dir path/to/image run_option, "
-//            "e.g ./infer_model ./ppseg_model_dir ./test.jpeg 0"
-//         << std::endl;
-//     std::cout << "The data type of run_option is int, 0: run with cpu; 1: run "
-//                  "with gpu; 2: run with gpu and use tensorrt backend."
-//               << std::endl;
-//     return -1;
-//   }
-
-//   fastdeploy::RuntimeOption option;
-//   option.UseCpu();
-//   option.UsePaddleInferBackend();
-//   std::cout<<"Xyy-debug, enable Paddle Backend==!";
-
-//   std::string model_dir = argv[1];
-//   std::string test_image = argv[2];
-//   InitAndInfer(model_dir, test_image, option);
-//   return 0;
-// }
 
 int main(int argc, char* argv[]) {
   if (argc < 4) {
@@ -86,11 +64,9 @@ int main(int argc, char* argv[]) {
   if (flag == 0) {
     option.UseCpu();
     option.UseOrtBackend();
-    std::cout<<"Use ORT!"<

Date: Tue, 29 Nov 2022 06:38:20 +0000
Subject: [PATCH 49/52] Fix quantize docs readme

---
 docs/cn/quantize.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/cn/quantize.md b/docs/cn/quantize.md
index 7717176f63..57f5837d8a 100644
--- a/docs/cn/quantize.md
+++ b/docs/cn/quantize.md
@@ -27,7 +27,7 @@ FastDeploy基于PaddleSlim的Auto Compression Toolkit(ACT), 给用户提供了
 ### 使用FastDeploy一键模型自动化压缩工具来量化模型
 FastDeploy基于PaddleSlim的Auto Compression Toolkit(ACT), 给用户提供了一键模型自动化压缩的工具,请参考如下文档进行一键模型自动化压缩。
-- [FastDeploy 一键模型自动化压缩](../../tools/auto_compression/)
+- [FastDeploy 一键模型自动化压缩](../../tools/common_tools/auto_compression/)
 
 当用户获得产出的压缩模型之后,即可以使用FastDeploy来部署压缩模型。

From 05cb9c71b8fd8b223f5a14c986cbe54a6641ad39 Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Wed, 30 Nov 2022 06:14:06 +0000
Subject: [PATCH 50/52] Fix PP-OCRv2 readme

---
 examples/vision/ocr/PP-OCRv2/cpp/README.md    | 2 +-
 examples/vision/ocr/PP-OCRv2/python/README.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/vision/ocr/PP-OCRv2/cpp/README.md b/examples/vision/ocr/PP-OCRv2/cpp/README.md
index afc35d50ba..965ece7167 100644
--- a/examples/vision/ocr/PP-OCRv2/cpp/README.md
+++ b/examples/vision/ocr/PP-OCRv2/cpp/README.md
@@ -26,7 +26,7 @@ tar -xvf ch_PP-OCRv2_det_infer.tar
 wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar
 tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar
 
-wgethttps://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar
+wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar
 tar -xvf ch_PP-OCRv2_rec_infer.tar
 
 wget https://gitee.com/paddlepaddle/PaddleOCR/raw/release/2.6/doc/imgs/12.jpg
diff --git a/examples/vision/ocr/PP-OCRv2/python/README.md b/examples/vision/ocr/PP-OCRv2/python/README.md
index a846f19c0f..89e5fc0738 100644
--- a/examples/vision/ocr/PP-OCRv2/python/README.md
+++ b/examples/vision/ocr/PP-OCRv2/python/README.md
@@ -16,7 +16,7 @@ tar -xvf ch_PP-OCRv2_det_infer.tar
 wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar
 tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar
 
-wgethttps://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar
+wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar
 tar -xvf ch_PP-OCRv2_rec_infer.tar
 
 wget https://gitee.com/paddlepaddle/PaddleOCR/raw/release/2.6/doc/imgs/12.jpg

From 5b9ab5107fcf2f577c4718725672f354b1facfc8 Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Thu, 1 Dec 2022 06:55:07 +0000
Subject: [PATCH 51/52] Modify dynamic shape in PP-OCRv2 example

---
 examples/vision/ocr/PP-OCRv2/cpp/infer.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc
index f05015d86d..7bac320d51 100644
--- a/examples/vision/ocr/PP-OCRv2/cpp/infer.cc
+++ b/examples/vision/ocr/PP-OCRv2/cpp/infer.cc
@@ -37,9 +37,9 @@ void InitAndInfer(const std::string& det_model_dir, const std::string& cls_model
   // We recommend that users set the length and height of the detection model to a multiple of 32.
   det_option.SetTrtInputShape("x", {1, 3, 64,64}, {1, 3, 640, 640},
                               {1, 3, 960, 960});
-  cls_option.SetTrtInputShape("x", {1, 3, 48, 10}, {10, 3, 48, 320}, {64, 3, 48, 1024});
+  cls_option.SetTrtInputShape("x", {1, 3, 48, 10}, {10, 3, 48, 320}, {32, 3, 48, 1024});
   rec_option.SetTrtInputShape("x", {1, 3, 32, 10}, {10, 3, 32, 320},
-                              {64, 3, 32, 2304});
+                              {32, 3, 32, 2304});
 
   // Users could save TRT cache file to disk as follow.
   // det_option.SetTrtCacheFile(det_model_dir + sep + "det_trt_cache.trt");

From 6caa8bfc2de1603dca1f17e58af4e0523dd6079d Mon Sep 17 00:00:00 2001
From: yunyaoXYY
Date: Fri, 2 Dec 2022 02:49:03 +0000
Subject: [PATCH 52/52] Modify TRT dynamic shape for PP-OCRv2

---
 examples/vision/ocr/PP-OCRv2/python/infer.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/vision/ocr/PP-OCRv2/python/infer.py b/examples/vision/ocr/PP-OCRv2/python/infer.py
index 0dce3f95c0..af915143af 100644
--- a/examples/vision/ocr/PP-OCRv2/python/infer.py
+++ b/examples/vision/ocr/PP-OCRv2/python/infer.py
@@ -119,7 +119,7 @@ def build_option(args):
 cls_option = runtime_option
 cls_option.set_trt_input_shape("x", [1, 3, 48, 10], [10, 3, 48, 320],
-                               [64, 3, 48, 1024])
+                               [32, 3, 48, 1024])
 # 用户可以把TRT引擎文件保存至本地
 # cls_option.set_trt_cache_file(args.cls_model + "/cls_trt_cache.trt")
 cls_model = fd.vision.ocr.Classifier(
@@ -127,7 +127,7 @@ def build_option(args):
 rec_option = runtime_option
 rec_option.set_trt_input_shape("x", [1, 3, 32, 10], [10, 3, 32, 320],
-                               [64, 3, 32, 2304])
+                               [32, 3, 32, 2304])
 # 用户可以把TRT引擎文件保存至本地
 # rec_option.set_trt_cache_file(args.rec_model + "/rec_trt_cache.trt")
 rec_model = fd.vision.ocr.Recognizer(