PaddleOCR (PPOCR)

PaddleOCR provides multilingual OCR based on the PaddlePaddle lightweight OCR system, supporting recognition of 80+ languages.

Multiple examples are provided, as follows:

  1. PPOCR Detect - Takes an image and detects areas of text.
  2. PPOCR Recognise - Takes an area of text and performs OCR on it.
  3. PPOCR System - Combines both Detect and Recognise.

Example data

Make sure you have downloaded the data files first for the examples. You only need to do this once for all examples.

cd example/
git clone https://github.com/swdee/go-rknnlite-data.git data

PPOCR Detect

Usage

Run the PPOCR Detect example.

cd example/ppocr
go run detect.go common.go

This will result in the output of:

Driver Version: 0.8.2, API Version: 1.6.0 (9a7b5d24c@2023-12-13T17:31:11)
Model Input Number: 1, Ouput Number: 1
Input tensors:
  index=0, name=x, n_dims=4, dims=[1, 480, 480, 3], n_elems=691200, size=691200, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-14, scale=0.018658
Output tensors:
  index=0, name=sigmoid_0.tmp_0, n_dims=4, dims=[1, 1, 480, 480], n_elems=230400, size=230400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
Model first run speed: inference=27.746374ms, post processing=2.968795ms, total time=30.715169ms
[0]: [(27, 459), (136, 459), (136, 478), (27, 478)] 0.991298
[1]: [(27, 428), (371, 427), (371, 444), (27, 445)] 0.912538
[2]: [(28, 398), (361, 397), (361, 413), (28, 414)] 0.953752
[3]: [(368, 368), (476, 368), (476, 388), (368, 388)] 0.989887
[4]: [(27, 365), (282, 365), (282, 384), (27, 384)] 0.975041
[5]: [(26, 334), (342, 334), (342, 352), (26, 352)] 0.956719
[6]: [(26, 303), (253, 303), (253, 320), (26, 320)] 0.974053
[7]: [(25, 270), (179, 270), (179, 289), (25, 289)] 0.990559
[8]: [(26, 240), (242, 240), (242, 259), (26, 259)] 0.986159
[9]: [(413, 233), (429, 233), (429, 305), (413, 305)] 0.970001
[10]: [(26, 209), (235, 209), (235, 227), (26, 227)] 0.995540
[11]: [(26, 178), (300, 179), (300, 196), (26, 195)] 0.991055
[12]: [(28, 143), (280, 144), (280, 164), (28, 163)] 0.974824
[13]: [(27, 112), (333, 113), (333, 135), (27, 134)] 0.899956
[14]: [(26, 81), (172, 81), (172, 103), (26, 103)] 0.994091
[15]: [(28, 38), (302, 39), (302, 71), (28, 70)] 0.960498
Saved image to ../data/ppocr-det-out.png
Benchmark time=3.540086219s, count=100, average total time=35.400862ms
done

Bounding boxes have been drawn around detected text areas.

detect-out.jpg
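
The tensor listing above reports qnt_type=AFFINE with a zero point (zp) and a scale for each INT8 tensor. Under the usual affine quantization convention, a stored value q maps back to a real value as (q - zp) * scale. The sketch below only illustrates that arithmetic using the output tensor's values from the log; the Dequantize function is a name made up for this illustration and is not part of go-rknnlite.

```go
package main

import "fmt"

// Dequantize converts an affine-quantized INT8 value back to a float using
// the zero point (zp) and scale reported for the tensor.
// Hypothetical helper for illustration only.
func Dequantize(q int8, zp int32, scale float32) float32 {
	return float32(int32(q)-zp) * scale
}

func main() {
	// zp and scale are copied from the output tensor line in the log above.
	const zp, scale = -128, 0.003922

	// The smallest, middle and largest INT8 values span roughly 0 to 1,
	// which fits an output produced by a sigmoid layer.
	for _, q := range []int8{-128, 0, 127} {
		fmt.Printf("q=%4d -> %.4f\n", q, Dequantize(q, zp, scale))
	}
}
```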

Background

This PPOCR Detect example is a Go conversion of the C API example.

PPOCR Recognise

Usage

Run the PPOCR Recognition example.

cd example/ppocr
go run recognise.go common.go

This will result in the output of:

Driver Version: 0.8.2, API Version: 1.6.0 (9a7b5d24c@2023-12-13T17:31:11)
Model Input Number: 1, Ouput Number: 1
Input tensors:
  index=0, name=x, n_dims=4, dims=[1, 48, 320, 3], n_elems=46080, size=92160, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
Output tensors:
  index=0, name=softmax_11.tmp_0, n_dims=3, dims=[1, 40, 6625, 0], n_elems=265000, size=530000, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
Model first run speed: inference=26.486118ms, post processing=461.404µs, total time=26.947522ms
Recognize result: JOINT, score=0.71
Benchmark time=2.528564774s, count=100, average total time=25.285647ms
done

Sample input images and the text recognised.

| Input Image | Text Recognised | Confidence Score |
|---|---|---|
| joint.png | JOINT | 0.71 |
| region.jpg | 浙G·Z6825 | 0.65 |
| cn-text.png | 中华老字号 | 0.71 |
| mozzarella.jpg | MOZZARELLA - 188 | 0.67 |
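
The recognition model's output tensor has dims [1, 40, 6625], which can be read as 40 time steps with a probability for each of 6625 character classes. A common way to turn such an output into text is greedy CTC decoding: take the most likely class at each step, collapse consecutive repeats, and drop the blank class. The sketch below assumes the model is decoded this way, that class 0 is the blank, and that the score is the mean probability of the kept steps; the function name and the toy data are made up for illustration and this is not the example's actual post-processing code.

```go
package main

import (
	"fmt"
	"strings"
)

// GreedyCTCDecode picks the most likely class at every time step, collapses
// consecutive repeats and drops the blank class (assumed to be index 0).
// probs is laid out as [steps][classes]; keys maps a class index to its text.
// It returns the decoded text and the mean probability of the kept steps.
func GreedyCTCDecode(probs [][]float32, keys []string) (string, float32) {
	var text strings.Builder
	var sum float32
	kept := 0
	last := 0 // class chosen at the previous step, starting at blank

	for _, step := range probs {
		// Find the index of the largest probability in this step.
		best := 0
		for i, p := range step {
			if p > step[best] {
				best = i
			}
		}
		// Keep the character unless it is blank, a repeat, or out of range.
		if best != 0 && best != last && best < len(keys) {
			text.WriteString(keys[best])
			sum += step[best]
			kept++
		}
		last = best
	}
	if kept == 0 {
		return "", 0
	}
	return text.String(), sum / float32(kept)
}

func main() {
	// Toy data: 3 classes (blank, A, B) over 5 time steps.
	keys := []string{"blank", "A", "B"}
	probs := [][]float32{
		{0.1, 0.8, 0.1},   // A
		{0.2, 0.7, 0.1},   // A again, collapsed into the previous A
		{0.9, 0.05, 0.05}, // blank, separates the two A characters
		{0.1, 0.6, 0.3},   // A
		{0.1, 0.2, 0.7},   // B
	}
	text, score := GreedyCTCDecode(probs, keys)
	fmt.Printf("text=%s score=%.2f\n", text, score)
}
```

Because each class index is looked up in the keys file, the number of entries in that file has to line up with the number of classes the model outputs.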

Other Language Models

The model ppocrv4_rec-rk3588.rknn provided in this example has only been trained on the English alphabet and Chinese characters. For other languages, see the vendor's documentation for downloading these models. These instructions are based on those here.

Download the inference model for the language you require. In this example we download the Japanese model.

wget https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_infer.tar 

Unpack the model file.

tar -xvf japan_PP-OCRv3_rec_infer.tar

Download the dictionary file from this directory.

wget https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/main/ppocr/utils/dict/japan_dict.txt

Then convert this model to ONNX format using Paddle2ONNX.

Install Paddle2ONNX

pip3 install paddlepaddle
pip3 install paddle2onnx

Convert to ONNX

paddle2onnx --model_dir ./japan_PP-OCRv3_rec_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file ./japanv3-rec.onnx

Change the input shape parameters.

python3 -m paddle2onnx.optimize --input_model japanv3-rec.onnx \
--output_model japanv3-rec.onnx --input_shape_dict "{'x':[1,3,48,320]}"

Download the export script to convert the ONNX file to RKNN.

wget https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/main/deploy/fastdeploy/rockchip/rknpu2_tools/export.py

Download the export script config file.

wget https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/main/deploy/fastdeploy/rockchip/rknpu2_tools/config/ppocrv3_rec.yaml

Edit the config file: set model_path to point to our ONNX input model and set output_folder to the current directory.

model_path: ./japanv3-rec.onnx

output_folder: "./"

Compile the ONNX model to RKNN, which creates the file japanv3-rec_rk3588_unquantized.rknn.

python3 export.py --config_path ppocrv3_rec.yaml --target_platform rk3588

Edit the character keys file japan_dict.txt, as the number of characters in this file does not match the number the model was trained on (for some unknown reason). Make the following changes:

  1. Add the word blank at the top of the file, on line 1.
  2. Scroll to the end of the file and replace the last line, which is a single space character, with the word __space__.
  3. Add the word @dummy on a new line after the __space__ entry.
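
The three dictionary edits can also be scripted. The following is a minimal sketch that applies them to the lines of a keys file held in memory; FixDictLines is a hypothetical helper name, the sample characters stand in for the real contents of japan_dict.txt, and a real program would read and write the file with os.ReadFile and os.WriteFile.

```go
package main

import (
	"fmt"
	"strings"
)

// FixDictLines applies the three edits listed above to the lines of a
// character keys file and returns the new lines.
// Hypothetical helper for illustration only.
func FixDictLines(lines []string) []string {
	out := make([]string, 0, len(lines)+2)

	// 1. Put the word "blank" on line 1.
	out = append(out, "blank")
	out = append(out, lines...)

	// 2. Replace a final single-space line with "__space__".
	if n := len(out); n > 1 && out[n-1] == " " {
		out[n-1] = "__space__"
	}

	// 3. Add "@dummy" as the new last line.
	return append(out, "@dummy")
}

func main() {
	// Stand-in for the file contents: three characters followed by a line
	// holding a single space, as described for japan_dict.txt.
	raw := "あ\nい\nう\n \n"

	// Drop the trailing newline so the last element is the space line.
	lines := strings.Split(strings.TrimSuffix(raw, "\n"), "\n")

	fmt.Println(strings.Join(FixDictLines(lines), "\n"))
}
```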

You can now use the compiled RKNN and dictionary keys file to perform OCR on an image.

go run recognise.go common.go -k japan_dict.txt -m japanv3-rec_rk3588_unquantized.rknn -i jptext.jpg

| Input Image | Text Recognised | Confidence Score |
|---|---|---|
| jptext.jpg | つま味のある | 0.76 |

Whilst the text on the image above is somewhat accurate, I found the Japanese version to be rather poor. It does not do well with horizontally written text or handwritten kana. Some others also found this here and here.

Whilst PaddleOCR is a good project, it has become unmaintained and dated. There is a discussion on how to improve the situation. Hopefully the other languages get updates to v4 models in the future.

Background

This PPOCR Recognise example is a Go conversion of the C API example.

PPOCR System

Usage

Run the PPOCR System example.

cd example/ppocr
go run system.go common.go

This will result in the output of:

Driver Version: 0.8.2, API Version: 1.6.0 (9a7b5d24c@2023-12-13T17:31:11)
Model Input Number: 1, Ouput Number: 1
Input tensors:
  index=0, name=x, n_dims=4, dims=[1, 48, 320, 3], n_elems=46080, size=92160, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
Output tensors:
  index=0, name=softmax_11.tmp_0, n_dims=3, dims=[1, 40, 6625, 0], n_elems=265000, size=530000, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
Driver Version: 0.8.2, API Version: 1.6.0 (9a7b5d24c@2023-12-13T17:31:11)
Model Input Number: 1, Ouput Number: 1
Input tensors:
  index=0, name=x, n_dims=4, dims=[1, 480, 480, 3], n_elems=691200, size=691200, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-14, scale=0.018658
Output tensors:
  index=0, name=sigmoid_0.tmp_0, n_dims=4, dims=[1, 1, 480, 480], n_elems=230400, size=230400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
[0]: [(28, 38), (302, 39), (302, 71), (28, 70)] 0.960498
Recognize result: 纯臻营养护发素, score=0.71
[1]: [(26, 81), (172, 81), (172, 103), (26, 103)] 0.994091
Recognize result: 产品信息/参数, score=0.71
[2]: [(27, 112), (333, 113), (333, 135), (27, 134)] 0.899956
Recognize result: (45元/每公斤,100公斤起订), score=0.69
[3]: [(28, 143), (280, 144), (280, 164), (28, 163)] 0.974824
Recognize result: 每瓶22元,1000瓶起订), score=0.70
[4]: [(26, 178), (300, 179), (300, 196), (26, 195)] 0.991055
Recognize result: (品牌】:代加工方式/OEMODM, score=0.67
[5]: [(26, 209), (235, 209), (235, 227), (26, 227)] 0.995540
Recognize result: 【品名】:纯臻营养护发素, score=0.70
[6]: [(26, 240), (242, 240), (242, 259), (26, 259)] 0.986159
Recognize result: 【产品编号】:YM-X-3011, score=0.71
[7]: [(413, 233), (429, 233), (429, 305), (413, 305)] 0.970001
Recognize result: ODMOEM, score=0.71
[8]: [(25, 270), (179, 270), (179, 289), (25, 289)] 0.990559
Recognize result: 【净含量】:220ml, score=0.71
[9]: [(26, 303), (253, 303), (253, 320), (26, 320)] 0.974053
Recognize result: 【适用人群】:适合所有肤质, score=0.71
[10]: [(26, 334), (342, 334), (342, 352), (26, 352)] 0.956719
Recognize result: (主要成分》:皖蜡硬脂醇、燕麦-葡聚, score=0.59
[11]: [(27, 365), (282, 365), (282, 384), (27, 384)] 0.975041
Recognize result: 糖、椰油酰胺丙基甜菜碱、泛酸, score=0.68
[12]: [(368, 368), (476, 368), (476, 388), (368, 388)] 0.989887
Recognize result: (成品包材), score=0.71
[13]: [(28, 398), (361, 397), (361, 413), (28, 414)] 0.953752
Recognize result: 干型功能:可降较以发确员,从而大有, score=0.41
[14]: [(27, 428), (371, 427), (371, 444), (27, 445)] 0.912538
Recognize result: 即时语久改基发光器的双果,给干强的头, score=0.47
[15]: [(27, 459), (136, 459), (136, 478), (27, 478)] 0.991298
Recognize result: 发足够的滋养, score=0.71
Run speed:
  Detect processing=32.056505ms
  Recognise processing=362.731907ms
  Total time=394.788412ms
done

As can be seen, all of the text areas found in the image at the PPOCR Detect stage have had OCR applied using PPOCR Recognise. Displayed are the Chinese and English characters read by the OCR process.
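
The two-stage flow can be sketched as a loop that passes each detected area to the recogniser. The sketch below is an outline under stated assumptions, not the code in system.go: the Quad, Detector, Recogniser, Result and RunSystem names are hypothetical, the two stub functions stand in for the RKNN models, and cropping is simplified to the axis-aligned rectangle around each four-point box.

```go
package main

import (
	"fmt"
	"image"
)

// Quad is a detected text area given as four corner points.
type Quad [4]image.Point

// Bounds returns the smallest axis-aligned rectangle that spans the corners.
func (q Quad) Bounds() image.Rectangle {
	r := image.Rectangle{Min: q[0], Max: q[0]}
	for _, p := range q[1:] {
		if p.X < r.Min.X {
			r.Min.X = p.X
		}
		if p.Y < r.Min.Y {
			r.Min.Y = p.Y
		}
		if p.X > r.Max.X {
			r.Max.X = p.X
		}
		if p.Y > r.Max.Y {
			r.Max.Y = p.Y
		}
	}
	return r
}

// Detector and Recogniser are hypothetical stand-ins for the two models.
type Detector func(img image.Image) []Quad
type Recogniser func(img image.Image, area image.Rectangle) (string, float32)

// Result pairs a detected box with the text read from it.
type Result struct {
	Box   Quad
	Text  string
	Score float32
}

// RunSystem detects text areas, then runs recognition on each one in turn.
func RunSystem(img image.Image, detect Detector, recognise Recogniser) []Result {
	var results []Result
	for _, box := range detect(img) {
		text, score := recognise(img, box.Bounds())
		results = append(results, Result{Box: box, Text: text, Score: score})
	}
	return results
}

// stubDetect returns the first two boxes from the log above.
func stubDetect(image.Image) []Quad {
	return []Quad{
		{{X: 28, Y: 38}, {X: 302, Y: 39}, {X: 302, Y: 71}, {X: 28, Y: 70}},
		{{X: 26, Y: 81}, {X: 172, Y: 81}, {X: 172, Y: 103}, {X: 26, Y: 103}},
	}
}

// stubRecognise returns placeholder text instead of running a model.
func stubRecognise(_ image.Image, area image.Rectangle) (string, float32) {
	return fmt.Sprintf("text@%d,%d", area.Min.X, area.Min.Y), 1
}

func main() {
	img := image.NewRGBA(image.Rect(0, 0, 480, 480))
	for i, r := range RunSystem(img, stubDetect, stubRecognise) {
		fmt.Printf("[%d]: %v %s score=%.2f\n", i, r.Box.Bounds(), r.Text, r.Score)
	}
}
```

Recognition runs once per detected area, which fits the log above where the Recognise stage accounts for most of the total time.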

Background

This PPOCR System example is a Go conversion of the C API example.