Multithreading to speed up GPU inference #10433

Closed
hanliangwei opened this issue Jul 19, 2023 · 11 comments

@hanliangwei

hanliangwei commented Jul 19, 2023

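Please provide the following information to quickly locate the problem

  • System Environment: win11
  • Version: Paddle: PaddleOCR: 2.6
  • Related components:

During OCR inference on the GPU there are too many images and performance falls short, so I wanted to try multithreading. I found that the predictor does not support multithreading, so I tried both a locking approach and creating multiple predictors, but performance did not improve and I don't know why. I am currently using the model from ch_PP-OCRv3_rec_infer with ppocr_keys_v1.txt, although all I actually need to recognize is printed digits and letters. Is there any way to optimize this?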
image
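One plausible reason the locking approach gives no speedup (a guess, since the original code is not shown): if every call to the predictor is wrapped in a single lock, only one thread can be inside the predictor at a time, so the work still runs one call after another. The sketch below illustrates this with a hypothetical `FakePredictor` that sleeps instead of running a model; none of these names come from PaddleOCR.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakePredictor:
    """Hypothetical stand-in for a predictor; it sleeps instead of running a model."""

    def __init__(self):
        self._active = 0          # calls currently inside run()
        self.peak_active = 0      # highest concurrency ever observed
        self._counter_lock = threading.Lock()

    def run(self, image):
        with self._counter_lock:
            self._active += 1
            self.peak_active = max(self.peak_active, self._active)
        time.sleep(0.01)          # pretend this is one forward pass
        with self._counter_lock:
            self._active -= 1
        return f"text-for-{image}"


def run_with_lock(images, workers):
    """Call one shared predictor from several threads, guarded by a single lock."""
    predictor = FakePredictor()
    lock = threading.Lock()

    def locked_predict(image):
        # The lock admits one thread at a time, so calls never overlap.
        with lock:
            return predictor.run(image)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(locked_predict, images))
    return results, predictor.peak_active


if __name__ == "__main__":
    _, peak = run_with_lock([f"img{i}.png" for i in range(8)], workers=4)
    print("peak concurrent calls inside the predictor:", peak)
```

With the lock in place the peak concurrency stays at one regardless of the worker count, which is why adding threads around a locked predictor would not be expected to raise throughput.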

@livingbody
Contributor

Use the en_PP-OCRv3_rec_infer model.

@livingbody
Contributor

Why would multithreading not be supported?
It works in my CPU test.

@livingbody
Contributor

from pprint import pprint
import threading

from paddleocr import PaddleOCR


def worker(num):
    """Work done by each thread."""
    print('Worker', num)
    # Each thread builds its own engine (CPU inference in this test).
    default_engine = PaddleOCR(use_angle_cls=True, lang="ch",
                               use_gpu=False,
                               det_db_box_thresh=0,
                               det_db_thresh=0.1,
                               det_db_unclip_ratio=2.0,  # tuning this may stop one image from missing lines while another starts missing them
                               cls_model_dir="d:/ocr/ch_ppocr_mobile_v2.0_cls_infer",
                               det_model_dir="d:/ocr/ch_PP-OCRv3_det_infer",
                               rec_model_dir="d:/ocr/ch_PP-OCRv3_rec_infer")

    response = default_engine.ocr('2.png')
    # Use the thread's own argument, not the global loop variable i.
    print(f'thread{num}')
    # pprint(response)


threads = []
for i in range(5):
    # Create the thread and add it to the thread list
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    # Start the thread
    t.start()

# Wait for all threads to finish
for t in threads:
    t.join()

@hanliangwei
Author

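Update
It may have been a mistake on my part. I now have multithreading running and have also switched to the English model, en_PP-OCRv3_rec_infer. The English model does improve speed somewhat, but the multithreaded run shows no difference from the single-threaded one.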
image

@ToddBear ToddBear added the good first issue Good for newcomers label Jul 20, 2023
@livingbody
Contributor

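Update: It may have been a mistake on my part. I now have multithreading running and have also switched to the English model, en_PP-OCRv3_rec_infer. The English model does improve speed somewhat, but the multithreaded run shows no difference from the single-threaded one. image

Please post it so I can learn from it.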

@livingbody
Contributor

The image never loads, so I can't see it.

@livingbody
Contributor

If it's not fast enough, start more threads and compare the results; the difference will show up.
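A comparison like the one suggested above could be set up roughly as follows. This is only a sketch: `fake_ocr` is a hypothetical stand-in that sleeps rather than running a model, and `benchmark` is a helper name introduced here, not part of PaddleOCR. To measure a real engine, `fake_ocr` would be replaced by a function that calls it.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_ocr(path):
    """Hypothetical stand-in for an OCR call; sleeps instead of running a model."""
    time.sleep(0.02)
    return path.upper()


def benchmark(fn, items, workers):
    """Run fn over items with a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, items))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    paths = [f"{i}.png" for i in range(20)]
    for workers in (1, 2, 4, 8):
        _, seconds = benchmark(fake_ocr, paths, workers)
        print(f"{workers} threads: {seconds:.2f}s, {len(paths) / seconds:.1f} images/s")
```

Timing the same list of images at several thread counts makes it easy to see whether throughput scales or stays flat, which is the symptom reported in this thread.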

@livingbody
Contributor

Move the initialization outside the worker.

from pprint import pprint
import threading

from paddleocr import PaddleOCR

# Build the engine once and share it across all threads.
default_engine = PaddleOCR(use_angle_cls=True, lang="ch",
                           use_gpu=False,
                           det_db_box_thresh=0,
                           det_db_thresh=0.1,
                           det_db_unclip_ratio=2.0,  # tuning this may stop one image from missing lines while another starts missing them
                           cls_model_dir="d:/ocr/ch_ppocr_mobile_v2.0_cls_infer",
                           det_model_dir="d:/ocr/ch_PP-OCRv3_det_infer",
                           rec_model_dir="d:/ocr/ch_PP-OCRv3_rec_infer")


def worker(num):
    """Work done by each thread."""
    print('Worker', num)

    response = default_engine.ocr('2.png')
    # Use the thread's own argument, not the global loop variable i.
    print(f'thread{num}')
    # pprint(response)


threads = []
for i in range(5):
    # Create the thread and add it to the thread list
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    # Start the thread
    t.start()

# Wait for all threads to finish
for t in threads:
    t.join()

@EasyIsAllYouNeed

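default_engine may not be reentrant, so it can fail when called from multiple threads. FastDeploy has multithreaded and multiprocess examples; I suggest using the approach FastDeploy provides: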
https://github.com/PaddlePaddle/FastDeploy/blob/develop/tutorials/multi_thread/python/pipeline/multi_thread_process_ocr.py
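If the shared engine is indeed not reentrant, one generic way to avoid concurrent calls into a single instance is to give every worker thread its own engine through `threading.local`. The sketch below shows that pattern only; it is not the FastDeploy method linked above, and whether it resolves the problem for PaddleOCR has not been confirmed in this thread. `StubEngine`, `make_engine`, `get_engine`, and `recognize` are hypothetical names; in real use `make_engine` would construct the actual engine, and each instance would presumably load its own copy of the models.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_thread_state = threading.local()


class StubEngine:
    """Hypothetical stand-in for an OCR engine that must stay on one thread."""

    def __init__(self):
        self._owner = threading.get_ident()

    def ocr(self, path):
        # Fail loudly if another thread ever uses this instance.
        if threading.get_ident() != self._owner:
            raise RuntimeError("engine used from a thread that does not own it")
        return f"text-from-{path}"


def make_engine():
    # Assumption: swap in the real constructor here, e.g. PaddleOCR(...).
    return StubEngine()


def get_engine():
    """Return this thread's engine, creating it on first use."""
    engine = getattr(_thread_state, "engine", None)
    if engine is None:
        engine = make_engine()
        _thread_state.engine = engine
    return engine


def recognize(path):
    return path, get_engine().ocr(path)


if __name__ == "__main__":
    paths = [f"{i}.png" for i in range(6)]
    with ThreadPoolExecutor(max_workers=3) as pool:
        for path, text in pool.map(recognize, paths):
            print(path, "->", text)
```

The trade-off is memory: one engine per thread multiplies whatever a single instance allocates, so the worker count has to fit the available GPU or host memory.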

@UserWangZz
Collaborator

This issue has not been updated for a long time, so it is being closed for now. It can be reopened if needed.

@freemedom

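from concurrent.futures import ThreadPoolExecutor

from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="ch",
                show_log=False, use_gpu=True, use_mp=True)
with ThreadPoolExecutor(max_workers=4) as executor:
    # filepath is the path of the image to recognize (defined elsewhere)
    result = ocr.ocr(filepath, cls=True)
    ...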

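This does improve speed, and GPU memory usage goes up as well (about 1 GB with 1 thread, about 3 GB with 4 threads), but roughly 1 in 10 ocr calls fails with the following error: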
(PreconditionNotMet) Tensor holds no memory. Call Tensor::mutable_data firstly.

PaddlePaddle/Paddle#56193

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Nov 22, 2024