
In Pipeline mode with the thread model (is_thread_op: True), predict cannot use all CPU cores when there are multiple concurrent requests #985

Closed
zhfkt opened this issue Jan 21, 2021 · 0 comments
Assignees
Labels
question Further information is requested

Comments


zhfkt commented Jan 21, 2021

I ran the Serving pipeline OCR demo directly, following this official example:

https://github.com/PaddlePaddle/Serving/blob/v0.4.0/python/examples/pipeline/ocr/README_CN.md

With multiple concurrent requests, the process model (is_thread_op: False in config.yml) can use all CPU cores for predict. But after switching to the thread model (is_thread_op: True), predict cannot use all CPU cores; it only uses as many cores as the local predictor's thread_num.

Steps to reproduce:

  1. Set up the demo following https://github.com/PaddlePaddle/Serving/blob/v0.4.0/python/examples/pipeline/ocr/README_CN.md.

  2. Set the following fields in config.yml:
    a. worker_num: 10
    b. concurrency: 8 under both the det and rec ops

  3. From https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md, download the large model https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar (the large model is used because predicting a single image takes about 30 seconds).

  4. Convert the model to the Serving format with the script https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/pdserving/inference_to_serving.py.

  5. In config.yml, set the rec model path (model_config under op -> rec) to the path of the newly converted Serving-format large model:
    model_config: inference/ch_ppocr_server_v1.1_rec_infer/serving_server_dir

  6. In process mode (is_thread_op: False in config.yml), run python pipeline_http_client.py simultaneously in two separate terminals to simulate two concurrent requests. The CPU uses 4 cores (the local predictor's thread_num defaults to 2, and the two requests spawn two processes -> 2 processes * 2 threads = 4 CPU cores).
    The screenshot below shows htop with the multi-process mode saturating 4 CPU cores at 100%:

[screenshot: htop, 4 cores at 100%]

  7. In thread mode (is_thread_op: True in config.yml), run python pipeline_http_client.py simultaneously in two separate terminals, again simulating two concurrent requests. The CPU now uses only 2 cores.
    The screenshot below shows htop with the multi-thread mode saturating only 2 CPU cores (thread_num's default in the local predictor) at 100%:

[screenshot: htop, 2 cores at 100%]
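Putting the settings from steps 2 and 5 together, the relevant part of config.yml might look like the sketch below. This is an assumption based on the fields named in this issue, not a complete or verified config; the exact nesting (e.g. whether is_thread_op sits under a dag section, or model_config under a local_service_conf section) can differ between Serving versions.

```yaml
# Sketch of the config.yml fields touched in this issue (not complete).
worker_num: 10            # step 2a
is_thread_op: false       # process model; set to true for the thread model
op:
  det:
    concurrency: 8        # step 2b
  rec:
    concurrency: 8        # step 2b
    # step 5: path of the converted Serving-format large model
    model_config: inference/ch_ppocr_server_v1.1_rec_infer/serving_server_dir
    # thread_num: 2       # local predictor's default intra-op thread count
```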

To summarize the observation: in thread mode, predict can only use thread_num (from the local predictor) CPU cores. With multiple concurrent requests (sending the same request simultaneously, here by running python pipeline_http_client.py in two terminals at once), it cannot use all CPU cores. It is as if the concurrency and worker_num parameters had no effect.
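The core-count arithmetic reported in steps 6 and 7 can be sketched as a toy model. This is only an illustration of the observed behavior, not Serving's actual scheduling code; the function name and the assumption that each predictor instance saturates exactly thread_num cores are mine.

```python
def cores_saturated(num_predictors, thread_num=2):
    """Toy model of the observed behavior: each local predictor instance
    runs thread_num intra-op compute threads, so the number of saturated
    CPU cores is num_predictors * thread_num. thread_num=2 is the local
    predictor's default mentioned in this issue."""
    return num_predictors * thread_num

# Process model: two concurrent requests spawn two predictor processes.
assert cores_saturated(num_predictors=2) == 4  # matches step 6: 4 cores busy

# Thread model (as reported): all requests share one predictor instance.
assert cores_saturated(num_predictors=1) == 2  # matches step 7: 2 cores busy
```

The mismatch the issue describes is that in thread mode the effective num_predictors appears stuck at 1 regardless of concurrency or worker_num.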

Please review whether this is a bug.

Thank you !

@TeslaZhao TeslaZhao self-assigned this Jan 22, 2021
@TeslaZhao TeslaZhao added the question Further information is requested label Jan 22, 2021
@paddle-bot paddle-bot bot closed this as completed Apr 16, 2024