
Abnormal GPU memory growth when predicting with Serving #1724

Closed

NatLee opened this issue Mar 17, 2022 · 2 comments

NatLee commented Mar 17, 2022

Hello everyone,

I followed the official tutorial and deployed the model with Docker, and ran into a problem: after leaving the service up and predicting for a day, I found that GPU memory usage grows abnormally.

Steps to reproduce

I used docker-compose to build an image and run a container that serves predictions with the converted model.

Dockerfile:

FROM registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7

RUN git clone https://github.com/PaddlePaddle/Serving
RUN bash Serving/tools/paddle_env_install.sh
RUN cd Serving && pip3 install -r python/requirements.txt

RUN pip3 install paddle-serving-client==0.8.2
RUN pip3 install paddle-serving-app==0.8.2
RUN pip3 install paddle-serving-server-gpu==0.8.2.post102
version: "3.2"
services:
  ernie_service:
    container_name: "ernie_service"
    runtime: "nvidia"
    build:
      context: .
      dockerfile: Dockerfile
    restart: always
    volumes:
        - /etc/localtime:/etc/localtime:ro
        - ./serving_server:/serving_server
    ports:
      - 9292:9292
    command: bash -c "python -m paddle_serving_server.serve --model /serving_server --port 9292 --gpu_id 0 --thread 10"
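
For completeness: with the Dockerfile and docker-compose.yml above in the same directory, and the exported model in ./serving_server, the service is brought up with the usual compose workflow:

docker-compose up --build -d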

The ./serving_server folder contains the servable ERNIE-1.0 sentiment classification model (i.e. what the official tutorial exports).
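
For reference, a servable model like this is typically produced with paddle_serving_client.convert. A minimal sketch, assuming the trained model was first exported as a Paddle inference model; the source directory and file names below are hypothetical, not from the original report:

python -m paddle_serving_client.convert \
    --dirname ./checkpoints/model_100 \
    --model_filename inference.pdmodel \
    --params_filename inference.pdiparams \
    --serving_server ./serving_server \
    --serving_client ./serving_client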

After running predictions for a while, I saw GPU memory usage grow abnormally from the original 1–2 GB to 7.6 GB.
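
The growth is straightforward to track over time; a minimal monitoring sketch (not from the original report) that polls nvidia-smi once a minute:

import subprocess
import time

# Poll GPU 0's used memory (in MiB) once a minute via nvidia-smi.
while True:
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=memory.used",
        "--format=csv,noheader,nounits",
        "--id=0",
    ])
    print(time.strftime("%H:%M:%S"), out.decode().strip(), "MiB")
    time.sleep(60)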

A snippet of client.py follows:

from paddlenlp.transformers import ErnieTokenizer
from paddle_serving_client import Client

TOKENIZER = ErnieTokenizer.from_pretrained("checkpoints/model_100")
CLIENT_CONFIG_FILE = "./serving_client/serving_client_conf.prototxt"
PREDICT_SERVERS = ["127.0.0.1:9292"]
MAX_SEQ_LENGTH = 128

CLIENT = Client()
CLIENT.load_client_config(CLIENT_CONFIG_FILE)
CLIENT.connect(PREDICT_SERVERS)

# batchify_fn tokenizes a batch of texts and pads them to MAX_SEQ_LENGTH
input_ids, token_type_ids = batchify_fn(batch)
fetch_map = CLIENT.predict(
  feed={"input_ids": input_ids, "token_type_ids": token_type_ids},
  fetch=["save_infer_model/scale_0.tmp_1"],
  batch=True
)
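
batchify_fn itself is not shown in the report. A plausible reconstruction of what it does, assuming batch is a list of raw text strings and reusing the TOKENIZER and MAX_SEQ_LENGTH defined above (the padding details are an assumption):

import numpy as np

def batchify_fn(batch):
    # Hypothetical reconstruction: tokenize each text, pad to MAX_SEQ_LENGTH,
    # and stack into the int64 arrays the serving client expects.
    input_ids, token_type_ids = [], []
    for text in batch:
        encoded = TOKENIZER(text, max_seq_len=MAX_SEQ_LENGTH)
        ids = encoded["input_ids"]
        types = encoded["token_type_ids"]
        pad = MAX_SEQ_LENGTH - len(ids)
        input_ids.append(ids + [TOKENIZER.pad_token_id] * pad)
        token_type_ids.append(types + [0] * pad)
    return (np.array(input_ids, dtype="int64"),
            np.array(token_type_ids, dtype="int64"))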

There is no server-side code, because the server is started directly with paddle_serving_server.serve.


Am I doing something wrong, or is there a real problem here?

Thanks a lot!

@TeslaZhao TeslaZhao self-assigned this Mar 21, 2022
@TeslaZhao TeslaZhao added the SDK label Mar 21, 2022
TeslaZhao (Collaborator) commented Mar 21, 2022

The client side does not use GPU memory; GPU memory is consumed on the server side. During inference, the framework allocates extra GPU memory for computation and caching based on the batch size and sequence length of the input. We suggest enabling --ir_optim, which fuses multiple OPs through passes, or compiling from source and setting a larger initial GPU memory allocation.
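
Applied to the docker-compose setup above, enabling the flag is just a change to the serve command; the other arguments are unchanged from the original compose file:

command: bash -c "python -m paddle_serving_server.serve --model /serving_server --port 9292 --gpu_id 0 --thread 10 --ir_optim"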

@TeslaZhao TeslaZhao added the 显存 (GPU memory) label and removed the SDK label Mar 21, 2022
NatLee (Author) commented Mar 21, 2022

Thanks for the explanation.
If --ir_optim is enabled, does it apply per single batch, or across inputs from multiple sources?

@paddle-bot paddle-bot bot closed this as completed Apr 16, 2024