[Inference] Fix mem release problem #32654

jiweibo · 2021-04-28T11:40:44Z

PR types

Others

PR changes

Others

Describe

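Currently, when a Predictor is released, it briefly occupies GPU memory on every card.

The cause is that when the Predictor's internal scope_ is destructed, it iterates over all cards and calls memory::Release(place) on each one. That interface has to call low-level CUDA functions, so it allocates CUDA handles and other resources, which consume GPU memory.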

    scope_.reset(new paddle::framework::Scope(), [](framework::Scope *scope) {
      delete scope;
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
      for (int dev_id = 0; dev_id < paddle::platform::GetCUDADeviceCount();
           ++dev_id) {
        memory::Release(platform::CUDAPlace(dev_id));
      }
#endif
#ifdef PADDLE_WITH_XPU
      for (int dev_id = 0; dev_id < paddle::platform::GetXPUDeviceCount();
           ++dev_id) {
        memory::Release(platform::XPUPlace(dev_id));
      }
#endif
      memory::Release(platform::CPUPlace());
    });
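To make the effect of that deleter concrete, here is a minimal, CUDA-free sketch. `FakeRuntime`, `FakeScope`, and the helper functions are hypothetical stand-ins, not Paddle APIs; the only assumption modeled is that any call on a device first creates that device's context, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>

// Hypothetical stand-in for the device runtime; nothing here calls Paddle or CUDA.
struct FakeRuntime {
  int device_count = 4;
  std::set<int> initialized;  // devices whose context has been created

  // Models memory::Release(place): touching a device creates its context,
  // which is the step that costs GPU memory in the real system.
  void Release(int dev_id) {
    assert(dev_id >= 0 && dev_id < device_count);
    initialized.insert(dev_id);
  }
};

struct FakeScope {};  // hypothetical stand-in for framework::Scope

// Deleter in the style of the code above: releases on every visible device.
std::shared_ptr<FakeScope> MakeScopeReleasingAllDevices(FakeRuntime* rt) {
  return std::shared_ptr<FakeScope>(new FakeScope(), [rt](FakeScope* scope) {
    delete scope;
    for (int dev_id = 0; dev_id < rt->device_count; ++dev_id) {
      rt->Release(dev_id);
    }
  });
}

// Default deleter: destroying the scope touches no additional device.
std::shared_ptr<FakeScope> MakeScopeDefault() {
  return std::make_shared<FakeScope>();
}

// Number of devices with a context after a predictor that only ran on
// device 0 drops its scope.
std::size_t DevicesTouchedAfterDestroy(bool release_all_devices) {
  FakeRuntime rt;
  rt.initialized.insert(0);  // the predictor itself ran on device 0
  {
    auto scope = release_all_devices ? MakeScopeReleasingAllDevices(&rt)
                                     : MakeScopeDefault();
  }  // scope is destroyed here and its deleter runs
  return rt.initialized.size();
}
```

In this toy model, `DevicesTouchedAfterDestroy(true)` reports all four devices, while `DevicesTouchedAfterDestroy(false)` reports only device 0.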

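After the Predictor's Clone() interface is called, the lifetime of scope_ may be longer than that of the Predictor, so the user-specified card (device_id) cannot be obtained when scope_ is destroyed. The change in the following PR leads to undefined behavior: https://github.com/PaddlePaddle/Paddle/pull/28409/files#diff-f6feda974e038d722114830a39bea985fd814e28bd86bcd33248aece3c3181a4R178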

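So here we remove the logic that iterates over all cards and releases each one in turn, and restore the original code. As a result, the GPU memory held by the weights is eventually returned to the memory pool, but the size of the memory pool itself is not reduced.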
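The difference between returning memory to the pool and shrinking the pool can be illustrated with a toy allocator. This is only a sketch: `ToyPool` and its methods are hypothetical and do not reflect how Paddle's allocator is actually implemented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy pool: tracks how much is reserved from the device,
// how much of that is in use, and how much is idle inside the pool.
class ToyPool {
 public:
  // Serve a request from idle bytes first; grow the reservation if needed.
  void Alloc(std::size_t bytes) {
    if (bytes > idle_) {
      reserved_ += bytes - idle_;
      idle_ = 0;
    } else {
      idle_ -= bytes;
    }
    in_use_ += bytes;
  }

  // Freeing only marks bytes as idle; the reservation is unchanged.
  void Free(std::size_t bytes) {
    assert(bytes <= in_use_);
    in_use_ -= bytes;
    idle_ += bytes;
  }

  // An explicit release hands idle bytes back to the device.
  void Release() {
    reserved_ -= idle_;
    idle_ = 0;
  }

  std::size_t reserved() const { return reserved_; }
  std::size_t in_use() const { return in_use_; }

 private:
  std::size_t reserved_ = 0;
  std::size_t in_use_ = 0;
  std::size_t idle_ = 0;
};
```

After `Alloc(500)` followed by `Free(500)`, `in_use()` is 0 but `reserved()` is still 500; only an explicit `Release()` brings `reserved()` back to 0. With this PR, destroying the Predictor corresponds to the `Free` step alone.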

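Behavior before the fix:
Test code: https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/test/shrink_memory
1. After initializing the Predictor (config sets initGpu to 500M), GPU memory usage is 780M (handle + weights, etc.).
2. After one run with batch_size 100, GPU memory usage is 4418M.
3. After calling the ShrinkMemory interface, GPU memory usage is 1292M (handle + weights + other?).
4. After one run with batch_size 2, GPU memory usage is 1728M.
5. After the Predictor is destructed, card 0 uses 792M and every other card uses 280M.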

After the code change:

5. After the Predictor is destructed, GPU memory usage is 1292M, but memory usage on the other cards is not affected.

paddle-bot-old · 2021-04-28T11:40:47Z

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

CLAassistant · 2021-04-28T11:40:52Z

All committers have signed the CLA.

fix mem release error.

321328e

jiweibo force-pushed the fix_mem_release branch from 2a0fc3b to 321328e Compare April 28, 2021 11:44

jiweibo requested a review from Shixiaowei02 April 28, 2021 11:47

jiweibo mentioned this pull request Apr 28, 2021

[Cherry-pick] Fix mem release error. #32655

Merged

Shixiaowei02 approved these changes Apr 28, 2021


jiweibo merged commit dec8ab8 into PaddlePaddle:develop Apr 29, 2021

jiweibo deleted the fix_mem_release branch April 29, 2021 02:01
