running unit test parallely with a same GPU on Linux/windows GPU #29523

Merged
merged 1 commit into from
Jan 13, 2021

@zhwesky2010 zhwesky2010 commented Dec 9, 2020

PR types

Others

PR changes

Others

Describe

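GPU utilization is currently low, so this PR raises GPU and CPU utilization to reduce the total unit test run time.

  1. Use an offline script to collect and analyze the peak GPU memory used by every existing unit test at run time.
  2. From the peak memory data, derive a list of 379 CPU unit tests, i.e. tests that allocate 0 GPU memory.
  3. Pin this list in Paddle/tools/parallel_unittests_rule.py; the CPU unit tests then run 8 in parallel on Linux/Windows GPU.
  4. Newly added unit tests are not in the offline-collected list, so they do not use the parallel strategy and automatically get exclusive use of the resources.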
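The selection rule in the steps above can be sketched roughly as follows. This is an illustration only and not the contents of Paddle/tools/parallel_unittests_rule.py: the function names `split_tests` and `ctest_command`, the test names, and the shape of the peak-memory data are all assumptions. The only ctest options used are `-R` (regex filter) and `-j` (parallel jobs).

```python
import re

PARALLEL_JOBS = 8  # parallelism stated in the PR description


def split_tests(all_tests, peak_memory_mib):
    """Split tests into (parallel, exclusive) lists.

    peak_memory_mib is a hypothetical dict {test_name: peak GPU memory}
    produced by the offline collection. A test that is missing from it
    is treated as newly added and therefore runs exclusively.
    """
    parallel, exclusive = [], []
    for name in all_tests:
        if peak_memory_mib.get(name) == 0:
            parallel.append(name)   # allocates no GPU memory
        else:
            exclusive.append(name)  # uses GPU memory, or unknown/new
    return parallel, exclusive


def ctest_command(tests, jobs):
    """Build a ctest command that runs exactly the given tests."""
    pattern = "^(" + "|".join(re.escape(t) for t in tests) + ")$"
    return ["ctest", "-R", pattern, "-j", str(jobs)]


if __name__ == "__main__":
    # Made-up test names and measurements, for illustration only.
    measured = {"test_cpu_only": 0, "test_uses_gpu": 512}
    tests = ["test_cpu_only", "test_uses_gpu", "test_brand_new"]
    parallel, exclusive = split_tests(tests, measured)
    print(ctest_command(parallel, PARALLEL_JOBS))
    print(ctest_command(exclusive, 1))
```

Treating unknown tests as exclusive is the conservative default: a test with no measurement might allocate GPU memory, so it is not allowed to share the device.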

Benefits

Run time of the 379 parallelized unit tests, before and after:

  1. Linux-GPU (PR_CI_Coverage): 670s before -> 150s after, saving about 8.7 min
  2. Linux-GPU (PR_CI_Py3): 832s before -> 153.9s after, saving about 11.3 min
  3. Windows-GPU (PR_CI_Windows): 790s before -> 233.27s after, saving about 9.3 min
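For reference, the savings and the corresponding speedup factors can be recomputed from the timings listed above. The helper names `saved_minutes` and `speedup` are illustrative; the timings are the ones reported in this description.

```python
# (before, after) wall-clock seconds for the 379 parallelized tests
TIMINGS = {
    "PR_CI_Coverage": (670.0, 150.0),
    "PR_CI_Py3": (832.0, 153.9),
    "PR_CI_Windows": (790.0, 233.27),
}


def saved_minutes(before_s, after_s):
    """Time saved, in minutes."""
    return (before_s - after_s) / 60.0


def speedup(before_s, after_s):
    """How many times faster the parallel run is."""
    return before_s / after_s


if __name__ == "__main__":
    for job, (before, after) in TIMINGS.items():
        print(f"{job}: saved {saved_minutes(before, after):.1f} min, "
              f"{speedup(before, after):.1f}x faster")
```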

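Offline data collection method

  1. Start a ctest process that runs a single unit test 5 times in a row, serially.
  2. Record the PID of that ctest process and poll to check whether the process has exited.
  3. While the process is alive, repeatedly query nvidia-smi and record three metrics over the lifetime of the ctest process: peak GPU memory usage, peak memory utilization, and peak GPU utilization.
  4. Stop querying once the process exits.
  5. Repeat the steps above once for every GPU unit test on Linux/Windows.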
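A minimal sketch of this polling loop is shown below. It is not the actual offline script, which is not part of this description. The names `parse_sample`, `query_gpu`, and `monitor` are hypothetical, and the sketch assumes a single GPU at index 0 and that device-wide readings from nvidia-smi are an acceptable proxy for the test's own usage.

```python
import subprocess
import time

# nvidia-smi query for GPU 0: used memory (MiB), memory utilization (%),
# GPU utilization (%), printed as one bare CSV line.
QUERY = [
    "nvidia-smi", "--id=0",
    "--query-gpu=memory.used,utilization.memory,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_sample(line):
    """Parse '1024, 7, 93' into (1024, 7, 93)."""
    mem, mem_util, gpu_util = (int(x) for x in line.strip().split(","))
    return mem, mem_util, gpu_util


def query_gpu():
    """Take one sample from nvidia-smi (requires an NVIDIA driver)."""
    out = subprocess.check_output(QUERY, text=True)
    return parse_sample(out.splitlines()[0])


def monitor(cmd, sample=query_gpu, interval=0.1):
    """Run cmd and return the peak (memory, mem util, GPU util) seen
    while the process was alive."""
    proc = subprocess.Popen(cmd)
    peak = (0, 0, 0)
    while proc.poll() is None:  # process still running
        peak = tuple(max(p, s) for p, s in zip(peak, sample()))
        time.sleep(interval)
    return peak
```

One possible way to get the five serial runs is ctest's `--repeat-until-fail 5` option, for example `monitor(["ctest", "-R", "^some_test$", "--repeat-until-fail", "5"])`, where `some_test` is a placeholder; the description does not say how the repetition was actually done. Because the readings are device-wide, the measurement is only meaningful when nothing else is using the GPU.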


paddle-bot-old bot commented Dec 9, 2020

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@zhwesky2010 zhwesky2010 force-pushed the parallel_test_win branch 2 times, most recently from dd1d098 to cfaf040 Compare December 10, 2020 03:40
@zhwesky2010 zhwesky2010 force-pushed the parallel_test_win branch 2 times, most recently from 9ccf6e5 to 22916a6 Compare December 14, 2020 08:21
@zhwesky2010 zhwesky2010 force-pushed the parallel_test_win branch 4 times, most recently from ccdd8ba to 85008af Compare December 14, 2020 14:24
@zhwesky2010 zhwesky2010 force-pushed the parallel_test_win branch 3 times, most recently from d283cf3 to d3b7d9b Compare January 5, 2021 16:47
@zhwesky2010 zhwesky2010 force-pushed the parallel_test_win branch 2 times, most recently from d3b7d9b to 8110b12 Compare January 6, 2021 03:24
@zhwesky2010 zhwesky2010 changed the title from "running unit test sigle GPU parallely on Linux/windows GPU" to "running GPU unit test parallely with a same GPU on Linux/windows" Jan 13, 2021
@zhwesky2010 zhwesky2010 requested a review from luotao1 January 13, 2021 02:53

@luotao1 luotao1 left a comment

LGTM

@zhwesky2010 zhwesky2010 changed the title from "running GPU unit test parallely with a same GPU on Linux/windows" to "running unit test parallely with a same GPU on Linux/windows GPU" Jan 13, 2021
@zhwesky2010 zhwesky2010 merged commit b1d8ff4 into PaddlePaddle:develop Jan 13, 2021
XieYunshen added a commit to XieYunshen/Paddle that referenced this pull request Feb 9, 2021