[Parallel UT]Improve Parallel UT level on Windows/Linux #31377

Merged: 8 commits merged on Mar 31, 2021


@zhwesky2010 zhwesky2010 commented Mar 2, 2021

PR types

Performance optimization

PR changes

Others

Describe

[Parallel UT]improve Parallel UT level on Windows/Linux.

It's a follow-up to #29523.

By analyzing resource usage data for the unit tests (UT), this PR raises UT concurrency and reduces the UT running time on Linux/Windows GPU.
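As a rough illustration of how resource data could drive the parallel level, here is a minimal sketch. It is not the code in this PR: the `TestProfile` class, the `peak_gpu_mem_mb` field, the 11000 MB budget, and the cap of 8 are all hypothetical, and a real rule would likely also account for CPU and host memory.

```python
from dataclasses import dataclass


@dataclass
class TestProfile:
    """Hypothetical record of one unit test's measured resource usage."""
    name: str
    peak_gpu_mem_mb: float  # assumed to come from an earlier profiling run


def assign_parallel_level(profile, gpu_mem_mb, max_level):
    """Return how many instances of this test fit in GPU memory, capped at max_level."""
    if profile.peak_gpu_mem_mb <= 0:
        # No GPU memory recorded: treat the test as safe at the highest level.
        return max_level
    fit = int(gpu_mem_mb // profile.peak_gpu_mem_mb)
    # A test that does not fit even once still has to run, so the floor is 1.
    return max(1, min(max_level, fit))


def group_by_level(profiles, gpu_mem_mb=11000, max_level=8):
    """Group test names by the parallel level assigned to them."""
    groups = {}
    for profile in profiles:
        level = assign_parallel_level(profile, gpu_mem_mb, max_level)
        groups.setdefault(level, []).append(profile.name)
    return groups
```

Tests that land in the same group can then be scheduled together at that level, while the level-1 group runs serially.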

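Test data:

- PR_CI_Windows:

  • The effect on Windows is significant. Using the Windows11 machine (a 2080Ti GPU machine) for comparison, the runtime went from about 60 min to 48 min, a reduction of roughly 12 min.
  • This change is estimated to save 10-12 min on Windows, a gain of about 15-20%. Other machines use different GPU and CPU configurations (1080Ti, 1080, etc.), so the exact figures remain to be observed.

- PR_CI_Py3, PR_CI_Coverage:

  • The effect on Linux is less pronounced than on Windows. The reasons are:
      1. Parallelism on Linux is already high. For example, the base parallel level of Py3 is 4; raising it several times further runs into the limits of CPU scheduling, so the headroom is small. On Windows the base parallel level is 1, so the headroom is larger.
      2. On Linux, the 2-GPU and exclusive unit tests account for nearly half of the total time, and very few of them were optimized this time, because the distributed tests were not run during the earlier local data collection. There may be another 3-5 min to gain here, pending further analysis.
  • This change is estimated to save about 5 min on Linux, about half of the Windows saving.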
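The Windows/Linux difference above can be sketched with a toy model in which the useful parallelism is capped by what the CPU can schedule. This is only an illustration of the argument: the function names, the `cpu_slots` cap of 8, and the 120-minute serial workload are hypothetical, not measured values.

```python
def estimated_wall_time(total_serial_min, parallel_level, cpu_slots):
    """Toy estimate: work divides evenly, but no more than cpu_slots jobs run at once."""
    effective = min(parallel_level, cpu_slots)
    return total_serial_min / effective


def speedup(total_serial_min, base_level, new_level, cpu_slots):
    """Ratio of estimated wall time before and after raising the parallel level."""
    before = estimated_wall_time(total_serial_min, base_level, cpu_slots)
    after = estimated_wall_time(total_serial_min, new_level, cpu_slots)
    return before / after


# Hypothetical numbers: both platforms raise the parallel level 4x.
windows_like = speedup(120, base_level=1, new_level=4, cpu_slots=8)   # base 1, as on Windows
linux_like = speedup(120, base_level=4, new_level=16, cpu_slots=8)    # base 4, as on Py3
```

Under these assumptions the same 4x increase in parallel level gives the full benefit when starting from 1, but only part of it when starting from 4, because the cap is reached first. Real unit tests also contend for GPU memory and I/O, which this model ignores.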

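[Patent: A scheme for improving the unit test running efficiency of a deep learning framework] Overall benefit summary:

  • Phase 1 [2020/01]: Windows and Linux reduced by 8 min overall, a gain of about 14%

  • Phase 2 [2020/04]: on Windows, 6.5 min saved in the first half week after rollout, with about 5 min more expected this week, a gain of about 18%

Total

  • Windows: 32% gain in total
  • Linux: the multi-GPU unit tests, where most of the potential gain is concentrated, have not been optimized yet. Including that part (about 3-5 min), the total Linux gain is expected to be about 20%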


paddle-bot-old bot commented Mar 2, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.


paddle-bot-old bot commented Mar 2, 2021

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@zhwesky2010 zhwesky2010 changed the title [Parallel UT]improve Parallel UT level on Windows/Linux [Parallel UT]Improve Parallel UT level on Windows/Linux Mar 3, 2021
@zhwesky2010 zhwesky2010 force-pushed the improve_UT_level branch 2 times, most recently from dd24c39 to af7aaf0 Compare March 5, 2021 07:58
XieYunshen previously approved these changes Mar 16, 2021

@XieYunshen XieYunshen left a comment

LGTM

'test_group_norm_op',
'test_seed_op',
'test_activation_nn_grad',
'test_profiler',
Contributor

test_profiler (https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/2529648/job/3531761) fails randomly. Could you consider moving it out of this list?

Contributor Author

Done

@zhwesky2010 zhwesky2010 force-pushed the improve_UT_level branch 2 times, most recently from 92a7d49 to cc8c5ce Compare March 30, 2021 08:03
@luotao1 luotao1 left a comment

LGTM


qili93 commented Mar 31, 2021

Agree to exempt PR-CI-ROCM-Compile; the code change is unrelated to ROCM.

@zhwesky2010 zhwesky2010 merged commit b05f614 into PaddlePaddle:develop Mar 31, 2021
4 participants