delete rank switch in broadcast_function.h for compile #42645

Merged
merged 8 commits into from
May 16, 2022

Contributor

@AnnaTrainingG AnnaTrainingG commented May 10, 2022

PR types

Others

PR changes

Others

Describe

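Instantiating Broadcast per rank causes heavy expansion of the underlying templates, which makes the reduce_sum_grad_kernel.cu.o file too large. This change reduces both the .o size and the compile time.
[image]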
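
To illustrate the general mechanism described above, here is a minimal host-side C++ sketch. It is not code from broadcast_function.h: `BroadcastConfig`, `kMaxRank`, `InputOffsetStatic`, `InputOffsetSwitch`, and `InputOffsetDynamic` are hypothetical names invented for this example. The first variant dispatches on rank through a switch, so the compiler must emit one template instantiation per case. The second variant treats rank as a runtime value, so only one function body is compiled. Whether the PR uses exactly this approach is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical upper bound on tensor rank for this sketch.
constexpr int kMaxRank = 4;

// Hypothetical broadcast description: dimensions are stored innermost first,
// and a stride of 0 marks a dimension along which the input is broadcast.
struct BroadcastConfig {
  std::array<std::size_t, kMaxRank> out_dims{};
  std::array<std::size_t, kMaxRank> in_strides{};
  int rank = 0;
};

// Variant A: rank is a template parameter, so every distinct Rank value
// produces a separate instantiation of this function.
template <int Rank>
std::size_t InputOffsetStatic(const BroadcastConfig& cfg, std::size_t out_index) {
  std::size_t offset = 0;
  for (int i = 0; i < Rank; ++i) {
    offset += (out_index % cfg.out_dims[i]) * cfg.in_strides[i];
    out_index /= cfg.out_dims[i];
  }
  return offset;
}

// The rank switch: each case forces the compiler to generate code for that
// rank. In a real kernel, everything templated on Rank is multiplied too.
std::size_t InputOffsetSwitch(const BroadcastConfig& cfg, std::size_t out_index) {
  switch (cfg.rank) {
    case 1: return InputOffsetStatic<1>(cfg, out_index);
    case 2: return InputOffsetStatic<2>(cfg, out_index);
    case 3: return InputOffsetStatic<3>(cfg, out_index);
    case 4: return InputOffsetStatic<4>(cfg, out_index);
    default: throw std::invalid_argument("unsupported rank");
  }
}

// Variant B: rank is read at run time, so a single function body covers
// every rank up to kMaxRank and nothing is instantiated per rank.
std::size_t InputOffsetDynamic(const BroadcastConfig& cfg, std::size_t out_index) {
  if (cfg.rank < 1 || cfg.rank > kMaxRank) {
    throw std::invalid_argument("unsupported rank");
  }
  std::size_t offset = 0;
  for (int i = 0; i < cfg.rank; ++i) {
    offset += (out_index % cfg.out_dims[i]) * cfg.in_strides[i];
    out_index /= cfg.out_dims[i];
  }
  return offset;
}
```

Both variants compute the same input offset for a given output index. The trade-off is that the runtime loop cannot be fully unrolled by the compiler, which is why a change like this is typically checked against a benchmark.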

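Reasons for the OP-benchmark failures:
This change affects every OP that calls broadcast:

  1. This change does not affect the matmul backward pass; that failure is unrelated to this change.
  2. The subtract result is due to machine fluctuation: with the same commit, the first run met the requirement.

[image]

  3. The p_norm_kernel_2 forward pass calls elementwiseKernel.
  4. The p_norm_kernel_2 backward pass calls Eigen; neither is related to this change.

Local test results are as follows:
[image]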

@paddle-bot-old

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

bmb0537 previously approved these changes May 13, 2022
@AnnaTrainingG AnnaTrainingG changed the title from "compile kp" to "delete rank switch in broadcast_function.h for compile" May 16, 2022
Contributor

@ZzSean ZzSean left a comment

LGTM for OP-Benchmark

@AnnaTrainingG AnnaTrainingG merged commit 8501fb0 into PaddlePaddle:develop May 16, 2022
lanxianghit pushed a commit that referenced this pull request Jun 6, 2022
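Remove the rank instantiation and the Elementwise call in the Broadcast function to reduce compile time.
Adapted from PR #42645 on the develop branch. Because the develop and release branches differ substantially, a cherry-pick is not possible, so the PR is resubmitted against release2.3.
Instantiating Broadcast per rank causes heavy expansion of the underlying templates, which makes the reduce_sum_grad_kernel.cu.o file too large. This change reduces both the .o size and the compile time.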
4 participants