[hybrid performance] optimize grad_add device #33946

Merged

@wangxicoding wangxicoding commented Jul 3, 2021

PR types

Performance optimization

PR changes

Others

Describe

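Optimize the device attribute of the grad_add op: instead of taking the device of the op that uses the corresponding gradient, it now takes the device of the op that produces it.
Effect in pipeline parallelism:

  1. Current develop
    The grad_add op is placed on the device of the op that uses the gradient, so pipeline send/recv transfers multiple partial gradients.
    [image]
  2. This PR
    The grad_add op is placed on the device of the op that produces the gradient, so pipeline send/recv transfers only the final accumulated gradient.
    [image]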
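
As a rough illustration of the effect described above, the sketch below counts how many tensors cross a pipeline stage boundary under each placement. It is a toy model, not Paddle code: the function `count_sends` and the device strings are hypothetical, and it assumes all accumulation ops for one gradient run on a single device.

```python
def count_sends(partial_devices, consumer_device, add_device):
    """Count tensors that must cross a device boundary (toy model).

    partial_devices: device of the op that produces each partial gradient
    consumer_device: device of the op that uses the accumulated gradient
    add_device:      device on which the accumulation (grad_add) ops run
    """
    # Every partial gradient produced off the accumulation device must be sent to it.
    sends = sum(1 for d in partial_devices if d != add_device)
    # The accumulated gradient must be sent if its consumer lives elsewhere.
    if add_device != consumer_device:
        sends += 1
    return sends


# Three partial gradients produced on stage 1, consumed on stage 0.
partials = ["gpu:1", "gpu:1", "gpu:1"]

# Placement on the consumer's device: each partial gradient is sent.
print(count_sends(partials, "gpu:0", add_device="gpu:0"))  # 3

# Placement on the producer's device: only the accumulated gradient is sent.
print(count_sends(partials, "gpu:0", add_device="gpu:1"))  # 1
```

In this model the saving grows with the number of partial gradients, which matches the description: one accumulated tensor is sent instead of several sub-gradients.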

Tested on V100 32G with the gpt2-en model:

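| Cards | Optimization | dtype | speed(tokens/s) (S) | Improvement |
|---|---|---|---|---|
| 4-card pp | baseline | fp32 | 16522 | |
| | | fp16 | 30215 | |
| 4-card pp | grad_add device optimization | fp32 | 16748 | 1.37% |
| | | fp16 | 31304 | 3.6% |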
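
The improvement figures can be reproduced from the throughput numbers, assuming they are the relative speedup over the baseline at the same dtype. The helper name `improvement_pct` is made up for this sketch.

```python
def improvement_pct(baseline, optimized):
    """Relative throughput gain over the baseline, in percent."""
    return (optimized / baseline - 1.0) * 100.0


fp32 = improvement_pct(16522, 16748)
fp16 = improvement_pct(30215, 31304)

print(f"fp32: {fp32:.2f}%")  # fp32: 1.37%
print(f"fp16: {fp16:.2f}%")  # fp16: 3.60%
```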

paddle-bot-old bot commented Jul 3, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@wangxicoding wangxicoding changed the title optimize grad_add device [hybrid performance] optimize grad_add device Jul 5, 2021

@sandyhouse sandyhouse left a comment

LGTM

@zhiqiu zhiqiu left a comment

lgtm

@wangxicoding wangxicoding merged commit 75d247b into PaddlePaddle:develop Jul 5, 2021
@wangxicoding wangxicoding deleted the optimize_grad_add_device branch July 5, 2021 08:08