[hybrid performance] optimize grad_add device #33946

wangxicoding · 2021-07-03T10:59:15Z

Performance optimization

Others

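Optimize the device attribute of the grad_add op: instead of taking the device of the op that uses the corresponding gradient, it now takes the device of the op that generates the corresponding gradient.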
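
As a rough illustration of the change described above, here is a minimal sketch of the two device-selection rules. It is not Paddle's implementation: the `Op` class, the `grad_add_device` helper, the variable names, and the device strings are all hypothetical, and the sketch assumes the first matching op in program order is the one whose device is taken.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical stand-in for a program op; not a Paddle class."""
    type: str
    device: str
    inputs: tuple = ()
    outputs: tuple = ()


def grad_add_device(ops, grad_name, use_producer=True):
    """Pick a device for the op that accumulates `grad_name` (illustrative only)."""
    for op in ops:
        # New rule: follow the op that generates the gradient.
        # Old rule: follow the op that uses the gradient.
        names = op.outputs if use_producer else op.inputs
        if grad_name in names:
            return op.device
    return None


# Hypothetical two-op program: the gradient is produced on gpu:1 and used on gpu:0.
ops = [
    Op("matmul_grad", "gpu:1", inputs=("out@GRAD",), outputs=("w@GRAD",)),
    Op("optimizer_update", "gpu:0", inputs=("w@GRAD",), outputs=("w",)),
]

old_device = grad_add_device(ops, "w@GRAD", use_producer=False)
new_device = grad_add_device(ops, "w@GRAD", use_producer=True)
print(old_device, new_device)
```

In this toy program the old rule places the accumulation on the consumer's device and the new rule places it on the producer's device, so the addition happens where the gradient already lives.
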
Effect in pipeline parallelism:

Tested on V100 32G with the gpt2-en model.

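| GPUs | Optimization | dtype | Speed S (tokens/s) | Improvement |
| --- | --- | --- | --- | --- |
| 4 GPUs, pp | baseline | fp32 | 16522 | |
| 4 GPUs, pp | baseline | fp16 | 30215 | |
| 4 GPUs, pp | grad_add device optimization | fp32 | 16748 | 1.37% |
| 4 GPUs, pp | grad_add device optimization | fp16 | 31304 | 3.6% |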
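
The improvement column can be reproduced from the speed column. The snippet below is only a sanity check of that arithmetic; the `speedup_percent` helper is a hypothetical name, not part of the PR.

```python
def speedup_percent(baseline, optimized):
    """Relative throughput gain of `optimized` over `baseline`, in percent."""
    return (optimized / baseline - 1.0) * 100.0


fp32_gain = round(speedup_percent(16522, 16748), 2)
fp16_gain = round(speedup_percent(30215, 31304), 2)
print(fp32_gain, fp16_gain)  # 1.37 3.6
```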

paddle-bot-old · 2021-07-03T10:59:18Z

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

sandyhouse

LGTM

zhiqiu

lgtm

optimize grad add device

6d29ce6

wangxicoding requested review from gongweibao, sandyhouse and zhiqiu July 3, 2021 11:55

wangxicoding changed the title ~~optimize grad_add device~~ [hybrid performance] optimize grad_add device Jul 5, 2021

sandyhouse approved these changes Jul 5, 2021

zhiqiu approved these changes Jul 5, 2021

wangxicoding merged commit 75d247b into PaddlePaddle:develop Jul 5, 2021

wangxicoding deleted the optimize_grad_add_device branch July 5, 2021 08:08