[hybrid performance] all reduce fusion for sharding #34480

FeixLiu · 2021-07-29T06:34:19Z

PR types: New features

PR changes: Others

Adds allreduce fusion support to sharding_optimizer.

Benchmark setup: GPT model on 8 × V100 GPUs, fuse_grad_in_size=32MB

dp=4 sharding=2

| | No Fuse | Fused | Gain |
|---|---|---|---|
| throughput | 135503 tokens/s | 138892 tokens/s | +2.7% |
| allreduce number | 57 | 15 | -73% |
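The drop in allreduce count comes from packing several gradients into one communication call, up to the fuse size. The following is a minimal sketch of that idea, not the code from this PR: the function `fuse_gradients`, the greedy packing rule, and the gradient sizes are all assumptions made for illustration, and the actual pass in `sharding/utils.py` works on program ops and may apply further constraints.

```python
from typing import List, Tuple

MB = 1024 * 1024


def fuse_gradients(grads: List[Tuple[str, int]], fuse_size_bytes: int) -> List[List[str]]:
    """Greedily pack consecutive gradients into buckets of at most fuse_size_bytes.

    Each bucket stands for one fused allreduce call. A gradient larger than
    the limit ends up in a bucket of its own.
    """
    buckets: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, nbytes in grads:
        # Close the current bucket if adding this gradient would overflow it.
        if current and current_bytes + nbytes > fuse_size_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets


# Hypothetical gradient list: 12 tensors of 8 MB each (not taken from the PR).
grads = [(f"grad_{i}", 8 * MB) for i in range(12)]
buckets = fuse_gradients(grads, fuse_size_bytes=32 * MB)
print(len(grads), "allreduce calls unfused ->", len(buckets), "fused")
```

With these made-up sizes, four 8 MB gradients fit in each 32 MB bucket, so twelve separate calls collapse into three. Fewer, larger calls amortize per-call launch and latency overhead, which is the likely source of the throughput gain reported in the table.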

[Image: dp_hybrid_sharding]

[Image: dp_pp_hybrid_sharding]

paddle-bot-old · 2021-07-29T06:34:43Z

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

python/paddle/distributed/fleet/meta_optimizers/sharding/utils.py

wangxicoding

LGTM

wangxicoding reviewed Jul 29, 2021


python/paddle/distributed/fleet/meta_optimizers/sharding/utils.py (outdated, resolved)

all reduce fusion for shardinug, test=develop

e6e26a5

FeixLiu force-pushed the allreduce_fuse_sharding branch from 862ffb4 to e6e26a5 July 29, 2021 11:23

wangxicoding approved these changes Jul 30, 2021


wangxicoding merged commit 423ea97 into PaddlePaddle:develop Jul 30, 2021

FeixLiu deleted the allreduce_fuse_sharding branch July 30, 2021 02:06

FeixLiu changed the title ~~all reduce fusion for shardinug~~ [hybrid performance] all reduce fusion for shardinug Oct 11, 2021

FeixLiu changed the title ~~[hybrid performance] all reduce fusion for shardinug~~ [hybrid performance] all reduce fusion for sharding Oct 11, 2021
