fused linear and selective recompute #620

FeixLiu · 2022-08-11T08:52:29Z

Part of PR #613. This PR contains only the fused linear and selective recompute parts.

All tests below were run on the 345M GPT model.

tensor_fusion + fused_linear + selective_recompute

| baseline speed | optimized speed | gain |
| --- | --- | --- |
| 181012 | 240203 | +32.7% |

fused_linear + selective_recompute

| baseline speed | optimized speed | gain |
| --- | --- | --- |
| 19072 | 21241 | +11.4% |
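For reference, the gain column follows from the two speed columns as a relative improvement over the baseline. A minimal sketch of that arithmetic is below; `relative_gain` is a helper name introduced here for illustration and is not part of this PR.

```python
def relative_gain(baseline: float, optimized: float) -> float:
    """Return the speedup of `optimized` over `baseline`, in percent."""
    return (optimized - baseline) / baseline * 100.0


# Speeds copied from the two tables above.
runs = {
    "tensor_fusion + fused_linear + selective_recompute": (181012, 240203),
    "fused_linear + selective_recompute": (19072, 21241),
}

for name, (baseline, optimized) in runs.items():
    print(f"{name}: {relative_gain(baseline, optimized):+.1f}%")
```

Rounded to one decimal place, both rows match the reported gains.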

sneaxiy · 2022-08-11T09:11:31Z

examples/gpt/hybrid_parallel/README.md

@@ -95,6 +95,7 @@ GPT training uses the AdamW optimizer and cosine learning rate decay by default; here, via
  num_train_epochs: 1
  seed: 1024
  use_recompute: False
+  recompute_granularity:


Does this need to be set to full?

No. use_recompute is False here, so it can be left empty; the backend receives None.
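The behavior described in the reply can be sketched as a small argument check. This is only an illustration under assumptions: the function name `check_recompute_args`, the `allowed` parameter, and the choice to raise when recompute is enabled without a granularity are all hypothetical and not taken from the checker in `examples/gpt/tools.py`.

```python
from typing import Optional, Sequence


def check_recompute_args(config: dict,
                         allowed: Sequence[str] = ("full",)) -> Optional[str]:
    """Hypothetical checker: return the granularity to use, or None."""
    use_recompute = bool(config.get("use_recompute", False))
    # An empty YAML entry such as `recompute_granularity:` is loaded as None.
    granularity = config.get("recompute_granularity")

    if not use_recompute:
        # Recompute is off, so the granularity is ignored and may stay empty.
        return None

    # Assumption: with recompute enabled, a value is required and validated.
    if granularity is None:
        raise ValueError(
            "recompute_granularity must be set when use_recompute is True")
    if granularity not in allowed:
        raise ValueError(f"unsupported recompute_granularity: {granularity!r}")
    return granularity


# The README case: recompute disabled, granularity left empty.
print(check_recompute_args({"use_recompute": False,
                            "recompute_granularity": None}))
```

The set of accepted values is passed in through `allowed` because the thread only mentions `full`; the real list lives in the PR's code.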

examples/gpt/tools.py

FeixLiu added 3 commits August 11, 2022 16:51

fused linear and selective recompute

db6ab64

make all the same

b840d6d

move args checker to tools

4d02bcc

sneaxiy reviewed Aug 11, 2022


fix fused linear

49ad9e0

ForFishes approved these changes Aug 11, 2022


ForFishes merged commit 6c12050 into PaddlePaddle:develop Aug 11, 2022

FeixLiu deleted the fused_linear_and_selective_recompute branch August 11, 2022 10:58