Add fused attention op backward and python layer. #36498
Merged
lanxianghit merged 62 commits into PaddlePaddle:develop from limin2021:fused_attention_bw on Oct 26, 2021
Conversation
zkh2016 previously approved these changes on Oct 25, 2021
zkh2016 previously approved these changes on Oct 26, 2021
xingfeng01 previously approved these changes on Oct 26, 2021
lanxianghit previously approved these changes on Oct 26, 2021
limin2021 dismissed stale reviews from lanxianghit, xingfeng01, and zkh2016 via 5f54a0f on October 26, 2021 06:12
xingfeng01 approved these changes on Oct 26, 2021
zkh2016 approved these changes on Oct 26, 2021
lanxianghit approved these changes on Oct 26, 2021
TCChenlong approved these changes on Oct 26, 2021
limin2021 added a commit to limin2021/Paddle that referenced this pull request on Oct 26, 2021
lanxianghit pushed a commit that referenced this pull request on Oct 27, 2021
ghost pushed a commit to piotrekobi/Paddle that referenced this pull request on Nov 3, 2021
PR types: New features
PR changes: OPs
Describe:
Functionality: the goal of this PR is to improve the computational performance of the attention module.
To reduce the framework-level op scheduling overhead, this PR implements the attention module by hand at the C++ level and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) When computing q, k, and v, the input X is shared, so the gemm, transpose, and bias add there are reduced from three calls to one;
(2) Kernel-fusion optimization is used, passing data between different CUDA kernels through registers.
The computation logic implemented by fused_attention_op (sketched below):
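The original PR illustrates this logic with pseudocode that is not reproduced here. Below is a minimal NumPy sketch of the computation, assuming the pre_layer_norm branch, omitting the dropout steps and attention mask, and assuming the qkv_weight layout [3, num_heads, head_dim, embed_dim] described further down; it is a reconstruction, not the op's actual kernel code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, scale, bias, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps) * scale + bias

def fused_attention_forward(x, qkv_weight, qkv_bias, out_weight, out_bias,
                            ln_scale, ln_bias, num_heads):
    # x: [batch, seq_len, embed_dim]
    # qkv_weight: [3, num_heads, head_dim, embed_dim] (assumed layout)
    # qkv_bias:   [3, num_heads, head_dim]             (assumed layout)
    batch, seq_len, embed_dim = x.shape
    head_dim = embed_dim // num_heads
    residual = x
    h = layer_norm(x, ln_scale, ln_bias)                # pre_layer_norm branch
    # Optimization (1): one fused GEMM + bias add + transpose yields q, k and v.
    qkv = np.einsum('bse,tnhe->tbnsh', h, qkv_weight) \
          + qkv_bias[:, None, :, None, :]               # [3, batch, heads, seq, head_dim]
    q, k, v = qkv[0], qkv[1], qkv[2]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)
    probs = softmax(scores)                             # attention dropout omitted
    ctx = (probs @ v).transpose(0, 2, 1, 3).reshape(batch, seq_len, embed_dim)
    out = ctx @ out_weight + out_bias                   # out-linear projection
    return residual + out                               # residual add (dropout omitted)
```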
Differences between fused_attention_op and Paddle's existing MultiHeadAttention layer:
(1) The scope of the computation is larger; see the computation logic sketched above.
(2) The storage format of the q, k, v weights is different.
Existing layer: stored in three weight tensors, WQ, WK, WV.
This PR: stored in a single weight tensor, qkv_weight.
How to obtain qkv_weight from WQ, WK, WV:
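The original PR shows this conversion as an image. The following is a hedged NumPy sketch of one plausible packing, assuming WQ, WK, WV are stored input-dim-first (as in paddle.nn.Linear weights) and that the fused op expects a [3, num_heads, head_dim, embed_dim] layout; check the op's unit tests for the authoritative format.

```python
import numpy as np

def pack_qkv_weight(wq, wk, wv, num_heads):
    # wq, wk, wv: [embed_dim, embed_dim], stored as [in_features, out_features].
    embed_dim = wq.shape[0]
    head_dim = embed_dim // num_heads
    packed = []
    for w in (wq, wk, wv):
        w = w.T                                        # -> [out_features, in_features]
        packed.append(w.reshape(num_heads, head_dim, embed_dim))
    # Stack into a single tensor: [3, num_heads, head_dim, embed_dim].
    return np.stack(packed, axis=0)
```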
Implementation:
This PR is the backward implementation of fused_attention_op. Specific details:
(1) fused_attention_op.cc and fused_attention_op.cu:
The C++ implementation of the backward pass for fused_attention_op.
Related preceding PRs:
#34883, #35308, #35350, #35621, #35903, #35905
(2) functional/fused_attention/fused_multi_head_attention():
Adds the static-graph construction method.
(3) test_fused_attention_op.py:
Adds tests for the correctness of the backward pass of fused_attention_op.
(4) fused_transformer.py / FusedMultiHeadAttention layer:
Adds the FusedMultiHeadAttention layer (see the usage sketch after this list).
(5) test_fused_attention_op_api.py:
Tests the correctness of the fused_attention_op Python API in both dynamic and static graph modes.
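For reference, a hedged usage sketch of the new layer in dynamic graph mode; the import path (paddle.incubate.nn.FusedMultiHeadAttention), the default constructor arguments, and the need for a CUDA device are assumptions based on this PR's description and should be checked against the release docs.

```python
import paddle
from paddle.incubate.nn import FusedMultiHeadAttention  # import path assumed

paddle.set_device('gpu')                     # the fused op is implemented in CUDA

embed_dim, num_heads = 128, 8
x = paddle.rand([2, 16, embed_dim], dtype='float32')   # [batch, seq_len, embed_dim]

attn = FusedMultiHeadAttention(embed_dim, num_heads)
out = attn(x)                                # forward through the fused attention op
out.mean().backward()                        # exercises the backward added in this PR
print(out.shape)                             # expected: [2, 16, 128]
```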
Unittest results