
Add fused_attention_op: add impl wrappers. #35903

Merged
merged 1 commit into PaddlePaddle:develop on Sep 23, 2021

Conversation

limin2021
Contributor

PR types

Function optimization

PR changes

OPs

Describe

The first PR of "add fused_attention_op":
1. Add impl wrappers for the gemm and fmha parts of fused_attention_op (see the sketch after this list).
2. Fix bugs in layer_norm and attn_bias_add.cu.h.
3. Fix bugs in elementwise_op_impl.cu.h for the ternary elementwise_add impl.
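
For readers unfamiliar with what an "impl wrapper" around a gemm call looks like, here is a minimal, hypothetical C++ sketch in the same spirit: a small class that owns the problem shape and hides the cuBLAS row-major/column-major bookkeeping. The class and method names (AttnGemmWrapper, ComputeForward) are illustrative assumptions, not the actual code added by this PR.

```cpp
// Hypothetical sketch of a gemm impl wrapper for a fused attention op.
// Not the PaddlePaddle API; names and layout are assumptions.
#include <cublas_v2.h>

class AttnGemmWrapper {
 public:
  AttnGemmWrapper(cublasHandle_t handle, int m, int n, int k)
      : handle_(handle), m_(m), n_(n), k_(k) {}

  // Computes C = A * B for row-major float matrices on the GPU.
  // A: m x k, B: k x n, C: m x n. cuBLAS is column-major, so we compute
  // C^T = B^T * A^T by swapping the operands.
  void ComputeForward(const float* d_a, const float* d_b, float* d_c) {
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle_, CUBLAS_OP_N, CUBLAS_OP_N,
                n_, m_, k_,
                &alpha,
                d_b, n_,
                d_a, k_,
                &beta,
                d_c, n_);
  }

 private:
  cublasHandle_t handle_;
  int m_, n_, k_;
};
```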

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

@@ -0,0 +1,324 @@
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Contributor

Is fmha_ref.h the attention implementation stacked up from existing Paddle ops? The name feels a bit ambiguous; it could be renamed in a follow-up.

Contributor Author

Yes, it is the attention implementation stacked up from Paddle ops. The name will be changed in the next PR.
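
For context, below is a minimal CPU sketch of the math that such a reference file expresses by stacking elementary ops, i.e. softmax(Q K^T / sqrt(d)) V per head. This is purely illustrative; the function name and the single-head row-major layout are assumptions, not the contents of fmha_ref.h.

```cpp
// Illustrative CPU reference of scaled dot-product attention for one head.
#include <algorithm>
#include <cmath>
#include <vector>

// q, k, v: [seq_len x head_dim] row-major; returns output [seq_len x head_dim].
std::vector<float> RefAttention(const std::vector<float>& q,
                                const std::vector<float>& k,
                                const std::vector<float>& v,
                                int seq_len, int head_dim) {
  const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));
  std::vector<float> out(seq_len * head_dim, 0.0f);
  std::vector<float> scores(seq_len);
  for (int i = 0; i < seq_len; ++i) {
    // scores_j = q_i . k_j * scale, then a numerically stable softmax over j.
    float max_s = -1e30f;
    for (int j = 0; j < seq_len; ++j) {
      float dot = 0.0f;
      for (int d = 0; d < head_dim; ++d)
        dot += q[i * head_dim + d] * k[j * head_dim + d];
      scores[j] = dot * scale;
      max_s = std::max(max_s, scores[j]);
    }
    float sum = 0.0f;
    for (int j = 0; j < seq_len; ++j) {
      scores[j] = std::exp(scores[j] - max_s);
      sum += scores[j];
    }
    // out_i = sum_j softmax_ij * v_j
    for (int j = 0; j < seq_len; ++j) {
      const float w = scores[j] / sum;
      for (int d = 0; d < head_dim; ++d)
        out[i * head_dim + d] += w * v[j * head_dim + d];
    }
  }
  return out;
}
```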

@AnnaTrainingG AnnaTrainingG merged commit 88ea8e6 into PaddlePaddle:develop Sep 23, 2021
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Sep 29, 2021
limin2021 added a commit to limin2021/Paddle that referenced this pull request Oct 24, 2021
lanxianghit pushed a commit that referenced this pull request Oct 25, 2021
Purpose: the goal of this PR is to improve the computational performance of the attention module.
To reduce the framework-level op-scheduling overhead, this PR implements the attention module by hand at the C++ level and exposes it as a single large attention op.
To reduce memory-access overhead, this PR adopts two optimizations (see the sketch after this list):
(1) When computing q, k, and v, the input X is shared so that the gemm, transpose, and bias add are reduced from three calls to one;
(2) Kernel-fusion techniques are used so that data is passed between different CUDA kernels through registers.
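
Below is a minimal CPU sketch of optimization (1), the fused QKV projection: the three per-projection weight matrices are concatenated so that Q, K, and V come out of a single gemm plus a single bias add instead of three separate launches. Names and the [tokens x hidden] row-major layout are assumptions for illustration, not the op's real data layout.

```cpp
// Illustrative fused QKV projection: one gemm + one bias add for Q, K, V.
#include <vector>

// x: [tokens x hidden], w_qkv: [hidden x 3*hidden], b_qkv: [3*hidden].
// Returns qkv: [tokens x 3*hidden], where columns [0, hidden) are Q,
// [hidden, 2*hidden) are K, and [2*hidden, 3*hidden) are V.
std::vector<float> FusedQKVProjection(const std::vector<float>& x,
                                      const std::vector<float>& w_qkv,
                                      const std::vector<float>& b_qkv,
                                      int tokens, int hidden) {
  const int out_dim = 3 * hidden;
  std::vector<float> qkv(tokens * out_dim);
  for (int t = 0; t < tokens; ++t) {
    for (int o = 0; o < out_dim; ++o) {
      float acc = b_qkv[o];  // bias add fused into the same pass
      for (int h = 0; h < hidden; ++h) {
        acc += x[t * hidden + h] * w_qkv[h * out_dim + o];
      }
      qkv[t * out_dim + o] = acc;
    }
  }
  return qkv;
}
```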