
fuse vit attention for faster-rcnn on BML #54139

Merged · 1 commit into PaddlePaddle:develop · Jun 8, 2023

Conversation

@abenmao (Contributor) commented on May 26, 2023

PR types

Performance optimization

PR changes

Others

Description

For the Vision Transformer in the Faster-RCNN model, this PR adds a fusion that replaces the scaled dot-product attention (scaled_dp_attention) in the self-attention block with a single fused op.
On Cascade Lake, this optimization yields a 1.67x speedup in the VisionTransformer block and a 1.46x improvement in end-to-end inference time for the Faster-RCNN model from BML (reducing latency from 3.8 s to 2.6 s).
On Sapphire Rapids, the optimization also has the potential to speed up bf16/int8 inference by reducing the amount of memory access.
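As background on what gets fused (not part of the original description): the sketch below is a minimal, single-head, plain-C++ illustration of the scaled dot-product attention steps (Q·Kᵀ, scaling, row-wise softmax, weighted sum with V) that the fused op collapses into one kernel. The function name and the simplified [seq_len x head_dim] shapes are illustrative assumptions, not Paddle code.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Unfused reference pattern for one head:
// out = softmax(Q * K^T / sqrt(head_dim)) * V, with Q, K, V of shape
// [seq_len x head_dim]. The fused op performs these steps in one kernel.
Matrix ScaledDotProductAttention(const Matrix& Q, const Matrix& K,
                                 const Matrix& V) {
  const size_t seq_len = Q.size();
  const size_t head_dim = Q[0].size();
  const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));

  // 1) Attention scores: Q * K^T, scaled by 1/sqrt(head_dim).
  Matrix scores(seq_len, std::vector<float>(seq_len, 0.0f));
  for (size_t i = 0; i < seq_len; ++i) {
    for (size_t j = 0; j < seq_len; ++j) {
      for (size_t d = 0; d < head_dim; ++d) {
        scores[i][j] += Q[i][d] * K[j][d];
      }
      scores[i][j] *= scale;
    }
  }

  // 2) Row-wise softmax over the scores (max-subtracted for stability).
  for (auto& row : scores) {
    const float max_v = *std::max_element(row.begin(), row.end());
    float sum = 0.0f;
    for (float& v : row) {
      v = std::exp(v - max_v);
      sum += v;
    }
    for (float& v : row) v /= sum;
  }

  // 3) Weighted sum of the values: softmax(scores) * V.
  Matrix out(seq_len, std::vector<float>(head_dim, 0.0f));
  for (size_t i = 0; i < seq_len; ++i) {
    for (size_t j = 0; j < seq_len; ++j) {
      for (size_t d = 0; d < head_dim; ++d) {
        out[i][d] += scores[i][j] * V[j][d];
      }
    }
  }
  return out;
}
```

Collapsing these steps into a single op mainly avoids launching separate kernels and writing the intermediate score matrix out to memory between steps, which is consistent with the reduced memory access mentioned above.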

@paddle-bot (bot) commented on May 26, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot added the contributor (External developers) and status: proposed labels on May 26, 2023
@CLAassistant commented on May 26, 2023

CLA assistant check: all committers have signed the CLA.

@paddle-bot (bot) commented on May 26, 2023

❌ This PR was not created using the PR template. You can refer to this Demo.
Please use the PR template; it saves our maintainers' time so that more developers can get help.

@abenmao force-pushed the fused_vit_atten branch 3 times, most recently from 5c3c75b to 3cb4b1f on May 29, 2023 08:46
@abenmao force-pushed the fused_vit_atten branch 3 times, most recently from 25cb7d8 to 2f4146d on May 30, 2023 06:30

@YuanRisheng (Contributor) previously approved these changes on May 31, 2023

LGTM

@abenmao force-pushed the fused_vit_atten branch from 30831ea to 2d5ed47 on June 2, 2023 01:23
@abenmao (Contributor, Author) commented on Jun 2, 2023

@YuanRisheng Rebased. Please take another look. Thanks!

@weishengying (Contributor) left a comment

LGTM

@qingqing01 (Contributor) left a comment

1. Considering that many operators in the operator library and oneDNN have already been migrated to PHI, and some fused operators have been migrated to PHI as well, I would prefer implementing this directly with PHI.
2. Please also add the performance-improvement numbers to the PR description.

@paddle-bot (bot) commented on Jun 7, 2023

Sorry to inform you that, after our repeated discussion, your PR does not yet meet the merging standard (reference: the Paddle native operator development specification). You may submit a new PR; we are closing this one for now. Thank you for your contribution.

@abenmao reopened this on Jun 7, 2023

@abenmao (Contributor, Author) commented on Jun 7, 2023

@qingqing01 Hello. 1. On the first point: CI requires approval from @risheng and other reviewers, so we discussed this with the relevant people and the reviewers. After the meeting @YuanRisheng made a further assessment and confirmed that the current approach of adding the op is acceptable (LGTM). 2. The performance-improvement data has been added to the PR description.

@abenmao requested a review from qingqing01 on June 7, 2023 10:06
@qingqing01 merged commit fc88020 into PaddlePaddle:develop on Jun 8, 2023
@risemeup1 (Contributor) commented

@abenmao Hi, after paddle/fluid/operators/fused/self_dp_attention_op.h and self_dp_attention_op.cc were introduced, gcc 12 reports an uninitialized-variable error when building. Could you please fix it?
[screenshot: gcc 12 uninitialized-variable error]

@abenmao (Contributor, Author) commented on Jun 10, 2023

Hi @risemeup1, got it. I will follow up as soon as I have confirmed the situation.

@risemeup1 (Contributor) commented

> Hi @risemeup1, got it. I will follow up as soon as I have confirmed the situation.

OK

@abenmao (Contributor, Author) commented on Jun 11, 2023

@risemeup1 Hi, I investigated under gcc 12.1 and found that this is a known issue in that gcc release; see the bug report at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593. To work around it, we can add a flag in the relevant place to suppress the warning, for example as follows:
[screenshot: proposed compiler-flag change]
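The screenshot itself is not recoverable here. As a rough, hypothetical sketch of what such a suppression could look like (not necessarily the change shown in the screenshot), a GCC-only diagnostic pragma scoped to the affected region can hide the spurious warning while leaving other diagnostics on; the BroadcastScale helper below is a made-up example of intrinsics usage, not code from the PR.

```cpp
// Hypothetical illustration only, not the actual patch: locally silence the
// spurious -Wmaybe-uninitialized warning that GCC 12 emits around AVX
// intrinsics (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593), while
// keeping all other diagnostics enabled.
#include <immintrin.h>

#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 12
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#endif

// Made-up helper standing in for the code region the compiler flags;
// it is not taken from self_dp_attention_op.cc.
__attribute__((target("avx")))
__m256 BroadcastScale(float scale) {
  return _mm256_set1_ps(scale);
}

#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 12
#pragma GCC diagnostic pop
#endif
```

A build-system-level alternative would be to pass -Wno-maybe-uninitialized for just the affected source files instead of using pragmas.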

@risemeup1 (Contributor) commented

> @risemeup1 Hi, I investigated under gcc 12.1 and found that this is a known issue in that gcc release; see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593 ...

Please go ahead and submit a PR to work around it; I think that change is fine.

@risemeup1 (Contributor) commented

> Please go ahead and submit a PR to work around it; I think that change is fine.

Actually, as I understand it, couldn't this bug also be solved by simply initializing the variable?

@abenmao (Contributor, Author) commented on Jun 12, 2023

> Actually, as I understand it, couldn't this bug also be solved by simply initializing the variable?

@risemeup1 The uninitialized value is inside GNU's own code, which does not initialize it:
[screenshot: the uninitialized code inside GCC's own headers]

For now we can only work around it.

@abenmao (Contributor, Author) commented on Jun 12, 2023

@risemeup1 I was just about to open a PR, but when I rebased I found you had already submitted one.

@risemeup1 (Contributor) commented

> @risemeup1 I was just about to open a PR, but when I rebased I found you had already submitted one.

Yes, it has already been merged. It was urgent, so I submitted it over the weekend. Thanks for your work.

Labels: contributor (External developers), Intel
8 participants