
Fused_mt Branch Migration #64125

Merged
27 commits merged into PaddlePaddle:develop on May 23, 2024
Conversation

@penPenf28 (Contributor) commented May 8, 2024

PR Category

Inference

PR Types

New features

Description

Migrated the content listed below, with some adaptation where needed.

The main change is paddle/fluid/operators/fused/fused_multi_transformer_op.cu, which adds GQA (grouped-query attention) support; currently only the flash_attention_v2 backend is supported (float16/bfloat16 only). See the illustrative sketch after this list.

Local unit tests: the test with fixed-length padded input was removed; only variable-length input is supported.

  • test_fused_multi_transformer_op.py pass
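
For readers unfamiliar with GQA, below is a minimal NumPy sketch of the head-grouping idea the kernel implements: several query heads share one KV head. The function name `gqa_attention`, the tensor shapes, and the omission of masking are illustration-only assumptions, not the fused kernel's actual layout or the Paddle API.

```python
# Illustrative sketch only (not the fused_multi_transformer kernel): grouped-query
# attention maps num_q_heads query heads onto a smaller set of num_kv_heads KV heads.
import numpy as np

def gqa_attention(q, k, v):
    """q: [seq, num_q_heads, head_dim]; k, v: [seq, num_kv_heads, head_dim].
    Causal masking is omitted for brevity."""
    seq, num_q_heads, head_dim = q.shape
    num_kv_heads = k.shape[1]
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads  # query heads sharing one KV head

    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv_h = h // group_size                          # GQA head mapping
        scores = q[:, h] @ k[:, kv_h].T / np.sqrt(head_dim)
        probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)      # softmax over keys
        out[:, h] = probs @ v[:, kv_h]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((4, 8, 16)).astype(np.float32)  # 8 query heads
    k = rng.standard_normal((4, 2, 16)).astype(np.float32)  # 2 KV heads (GQA)
    v = rng.standard_normal((4, 2, 16)).astype(np.float32)
    print(gqa_attention(q, k, v).shape)  # (4, 8, 16)
```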

paddle-bot bot commented May 8, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@CLAassistant commented May 8, 2024

CLA assistant check
All committers have signed the CLA.


@paddle-bot bot added the `contributor` (External developers) label on May 8, 2024
@penPenf28 penPenf28 marked this pull request as ready for review May 8, 2024 12:45
@penPenf28 penPenf28 changed the title Fused_mt branch Migration Fused_mt Branch Migration May 11, 2024
XieYunshen previously approved these changes May 17, 2024

@XieYunshen (Contributor) left a comment

LGTM for the unit test removal

@tianshuo78520a (Contributor) left a comment

LGTM for print

@XiaoguangHu01 (Contributor) left a comment

LGTM

@heavengate heavengate merged commit f8f9bfa into PaddlePaddle:develop May 23, 2024
32 checks passed
chen2016013 pushed a commit to chen2016013/Paddle that referenced this pull request May 26, 2024
* Merge fused_mt branch

* Adjusted fuse_mt_int8

* Revert attention_layer_norm.h

* Revert paddle/phi/kernels/fusion/gpu/fmha_ref.h

* Add win support and refine format.

* Reformat for win.

* Removed redundant files, now only supports flash_attn_v2 and variable length

* Refine static_fused_ft test

* Refine fused_mt related testcase

* Remove custom_adll_reduce

* Remove operator cublaslt and revert parallel test

* Refine empty seq_len

* Refine ft

* Refine ft_static test

* Remove float32 support and static parallel ft test

* Refine type static error.

* Fix doc type error

* Fuse_mt code format

* Remove some redundant code

* Remove redundant attention_layer_norm.h

* Remove redundant code in ft_op

* Remove Redundant code and skip fuse_mt doctest

* Remove redundant fmha_ref mmha_util and other code

* Remove redundant kernel

* Remove redundant file

* Refine fuse_mt code

* Refine cublaslt comment