[CPU] Fuse SDPA before/after Reshape+Transpose Node to SDPA #26819

xipingyan
Contributor

@xipingyan xipingyan commented Sep 27, 2024

Details:

  • Pattern: QKV_Reshape -> QKV_Transpose -> SDPA -> OUT_Transpose -> OUT_Reshape
  • Fuse this pattern into a single SDPA node.
  • This hotspot can be observed after [CPU]whisper readvalue optimize #26130, but this PR's implementation doesn't depend on it.
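
To illustrate why this fusion is valid, here is a minimal standalone sketch, not the plugin's implementation. It assumes a contiguous [B, L, H*S] input, a transpose order of {0, 2, 1, 3}, equal Q and K/V lengths, and no attention mask; the names `Dims`, `transpose_0213`, `sdpa_head`, `sdpa_unfused` and `sdpa_fused` are hypothetical. The unfused path physically permutes the data around the attention kernel, while the fused path reads and writes the original layout through a row stride and should produce the same values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical dimensions: B = batch, L = sequence length, H = heads, S = head size.
struct Dims { std::size_t B, L, H, S; };

// Physically permute a contiguous [B, D1, D2, S] tensor into [B, D2, D1, S].
std::vector<float> transpose_0213(const std::vector<float>& in, std::size_t B,
                                  std::size_t D1, std::size_t D2, std::size_t S) {
    assert(in.size() == B * D1 * D2 * S);
    std::vector<float> out(in.size());
    for (std::size_t b = 0; b < B; ++b)
        for (std::size_t i = 0; i < D1; ++i)
            for (std::size_t j = 0; j < D2; ++j)
                for (std::size_t s = 0; s < S; ++s)
                    out[((b * D2 + j) * D1 + i) * S + s] = in[((b * D1 + i) * D2 + j) * S + s];
    return out;
}

// softmax(Q K^T / sqrt(S)) V for one (batch, head) slice.
// Element (l, s) of each tensor is read or written at ptr[l * row_stride + s].
void sdpa_head(const float* q, const float* k, const float* v, float* out,
               std::size_t L, std::size_t S, std::size_t row_stride) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(S));
    std::vector<float> w(L);
    for (std::size_t i = 0; i < L; ++i) {
        float max_score = -std::numeric_limits<float>::infinity();
        for (std::size_t j = 0; j < L; ++j) {
            float dot = 0.0f;
            for (std::size_t s = 0; s < S; ++s)
                dot += q[i * row_stride + s] * k[j * row_stride + s];
            w[j] = dot * scale;
            max_score = std::max(max_score, w[j]);
        }
        float sum = 0.0f;
        for (std::size_t j = 0; j < L; ++j) {
            w[j] = std::exp(w[j] - max_score);
            sum += w[j];
        }
        for (std::size_t s = 0; s < S; ++s) {
            float acc = 0.0f;
            for (std::size_t j = 0; j < L; ++j)
                acc += w[j] * v[j * row_stride + s];
            out[i * row_stride + s] = acc / sum;
        }
    }
}

// Unfused: Transpose -> SDPA -> Transpose. The Reshape steps are only view
// changes on contiguous data, so they need no code here.
std::vector<float> sdpa_unfused(const std::vector<float>& q, const std::vector<float>& k,
                                const std::vector<float>& v, Dims d) {
    auto qt = transpose_0213(q, d.B, d.L, d.H, d.S);  // [B, H, L, S]
    auto kt = transpose_0213(k, d.B, d.L, d.H, d.S);
    auto vt = transpose_0213(v, d.B, d.L, d.H, d.S);
    std::vector<float> ot(q.size());
    for (std::size_t b = 0; b < d.B; ++b)
        for (std::size_t h = 0; h < d.H; ++h) {
            const std::size_t off = (b * d.H + h) * d.L * d.S;
            sdpa_head(qt.data() + off, kt.data() + off, vt.data() + off, ot.data() + off,
                      d.L, d.S, d.S);
        }
    return transpose_0213(ot, d.B, d.H, d.L, d.S);    // back to [B, L, H, S]
}

// Fused: a single strided SDPA over the original [B, L, H*S] layout, no copies.
std::vector<float> sdpa_fused(const std::vector<float>& q, const std::vector<float>& k,
                              const std::vector<float>& v, Dims d) {
    assert(q.size() == d.B * d.L * d.H * d.S);
    std::vector<float> out(q.size());
    for (std::size_t b = 0; b < d.B; ++b)
        for (std::size_t h = 0; h < d.H; ++h) {
            const std::size_t off = b * d.L * d.H * d.S + h * d.S;
            sdpa_head(q.data() + off, k.data() + off, v.data() + off, out.data() + off,
                      d.L, d.S, d.H * d.S);
        }
    return out;
}
```

Calling `sdpa_unfused` and `sdpa_fused` on the same inputs should give matching outputs; the fused variant skips four full-tensor copies, which is the kind of saving this PR targets.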

Tickets:

  • 153616

Signed-off-by: xipingya <xiping.yan@intel.com>

# Conflicts:
#	src/plugins/intel_cpu/src/transformations/transformation_pipeline.cpp
@xipingyan xipingyan requested review from a team as code owners September 27, 2024 05:25
@xipingyan xipingyan requested review from luo-cheng2021 and removed request for a team September 27, 2024 05:25
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Sep 27, 2024
@dmitry-gorokhov dmitry-gorokhov added this to the 2024.5 milestone Sep 27, 2024
@xipingyan xipingyan force-pushed the xp/mha_fuse_transpose_whisper_to_master branch 2 times, most recently from 05a9f8a to febd8fc Compare September 28, 2024 07:50
Signed-off-by: xipingya <xiping.yan@intel.com>
Contributor

@luo-cheng2021 luo-cheng2021 left a comment

LGTM.

Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
@xipingyan
Contributor Author

Hi @luo-cheng2021, regarding the ARM test failures: the ARM plugin does not seem to implement memory permute.
All smoke_ConcatSDPTransposeTestSetState tests pass because they take the use_one_token path and execute kernel_single_token.
The tests in this PR fail when the L dimension is > 1, because that case executes the general kernel, and ARM doesn't support SDPA with strides.

Signed-off-by: xipingya <xiping.yan@intel.com>
@yuxu42
Contributor

yuxu42 commented Oct 9, 2024

Hi @dmitry-gorokhov, could you please review this? Thanks!

@@ -853,6 +854,7 @@ void Transformations::PostLpt() {

CPU_REGISTER_PASS_COMMON(postLPTPassManager, ov::pass::transpose_sinking::TSShapeOfForward);
CPU_REGISTER_PASS_COMMON(postLPTPassManager, StatefulSDPAFusion);
CPU_REGISTER_PASS_X64(postLPTPassManager, ov::intel_cpu::SDPAFuseTransposeReshape);
Contributor

@dmitry-gorokhov dmitry-gorokhov Oct 21, 2024

@xipingyan I don't see any x64-specific dependencies in the implementation. As a follow-up task, can we try to make this optimization Common and see how it works on the ARM platform?

Contributor Author

Hi @dmitry-gorokhov, I found that Arm's SDPA kernel doesn't support memory permute.

If SDPA doesn't fuse the concat, execution goes into the Arm implementation branch, and we get random results on each inference.
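
For readers unfamiliar with the X64-versus-Common distinction discussed above, the following is a minimal sketch of architecture-gated pass registration. It is an illustration under assumptions, not the plugin's actual macros: `PassManager`, `SKETCH_REGISTER_COMMON`, `SKETCH_REGISTER_X64`, `kIsX64` and `build_post_lpt_pipeline` are hypothetical stand-ins, and passes are represented only by their names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a pass manager: it only records pass names.
struct PassManager {
    std::vector<std::string> passes;
    void register_pass(const std::string& name) {
        assert(!name.empty());
        passes.push_back(name);
    }
};

// Assumption: x86-64 is detected through common compiler-defined macros.
#if defined(__x86_64__) || defined(_M_X64)
constexpr bool kIsX64 = true;
#define SKETCH_REGISTER_X64(mgr, name) (mgr).register_pass(name)
#else
constexpr bool kIsX64 = false;
#define SKETCH_REGISTER_X64(mgr, name) ((void)0)  // compiled out on other architectures
#endif
#define SKETCH_REGISTER_COMMON(mgr, name) (mgr).register_pass(name)

// Mirrors the order of the three registrations in the diff hunk above.
PassManager build_post_lpt_pipeline() {
    PassManager mgr;
    SKETCH_REGISTER_COMMON(mgr, "TSShapeOfForward");
    SKETCH_REGISTER_COMMON(mgr, "StatefulSDPAFusion");
    SKETCH_REGISTER_X64(mgr, "SDPAFuseTransposeReshape");
    return mgr;
}
```

In this sketch, switching the last line to the common variant would also run the fusion on ARM builds, which is only safe once the ARM SDPA path handles the permuted (strided) inputs.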

@dmitry-gorokhov dmitry-gorokhov added this pull request to the merge queue Oct 21, 2024
Merged via the queue into openvinotoolkit:master with commit ebdf1fc Oct 21, 2024
153 checks passed
CuriousPanCake pushed a commit to CuriousPanCake/openvino that referenced this pull request Nov 6, 2024
…toolkit#26819)
