[CPU] Fuse SDPA before/after Reshape+Transpose Node to SDPA #26819
Signed-off-by: xipingya <xiping.yan@intel.com> # Conflicts: # src/plugins/intel_cpu/src/transformations/transformation_pipeline.cpp
Remove debug log. Signed-off-by: xipingya <xiping.yan@intel.com>
…hub.com/xipingyan/openvino into xp/mha_fuse_transpose_whisper_to_master
LGTM.
Hi @luo-cheng2021, regarding the ARM test failure: I found that the ARM plugin does not seem to implement memory permute.
because arm doesn't support SDPA with stride. Signed-off-by: xipingya <xiping.yan@intel.com>
Hi @dmitry-gorokhov, could you please review this? Thanks!
src/plugins/intel_cpu/src/transformations/cpu_opset/x64/pass/sdpa_fuse_transpose_reshape.cpp
@@ -853,6 +854,7 @@ void Transformations::PostLpt() {

CPU_REGISTER_PASS_COMMON(postLPTPassManager, ov::pass::transpose_sinking::TSShapeOfForward);
CPU_REGISTER_PASS_COMMON(postLPTPassManager, StatefulSDPAFusion);
CPU_REGISTER_PASS_X64(postLPTPassManager, ov::intel_cpu::SDPAFuseTransposeReshape);
@xipingyan I don't see any x64-specific dependencies in the implementation. As a follow-up task, can we try to make this optimization Common and see how it works on the ARM platform?
Hi @dmitry-gorokhov, I found that ARM's SDPA kernel doesn't support memory permute.
If SDPA doesn't fuse concat, the code goes into the ARM implementation branch, and we get random results on each inference.
…toolkit#26819)
### Details:
- *Pattern: QKV_Reshape -> QKV_Transpose -> SDPA -> OUT_Transpose -> OUT_Reshape*
- *Fuse this pattern to: SDPA*
- *This hotspot can be observed after openvinotoolkit#26130; this PR's implementation doesn't depend on it.*
### Tickets:
- *153616*
---------
Signed-off-by: xipingya <xiping.yan@intel.com>
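To illustrate why folding the Reshape + Transpose into SDPA requires a kernel that can read its inputs through permuted strides (the "memory permute" discussed above), here is a minimal standalone sketch. It is not the PR's implementation and uses no OpenVINO API: the function names, the dimension names (B, L, H, S) and the [B, L, H*S] source layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dimension names (not taken from the PR):
// B = batch, L = sequence length, H = number of heads, S = head size.
// The source buffer is assumed to be laid out as [B, L, H*S].

// Unfused path: Reshape [B, L, H*S] -> [B, L, H, S] is only a reinterpretation,
// but Transpose to [B, H, L, S] copies every element into a new buffer.
std::vector<float> reshape_transpose_copy(const std::vector<float>& src,
                                          std::size_t B, std::size_t L,
                                          std::size_t H, std::size_t S) {
    assert(src.size() == B * L * H * S);
    std::vector<float> dst(src.size());
    for (std::size_t b = 0; b < B; ++b)
        for (std::size_t h = 0; h < H; ++h)
            for (std::size_t l = 0; l < L; ++l)
                for (std::size_t s = 0; s < S; ++s)
                    dst[((b * H + h) * L + l) * S + s] =
                        src[((b * L + l) * H + h) * S + s];
    return dst;
}

// Fused path: read logical element (b, h, l, s) of the [B, H, L, S] view
// directly from the original [B, L, H*S] buffer using permuted strides.
float read_permuted(const std::vector<float>& src,
                    std::size_t L, std::size_t H, std::size_t S,
                    std::size_t b, std::size_t h, std::size_t l, std::size_t s) {
    const std::size_t stride_b = L * H * S;  // step between batches
    const std::size_t stride_l = H * S;      // step between sequence positions
    const std::size_t stride_h = S;          // step between heads
    return src[b * stride_b + h * stride_h + l * stride_l + s];
}

// Checks that the strided read matches the materialized transpose everywhere.
bool fused_matches_copy(std::size_t B, std::size_t L, std::size_t H, std::size_t S) {
    std::vector<float> src(B * L * H * S);
    for (std::size_t i = 0; i < src.size(); ++i)
        src[i] = static_cast<float>(i);
    const std::vector<float> copied = reshape_transpose_copy(src, B, L, H, S);
    for (std::size_t b = 0; b < B; ++b)
        for (std::size_t h = 0; h < H; ++h)
            for (std::size_t l = 0; l < L; ++l)
                for (std::size_t s = 0; s < S; ++s)
                    if (copied[((b * H + h) * L + l) * S + s] !=
                        read_permuted(src, L, H, S, b, h, l, s))
                        return false;
    return true;
}
```

Calling `fused_matches_copy(2, 3, 4, 5)` compares the two paths on a small tensor. A kernel that can only read dense, already-transposed inputs would need the copying path, which is consistent with the pass being registered for x64 only in this PR.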