[XPU][bugfix] fix rope for llama4 and deepseek #25145
Conversation
Code Review
This pull request correctly fixes a dispatch issue for RoPE on XPU for the llama4 and deepseek models by providing a forward_xpu method that falls back to the native PyTorch implementation. This prevents the use of an incorrect specialized kernel from the base class. My review includes suggestions to refactor the newly added methods to reduce code duplication and improve maintainability, which will help prevent potential bugs in the future.
def forward_xpu(
    self,
    positions: torch.Tensor,
    query: torch.Tensor,
    key: Optional[torch.Tensor] = None,
    offsets: Optional[torch.Tensor] = None,
) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
    return self.forward_native(positions, query, key, offsets)
To improve maintainability and reduce code duplication, you can directly alias forward_native to forward_xpu. The current implementation duplicates the body of forward_cuda, and both just delegate to forward_native. Using a direct assignment makes the intent clearer and ensures that any future changes to the signature of forward_native only need to be made in one place, reducing the risk of future bugs.
forward_xpu = forward_native
I think I understand the root cause now.
Before #24444, RotaryEmbedding used forward_xpu(), while its child classes such as Llama4VisionRotaryEmbedding, MRotaryEmbedding, and DeepseekScalingRotaryEmbedding called their own forward methods directly, bypassing the parent class's forward dispatch.
After #24444, all of these child classes inherit the parent RotaryEmbedding's forward_xpu method, which does not match their implementations.
Maybe the best fix is to define a BaseRotaryEmbedding class that does no dispatch at all, and have every RoPE class extend that base class.
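To make the proposal concrete, here is a rough, hypothetical sketch of that layout (the method bodies and the exact class split are placeholders, not actual vLLM code): a dispatch-free base holds the shared rotary logic, and only the plain RotaryEmbedding opts into platform-specific kernels.

from typing import Optional

import torch


class BaseRotaryEmbedding(torch.nn.Module):
    """Hypothetical dispatch-free base: shared cos/sin cache and the
    pure-PyTorch rotary application live here."""

    def forward_native(
        self,
        positions: torch.Tensor,
        query: torch.Tensor,
        key: Optional[torch.Tensor] = None,
        offsets: Optional[torch.Tensor] = None,
    ) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
        ...  # reference PyTorch implementation


class RotaryEmbedding(BaseRotaryEmbedding):
    """Only this class wires up platform dispatch (forward_cuda, forward_xpu, ...)."""

    def forward_xpu(self, positions, query, key=None, offsets=None):
        ...  # may call the fused XPU kernel when it is supported


class DeepseekScalingRotaryEmbedding(BaseRotaryEmbedding):
    # Extends the dispatch-free base, so it can never silently pick up a
    # fused kernel that does not understand its scaling scheme.
    ...

Under such a layout the variant classes keep their own forward implementations and no longer inherit a platform-specific path that was written for the plain rotary case.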
Yes, that's the root cause. There is already a base class, introduced in the RoPE refactor PR #22192. In that class we have the forward_xpu dispatch, which goes to either forward_native or ops.rotary_embedding, and that default behavior makes sense. But for cases like Llama4VisionRotaryEmbedding, MRotaryEmbedding, and DeepseekScalingRotaryEmbedding, our kernel doesn't support them, so we would need to fix this at the kernel level to avoid falling back to forward_native in the child classes.
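For readers less familiar with this layer, here is a minimal sketch of the dispatch behavior described above; the capability check and the two helper names are assumptions made for illustration, not vLLM's actual code.

from typing import Optional

import torch


class RotaryEmbedding(torch.nn.Module):  # illustrative stand-in for the real base class
    def forward_native(self, positions, query, key=None, offsets=None):
        ...  # reference PyTorch rotary implementation

    def forward_xpu(self, positions, query, key=None, offsets=None):
        if self._xpu_kernel_supported():  # hypothetical capability check
            # Default path: the real code would call the fused XPU rotary
            # kernel here (the ops.rotary_embedding path mentioned above).
            return self._run_fused_xpu_kernel(positions, query, key)
        # Fall back to the reference implementation, which is exactly what
        # this PR forces for the subclasses the kernel does not support.
        return self.forward_native(positions, query, key, offsets)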
I like the Gemini suggestion here, could you try it out?
def forward_xpu(  # type: ignore[override]
    self,
    query: torch.Tensor,
    key: Optional[torch.Tensor] = None,
) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
    return self.forward_native(query, key)
To avoid code duplication and enhance maintainability, it's better to alias forward_native for forward_xpu, as both this method and forward_cuda simply call forward_native. This approach is cleaner and less prone to errors if the underlying forward_native implementation or its signature changes in the future.
forward_xpu = forward_native  # type: ignore[override]
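As a side note on the suggested pattern: a class-level alias binds at class-definition time, so the two names refer to the same function object and cannot drift apart. A tiny standalone illustration (unrelated to the vLLM classes):

class Demo:
    def forward_native(self, x: int) -> int:
        return x + 1

    # Alias: forward_xpu is the very same function object, so any future
    # change to forward_native is picked up automatically.
    forward_xpu = forward_native


d = Demo()
assert d.forward_xpu(1) == d.forward_native(1) == 2
assert Demo.forward_xpu is Demo.forward_native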
Could you add a description explaining the current fix?
I think a
@yma11 Any movement on this? Would love to have Llama4 functional here.
Force-pushed from f23ca75 to a010ec7
Updated based on comments. Let's wait for the CI results.
Do we need to consider the other RoPE classes in this folder?
They should all be covered.
@ProExpertProg can you help review this PR again? A base class has been added.
Head branch was pushed to by a user without write access
Force-pushed from ae13379 to a9d1af1
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Purpose
Fix more dispatch issues on XPU introduced in #24444.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.