[Bugfix] Fix MRoPE dispatch on XPU #24724
Conversation
Signed-off-by: Yan Ma <yan.ma@intel.com>
@jikunshang please take a look.
Code Review
This pull request addresses a dispatch issue for MRoPE (Multimodal Rotary Position Embedding) on XPU devices. The PR description mentions CPU, but the change correctly targets XPU. The MRotaryEmbedding class was incorrectly inheriting the forward_xpu method from its RotaryEmbedding base class, whose implementation lacks the logic required for multimodal inputs (where positions.ndim == 2) and would therefore behave incorrectly.
The fix introduces a forward_xpu method in the MRotaryEmbedding class that dispatches to its own forward_native implementation. This is the correct approach, as MRotaryEmbedding.forward_native contains the necessary logic to handle multimodal inputs, ensuring correct functionality on XPU devices. This change aligns the XPU implementation with the existing CPU fallback, providing a correct execution path. The change is sound and resolves the bug.
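For context, here is a minimal, self-contained sketch of the dispatch behavior described above. The class and method names mirror vLLM's rotary embedding layers, but the bodies are illustrative stand-ins rather than the real implementations:

```python
# Sketch of the dispatch bug and the fix; bodies are stand-ins, not vLLM code.
import torch


class RotaryEmbedding:
    def forward_native(self, positions, query, key):
        # Reference path: assumes one position id per token (1-D positions).
        assert positions.ndim == 1
        return query, key

    def forward_xpu(self, positions, query, key):
        # Stand-in for the XPU kernel path, which also assumes 1-D positions.
        assert positions.ndim == 1, "XPU kernel path expects 1-D positions"
        return query, key


class MRotaryEmbedding(RotaryEmbedding):
    def forward_native(self, positions, query, key):
        # MRoPE additionally accepts 2-D positions (one row per rotary
        # section) for multimodal inputs.
        assert positions.ndim in (1, 2)
        return query, key

    # The fix: without this override, XPU dispatch lands in the base class's
    # forward_xpu, which cannot handle 2-D multimodal positions.
    def forward_xpu(self, positions, query, key):
        return self.forward_native(positions, query, key)


rope = MRotaryEmbedding()
q, k = torch.zeros(4, 8), torch.zeros(4, 8)
multimodal_positions = torch.zeros(3, 4, dtype=torch.long)  # 2-D for MRoPE
rope.forward_xpu(multimodal_positions, q, k)  # succeeds only with the override
```

Routing forward_xpu through forward_native mirrors the existing CPU fallback mentioned above.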
```python
        key = torch.cat((key_rot, key_pass), dim=-1).reshape(key_shape)
        return query, key

    def forward_xpu(
```
What will XPU call without this?
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/rotary_embedding/base.py#L122, I think this is the expected path, but there is a tensor mismatch error when calling the kernel, and it is strange that we did not hit this path before. I will look into it further, but we need this fix to unblock Qwen2.5-VL.
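To make the reported tensor mismatch concrete, a small illustration (the shapes here are assumptions chosen for demonstration, e.g. three rows for temporal/height/width sections, not values taken from this PR):

```python
# Illustrative only: plain RoPE positions vs. MRoPE multimodal positions.
import torch

num_tokens = 4

# Plain RoPE: one position id per token.
positions_1d = torch.arange(num_tokens)                      # shape (4,)

# MRoPE multimodal case: one row per rotary section.
positions_2d = torch.zeros(3, num_tokens, dtype=torch.long)  # shape (3, 4)

# A kernel that flattens positions to index a cos/sin cache would see
# 3 * num_tokens entries instead of num_tokens, hence the mismatch.
print(positions_1d.numel(), positions_2d.numel())  # 4 12
```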
Purpose
Fix the MRoPE dispatch issue on XPU introduced in #24444.
Test Plan
```bash
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 VLLM_WORKER_MULTIPROC_METHOD=spawn python3 examples/offline_inference/basic/generate.py --model Qwen/Qwen2.5-VL-7B-Instruct --enforce-eager
```
Test Result