[Bugfix] [Model] Missing MRoPE function definition from KeyeForConditionalGeneration
#27895
Conversation
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Code Review
This pull request correctly identifies and fixes a missing `get_mrope_input_positions` function for `KeyeForConditionalGeneration` by adding the function and the `SupportsMRoPE` interface. The changes also include a necessary ROCm-specific bugfix and a refactoring of the attention backend selection, which improves code clarity. A new unit test is added to validate the fix. However, I've found a critical issue in the implementation of `get_mrope_input_positions` that could lead to a crash under certain conditions.
```python
if isinstance(video_grid_thw, list) and len(video_grid_thw) > 0:
    video_grid_thw = video_grid_thw[0]
```
This conditional statement appears to incorrectly handle the case where `video_grid_thw` is a `list[list[int]]`. If `video_grid_thw` is a list of multiple video grids (e.g., `[[t1, h1, w1], [t2, h2, w2]]`), this line will slice it to just the first grid (`[t1, h1, w1]`). When this 1D list is passed to `split_thw`, it will be converted to a 1D tensor, causing an indexing error at `grid_thw[:, 0]` and crashing the execution. Since `split_thw` is already capable of handling a `list[list[int]]` by converting it to a 2D tensor, this slicing logic is both incorrect and unnecessary. Removing it will ensure correct behavior for all valid input types.
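The failure mode described above can be reproduced in isolation. The snippet below is a minimal illustration (using numpy here; the tensor behavior in the actual code is analogous) of why slicing the grid list to its first element breaks the column indexing that a `split_thw`-style helper relies on:

```python
import numpy as np

# Two video grids, each (t, h, w) -- the list[list[int]] case from the review.
video_grid_thw = [[2, 4, 4], [1, 8, 8]]

# Without the slicing: the full list converts to a 2D array, so column
# indexing like grid_thw[:, 0] works and yields the t values for all grids.
grid = np.asarray(video_grid_thw)
print(grid[:, 0])  # [2 1]

# With the questioned slicing: only the first grid survives, conversion
# yields a 1D array, and grid_thw[:, 0] raises IndexError.
sliced = np.asarray(video_grid_thw[0])
try:
    sliced[:, 0]
except IndexError:
    print("1D array cannot be column-indexed")
```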
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
if current_platform.is_cuda():
    from vllm.vllm_flash_attn.layers.rotary import apply_rotary_emb
elif current_platform.is_rocm():
    from flash_attn.ops.triton.rotary import apply_rotary as apply_rotary_emb

q_embed = apply_rotary_emb(q.float(), cos.float(), sin.float()).type_as(q)
k_embed = apply_rotary_emb(k.float(), cos.float(), sin.float()).type_as(k)
```
Call ROCm rotary kernel with wrong signature

In `apply_rotary_pos_emb_flashatt`, the ROCm branch imports `flash_attn.ops.triton.rotary.apply_rotary`, which expects both the `q` and `k` tensors and returns the rotated pair. The current implementation calls this kernel twice with only `(tensor, cos, sin)`, just like the CUDA wrapper. On ROCm this will raise a `TypeError` for the missing argument and prevent rotary embeddings from being applied. The ROCm path should invoke `apply_rotary(q, k, cos, sin)` once and unpack the returned tensors, mirroring the existing usage in `layers/rotary_embedding/common.py`.
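Based on this review's description (a sketch, not a verified implementation — the exact kernel signature should be checked against flash_attn and the existing usage in `layers/rotary_embedding/common.py`), the corrected branching would call the ROCm kernel once and unpack the pair:

```python
# Sketch only: assumes, per the review above, that the ROCm apply_rotary
# takes (q, k, cos, sin) and returns the rotated (q, k) pair.
if current_platform.is_cuda():
    from vllm.vllm_flash_attn.layers.rotary import apply_rotary_emb

    q_embed = apply_rotary_emb(q.float(), cos.float(), sin.float()).type_as(q)
    k_embed = apply_rotary_emb(k.float(), cos.float(), sin.float()).type_as(k)
elif current_platform.is_rocm():
    from flash_attn.ops.triton.rotary import apply_rotary

    # Single fused call; rotate q and k together, then restore dtypes.
    q_embed, k_embed = apply_rotary(q.float(), k.float(), cos.float(), sin.float())
    q_embed, k_embed = q_embed.type_as(q), k_embed.type_as(k)
```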
…tionalGeneration` (vllm-project#27895) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
…tionalGeneration` (vllm-project#27895) (vllm-project#5) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Purpose

This is a bugfix for `Kwai-Keye/Keye-VL-8B-Preview`, as an effort to complete the unit tests for the upcoming RFC: ViT Attention Reorganization. The fix will be used as validation of the RFC's correctness.

Ever since the `get_mrope_input_positions` role shifted from the GPU runner into the model definition file, this model has been broken, as it is missing `get_mrope_input_positions` and `SupportsMRoPE`.

I am not exactly sure what the exact `get_mrope_input_positions` should look like; I am referring to `KeyeVL1_5ForConditionalGeneration`'s `get_mrope_input_positions` implementation.

I have also made a ROCm-specific bugfix, but the major issue is that the model is broken.

CC model author of PR #20126, @Kwai-Keye.
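For readers unfamiliar with the interface, the shape of the fix is roughly as follows. This is a hypothetical skeleton using only names from the description above; the concrete signature and body live in vLLM's `SupportsMRoPE` interface and the referenced `KeyeVL1_5ForConditionalGeneration` implementation:

```python
# Hypothetical skeleton -- signature elided; mirror
# KeyeVL1_5ForConditionalGeneration's implementation.
class KeyeForConditionalGeneration(nn.Module, SupportsMRoPE):
    def get_mrope_input_positions(self, *args, **kwargs):
        # Compute the MRoPE input positions for multimodal inputs,
        # a responsibility that moved here from the GPU runner.
        ...
```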
Test Plan

- Add a new unit test (`tests/models/multimodal/generation/test_keye.py`) and ensure it passes this simple unit test, where it outputs sensible data.
- ChartQA lm eval
Test Result

ChartQA lm eval score
Essential Elements of an Effective PR Description Checklist

Update `supported_models.md` and `examples` for a new model.