@tjtanaa tjtanaa commented Oct 31, 2025

Purpose

This is a bugfix for Kwai-Keye/Keye-VL-8B-Preview, made as part of completing the unit tests for the upcoming RFC on ViT Attention Reorganization. The fix will be used to validate the correctness of the RFC.

Ever since responsibility for get_mrope_input_positions moved from the GPU runner into the model definition file, this model has been broken because it is missing both get_mrope_input_positions and the SupportsMRoPE interface.

I am not sure exactly what get_mrope_input_positions should look like for this model, so I based it on KeyeVL1_5ForConditionalGeneration's get_mrope_input_positions implementation.
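For readers unfamiliar with M-RoPE, the sketch below illustrates the general idea of three-axis (temporal, height, width) position IDs in the style used by Qwen2-VL-like models. It is an assumption-laden illustration, not the implementation in this PR or in KeyeVL1_5ForConditionalGeneration: the function name `mrope_positions`, the `segments` input format, and the `spatial_merge_size` default are all hypothetical.

```python
def mrope_positions(segments, spatial_merge_size=1):
    """Build [t, h, w] position lists for a mixed text/image sequence.

    Hypothetical helper. `segments` is a list of ("text", num_tokens) or
    ("image", (t, h, w)) entries, where (t, h, w) is the patch grid.
    """
    t_pos, h_pos, w_pos = [], [], []
    next_pos = 0  # first free position index on every axis

    for kind, payload in segments:
        if kind == "text":
            # Text tokens share one index across all three axes.
            for i in range(payload):
                p = next_pos + i
                t_pos.append(p)
                h_pos.append(p)
                w_pos.append(p)
            next_pos += payload
        elif kind == "image":
            t, h, w = payload
            # Assumed: patches are merged spatially before reaching the LLM.
            gh, gw = h // spatial_merge_size, w // spatial_merge_size
            for ti in range(t):
                for hi in range(gh):
                    for wi in range(gw):
                        t_pos.append(next_pos + ti)
                        h_pos.append(next_pos + hi)
                        w_pos.append(next_pos + wi)
            # The following segment starts after the largest index used.
            next_pos += max(t, gh, gw)
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")

    return [t_pos, h_pos, w_pos]


positions = mrope_positions([("text", 2), ("image", (1, 2, 2)), ("text", 1)])
```

In this toy input, the two leading text tokens take positions 0 and 1 on every axis, the four image tokens spread over a 2x2 grid starting at 2, and the trailing text token resumes at 4.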

I have also included a ROCm-specific bugfix, but the major issue is that the model is broken.

CC model author of PR #20126 , @Kwai-Keye .

Test Plan

Add a new unit test, tests/models/multimodal/generation/test_keye.py, and ensure it passes by producing sensible output.

ChartQA lm eval

Test Result

--------------------------------------------------
<analysis>This question asks for the content of each image, which is straightforward and asks for a direct observation. Therefore, /no_think mode is more appropriate.</analysis>The first image depicts a street scene in what appears to be a Chinatown area. There is a prominent red stop sign in the foreground, and behind it, a traditional Chinese archway with red pillars and decorative elements. The archway has Chinese characters on it, and there are stone lion statues flanking the entrance. The area seems to be a commercial district with various shops and signs visible in the background. A black car is driving on the street, and there are some pedestrians and trees in the distance.

The second image shows a view of a tall tower, likely a landmark, partially obscured by branches of cherry blossom trees in full bloom. The cherry blossoms are pink and create a beautiful contrast against the clear blue sky. The tower is modern in design, with a circular observation deck near the top. The scene suggests a springtime setting, with the cherry blossoms indicating the blooming season.
--------------------------------------------------
PASSED

=================== 1 passed, 2 warnings in 76.27s (0:01:16) ===================

ChartQA Lmeval score

For detailed information on this command, run:
  run.py eval_vllm --model_name Kwai-Keye/Keye-VL-8B-Preview --url http://0.0.0.0:7899 --output_dir ./chartqa --eval_name chartqa - --help
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.8672,
    "anywhere_in_answer_relaxed_correctness": 0.868
}
================================================================================

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Oct 31, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly identifies and fixes a missing get_mrope_input_positions function for KeyeForConditionalGeneration by adding the function and the SupportsMRoPE interface. The changes also include a necessary ROCm-specific bugfix and refactoring of the attention backend selection, which improves code clarity. A new unit test is added to validate the fix. However, I've found a critical issue in the implementation of get_mrope_input_positions that could lead to a crash under certain conditions.

Comment on lines +1639 to +1640
if isinstance(video_grid_thw, list) and len(video_grid_thw) > 0:
    video_grid_thw = video_grid_thw[0]
critical

This conditional statement appears to mishandle the case where video_grid_thw is a list[list[int]]. If video_grid_thw is a list of multiple video grids (e.g., [[t1, h1, w1], [t2, h2, w2]]), this line reduces it to just the first grid ([t1, h1, w1]). When this 1D list is passed to split_thw, it is converted to a 1D tensor, which causes an indexing error at grid_thw[:, 0] and crashes execution. Since split_thw can already handle a list[list[int]] by converting it to a 2D tensor, this indexing logic is both incorrect and unnecessary. Removing it will ensure correct behavior for all valid input types.
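The shape problem described in this comment can be illustrated without tensors. The following is a minimal sketch in plain Python, assuming a hypothetical helper `normalize_grid_thw` that is not part of vLLM or this PR. It shows that keeping every grid as a [t, h, w] row makes the column access (the analogue of grid_thw[:, 0]) well defined for both flat and nested inputs.

```python
def normalize_grid_thw(grid_thw):
    """Return the grids as a list of [t, h, w] rows (hypothetical helper)."""
    if len(grid_thw) == 0:
        return []
    # A flat [t, h, w] describes a single grid; wrap it so it becomes one row.
    if all(isinstance(v, int) for v in grid_thw):
        rows = [list(grid_thw)]
    else:
        rows = [list(row) for row in grid_thw]
    for row in rows:
        if len(row) != 3:
            raise ValueError(f"expected [t, h, w], got {row!r}")
    return rows


def temporal_sizes(grid_thw):
    """Plain-Python analogue of grid_thw[:, 0] on a 2D tensor."""
    return [row[0] for row in normalize_grid_thw(grid_thw)]


two_videos = [[4, 16, 16], [2, 8, 8]]

# Taking only the first element (as in the reviewed lines) drops the second
# video and leaves a flat list, which has no "column 0" to index.
first_only = two_videos[0]

all_t = temporal_sizes(two_videos)   # both videos preserved
flat_t = temporal_sizes(first_only)  # flat input still handled as one row
```

Under these assumptions, passing the nested list through unchanged preserves every video, which is the behavior the comment asks for.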

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +344 to 350
if current_platform.is_cuda():
    from vllm.vllm_flash_attn.layers.rotary import apply_rotary_emb
elif current_platform.is_rocm():
    from flash_attn.ops.triton.rotary import apply_rotary as apply_rotary_emb

q_embed = apply_rotary_emb(q.float(), cos.float(), sin.float()).type_as(q)
k_embed = apply_rotary_emb(k.float(), cos.float(), sin.float()).type_as(k)

P1: Call ROCm rotary kernel with wrong signature

In apply_rotary_pos_emb_flashatt, the ROCm branch imports flash_attn.ops.triton.rotary.apply_rotary, which expects both q and k tensors and returns the rotated pair. The current implementation calls this kernel twice with only (tensor, cos, sin), just like the CUDA wrapper. On ROCm this will raise a TypeError for the missing argument and prevent rotary embeddings from being applied. The ROCm path should invoke apply_rotary(q, k, cos, sin) once and unpack the returned tensors, mirroring the existing usage in layers/rotary_embedding/common.py.

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 1, 2025
@DarkLight1337 DarkLight1337 merged commit e2347db into vllm-project:main Nov 1, 2025
55 checks passed
@tjtanaa tjtanaa deleted the bugfix-keye branch November 1, 2025 05:45
zhaozuy pushed a commit to zhaozuy/vllm that referenced this pull request Nov 4, 2025
…tionalGeneration` (vllm-project#27895)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
juliendenize pushed a commit to juliendenize/vllm that referenced this pull request Nov 6, 2025
…tionalGeneration` (vllm-project#27895)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
…tionalGeneration` (vllm-project#27895)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…tionalGeneration` (vllm-project#27895)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
yma11 pushed a commit to yma11/vllm that referenced this pull request Nov 14, 2025
…tionalGeneration` (vllm-project#27895) (vllm-project#5)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>