
Conversation

@ywang96 ywang96 commented Sep 5, 2025

Purpose

In V0, multimodal profiling assumed that the batch (by default, the model's full context window) had to fit the maximum number of modality items at their largest sizes; the max number of video frames was therefore computed from whatever space remained after fitting all images.

In V1, chunked prefill removes this assumption, so image tokens no longer need to be subtracted when calculating the max number of video frames.

A follow-up is to further simplify dummy data generation, since in V1 we don't need to consider the number of modality items, only whether a given modality is allowed.
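The before/after calculation can be sketched as follows. This is an illustrative model of the change, not vLLM's actual API; the function names, token counts, and sequence lengths are hypothetical:

```python
# Hypothetical sketch of the profiling change described above.
# All names and numbers are illustrative, not vLLM's real interfaces.

def max_video_frames_v0(seq_len: int, max_images: int,
                        tokens_per_image: int, tokens_per_frame: int) -> int:
    """V0: videos had to share the context window with all images at
    their largest size, so frames fit only in the leftover space."""
    remaining = seq_len - max_images * tokens_per_image
    return max(remaining // tokens_per_frame, 0)

def max_video_frames_v1(seq_len: int, tokens_per_frame: int) -> int:
    """V1: chunked prefill removes the co-residency assumption, so the
    full sequence length bounds the video embedding on its own."""
    return seq_len // tokens_per_frame

# With a 32k context, 4 images at 2048 tokens each, and 1024 tokens
# per frame, V0 leaves room for fewer frames than V1 allows:
assert max_video_frames_v0(32768, 4, 2048, 1024) == 24
assert max_video_frames_v1(32768, 1024) == 32
```

The V1 form is also what makes the bound independent of the image budget, which is the simplification applied to `llava_onevision.py` and `qwen2_vl.py` in this PR.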

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Roger Wang <hey@rogerw.me>
@ywang96 ywang96 requested a review from sighingnow as a code owner September 5, 2025 10:05
@mergify mergify bot added the qwen Related to Qwen models label Sep 5, 2025
@ywang96 ywang96 requested review from DarkLight1337 and Isotr0py and removed request for sighingnow September 5, 2025 10:06
@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly improves the estimation of the maximum video embedding length for V1 models. By removing the subtraction of image token counts from the available sequence length, the calculation now accurately reflects the capabilities of the V1 architecture with chunked prefill, where video and image data do not need to fit into the context window simultaneously. The changes in llava_onevision.py and qwen2_vl.py are consistent and simplify the logic as intended. The code is cleaner and more aligned with the V1 processing pipeline. Overall, this is a good improvement.

@Isotr0py Isotr0py enabled auto-merge (squash) September 5, 2025 10:09
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 5, 2025
@vllm-bot vllm-bot merged commit eddaafc into vllm-project:main Sep 6, 2025
38 of 41 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
…m-project#24312)

Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
…m-project#24312)

Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…m-project#24312)

Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…m-project#24312)

Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…m-project#24312)

Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>