
Conversation

Contributor
@tomeras91 tomeras91 commented Oct 6, 2025

Purpose

Add support for EVS (Efficient Video Sampling, introduced in #22980) for Nano Nemotron VL model.

Unlike other multimodal models (for example, Qwen2.5-VL), Nano Nemotron VL uses text frame separators. These are real text, not placeholders, and must not be replaced by video embeddings. For example, if a video has 2 frames, each with 5 tokens, the video text replacement will be:

Frame1: <img><image><image><image><image><image></img>Frame2: <img><image><image><image><image><image></img>
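The separator layout above can be sketched as follows. This is an illustrative stand-in, not the actual vLLM helper; the function name and the exact separator format are assumptions:

```python
# Hypothetical sketch (not the real vLLM code): build the per-frame text
# replacement for Nano Nemotron VL, where "FrameN:" separators are real text
# and only the <image> tokens are placeholders.
def build_video_replacement(num_frames: int, tokens_per_frame: int) -> str:
    """Interleave real-text frame separators with <image> placeholder tokens."""
    return "".join(
        f"Frame{i}: <img>" + "<image>" * tokens_per_frame + "</img>"
        for i in range(1, num_frames + 1)
    )
```

Only the `<image>` tokens are eligible for embedding replacement; the `FrameN: <img>...</img>` text must survive in the prompt, which is what makes the standard placeholder-replacement flow insufficient here.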

This poses a challenge for EVS, since it means we need to know the number of tokens per frame when creating the text replacement. In the standard vLLM flow this is impossible: the replacement is created before the forward pass through the vision encoder, while EVS selects which video tokens to prune based on the vision encoder outputs, which are only known after that forward pass.

We worked around this by signaling get_input_embeddings of the SupportsMultiModal protocol not to replace any of the video text replacement (done by setting embed_text=seq in get_video_repl when creating PromptUpdateDetails), and by handling the embedding merge ourselves in NemotronH_Nano_VL_V2.get_multimodal_embeddings, where the number of retained tokens per frame is already known. As a result, the per-frame token count passed in _get_prompt_updates's call to get_video_repl doesn't matter: that text replacement exists only to reserve the correct number of tokens. All that matters in that call is the total number of retained video tokens, which is known at that stage from video_pruning_rate.
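The two-stage bookkeeping described above can be sketched as follows. Function and variable names are illustrative assumptions, not the vLLM API; the rounding choice in the token budget is also an assumption:

```python
# Illustrative sketch (not the real vLLM code) of the two stages:
# 1) before the vision encoder runs, only a total token budget derived from
#    video_pruning_rate is known; 2) after EVS, per-frame retained counts are
#    known, so the model splices video embeddings between text separators itself.
import math

def total_retained_tokens(num_frames: int, tokens_per_frame: int,
                          video_pruning_rate: float) -> int:
    # Stage 1: EVS keeps roughly (1 - pruning_rate) of all video tokens.
    total = num_frames * tokens_per_frame
    return math.ceil(total * (1.0 - video_pruning_rate))

def merge_frame_embeddings(separators, retained_per_frame, video_embeds):
    # Stage 2: interleave each frame's separator text with exactly the number
    # of embeddings EVS retained for that frame.
    merged, cursor = [], 0
    for frame_idx, kept in enumerate(retained_per_frame):
        merged.append(separators[frame_idx])           # e.g. "Frame1: <img>"
        merged.extend(video_embeds[cursor:cursor + kept])
        merged.append("</img>")
        cursor += kept
    return merged
```

The key point the sketch illustrates: stage 1 needs only the total (to reserve the right number of prompt tokens), while the per-frame split is deferred to stage 2 inside the model, after the vision encoder has run.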

Joint work with @BloodAxe



Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
@mergify mergify bot added multi-modality Related to multi-modality (#4194) qwen Related to Qwen models labels Oct 6, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for Efficient Video Sampling (EVS) to the Nano Nemotron VL model. The implementation introduces a clever workaround to handle this model's specific use of text separators between video frames, which poses a challenge for the standard EVS workflow. The core of the solution involves manually merging video and text embeddings within the model's get_multimodal_embeddings method. Additionally, the EVS utility functions in vllm/multimodal/evs.py have been refactored for better generality, which is a positive change. The overall approach is sound, but I've identified a performance issue in the new code that could lead to unnecessary CPU-GPU synchronization.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


tomeras91 and others added 2 commits October 6, 2025 04:31
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
…er value to tokens_in_single_frame

Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Member

@DarkLight1337 DarkLight1337 left a comment


LGTM, thanks

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 6, 2025 10:17
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 6, 2025
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
auto-merge was automatically disabled October 6, 2025 10:46

Head branch was pushed to by a user without write access

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 6, 2025 10:55
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
auto-merge was automatically disabled October 6, 2025 13:56

Head branch was pushed to by a user without write access

@DarkLight1337 DarkLight1337 merged commit b8f603c into vllm-project:main Oct 6, 2025
53 checks passed
southfreebird pushed a commit to southfreebird/vllm that referenced this pull request Oct 7, 2025
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
@tomeras91 tomeras91 deleted the evs-nano-nemotron-vlm branch October 8, 2025 15:45
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

3 participants