[Model] EVS support for nano_nemotron_vl #26269
Conversation
Code Review
This pull request adds support for Efficient Video Sampling (EVS) to the Nano Nemotron VL model. The implementation introduces a clever workaround to handle this model's specific use of text separators between video frames, which poses a challenge for the standard EVS workflow. The core of the solution involves manually merging video and text embeddings within the model's get_multimodal_embeddings method. Additionally, the EVS utility functions in vllm/multimodal/evs.py have been refactored for better generality, which is a positive change. The overall approach is sound, but I've identified a performance issue in the new code that could lead to unnecessary CPU-GPU synchronization.
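For context, and not taken from this PR's diff, the sketch below shows the kind of pattern such a comment typically flags: reading per-frame values back to the host inside a Python loop versus keeping the reduction on the device. The tensor names are hypothetical.

```python
import torch

def retained_counts_with_sync(retained_mask: torch.Tensor) -> list[int]:
    # retained_mask: (num_frames, tokens_per_frame) boolean tensor on the GPU.
    # Each .item() call copies a scalar to the host, forcing a CPU-GPU
    # synchronization on every loop iteration.
    return [retained_mask[i].sum().item() for i in range(retained_mask.shape[0])]

def retained_counts_on_device(retained_mask: torch.Tensor) -> torch.Tensor:
    # A single reduction keeps the per-frame counts on the device; they are
    # transferred to the host at most once, if needed at all.
    return retained_mask.sum(dim=-1)
```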
DarkLight1337 left a comment:
LGTM, thanks
Purpose
Add support for EVS (Efficient Video Sampling, introduced in #22980) to the Nano Nemotron VL model.
Unlike other multimodal models (for example, Qwen2.5-VL), Nano Nemotron VL uses text frame separators between video frames. These separators are not placeholders and should not be replaced by video embeddings. For example, if a video has 2 frames, each with 5 tokens, the video text replacement consists of each frame's 5 placeholder tokens with separator text between the frames, as sketched below.
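A minimal sketch of this layout, assuming hypothetical placeholder and separator strings (the real strings are defined by the Nano Nemotron VL processor):

```python
# Hypothetical strings used only for illustration; the actual placeholder and
# separator text is defined by the model's processor.
VIDEO_PLACEHOLDER = "<video_token>"
FRAME_SEPARATOR = "<frame_break>"

def build_video_replacement(tokens_per_frame: list[int]) -> str:
    # One run of placeholder tokens per frame, joined by separator *text*.
    # The separators are ordinary text and must not be overwritten when the
    # placeholders are merged with video embeddings.
    frames = [VIDEO_PLACEHOLDER * n for n in tokens_per_frame]
    return FRAME_SEPARATOR.join(frames)

# Two frames with 5 tokens each, as in the example above:
print(build_video_replacement([5, 5]))
```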
This poses a challenge for EVS, because it means we need to know the number of retained tokens per frame when creating the text replacement. In the standard vLLM flow this is impossible: the replacement is created before the forward pass through the vision encoder, while EVS selects which video tokens to prune based on the vision encoder outputs, which are only known after that forward pass.
We worked around this issue by signaling `get_input_embeddings` of the `SupportsMultiModal` protocol not to replace any part of the video text replacement (done by setting `embed_text=seq` in `get_video_repl` when creating `PromptUpdateDetails`), and by handling the embedding merge ourselves in `NemotronH_Nano_VL_V2.get_multimodal_embeddings`, where we already know how many tokens are retained per frame. This also means the number of tokens per frame set in `_get_prompt_updates`'s call to `get_video_repl` doesn't matter, since that text replacement only exists to set the correct total number of tokens. Hence, all that matters in that call is the total number of retained video tokens, which is known at that stage from `video_pruning_rate`.

Joint work with @BloodAxe
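A simplified sketch of the merging step described above, assuming hypothetical helper names and shapes (not the actual `get_multimodal_embeddings` implementation):

```python
import torch

def num_retained_video_tokens(total_video_tokens: int, video_pruning_rate: float) -> int:
    # The total number of retained tokens depends only on the pruning rate,
    # so it is known before the vision encoder runs (rounding is illustrative).
    return int(total_video_tokens * (1.0 - video_pruning_rate))

def merge_video_and_separator_embeddings(
    frame_embeddings: list[torch.Tensor],  # per-frame embeddings after EVS pruning
    separator_embedding: torch.Tensor,     # text embeddings of the frame separator
) -> torch.Tensor:
    # Interleave each frame's retained token embeddings with the separator
    # text embeddings, mirroring the layout of the text replacement. At this
    # point the vision encoder has already run, so the number of retained
    # tokens per frame is known.
    pieces: list[torch.Tensor] = []
    for i, frame in enumerate(frame_embeddings):
        pieces.append(frame)
        if i < len(frame_embeddings) - 1:
            pieces.append(separator_embedding)
    return torch.cat(pieces, dim=0)
```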
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.