[Model] EVS support for nano_nemotron_vl #26267
Conversation
Code Review
This pull request adds support for Efficient Video Sampling (EVS) to the nano_nemotron_vl model. The implementation introduces a clever workaround to accommodate this model's unique video frame processing, which involves text separators, by handling embedding merging within get_multimodal_embeddings. The changes also include a beneficial refactoring of the EVS utility functions, improving their reusability and fixing a potential bug. The overall approach is sound and the implementation appears correct. I have one suggestion to improve robustness by replacing an assert statement with a more explicit ValueError for input validation in a critical path.
```python
# TODO: Maybe this can be optimized to avoid the loop?
for i, single_video_embeddings in enumerate(video_embeddings):
    num_frames = video_input["num_patches"][i].item()
    assert single_video_embeddings.shape[0] % num_frames == 0
```
Using `assert` for input validation in production code can be risky. Assertions can be disabled with Python's `-O` flag, and they raise a generic `AssertionError`. It's better to use an explicit `if` check and raise a `ValueError` with a descriptive message. This makes the code more robust against unexpected or malformed inputs and prevents potential server crashes.
```python
if single_video_embeddings.shape[0] % num_frames != 0:
    raise ValueError(
        f"The number of video embeddings ({single_video_embeddings.shape[0]}) "
        f"is not divisible by the number of frames ({num_frames})."
    )
```
…M mechanism:
1. `get_video_repl` now doesn't mask the indicator tokens; it signals vLLM to replace all placeholder embeddings with the video embeddings returned by `get_multimodal_embeddings`.
2. `get_multimodal_embeddings` handles interleaving video embeddings with text embeddings for indicator tokens.
3. This is done by creating the video replacement text again in `get_multimodal_embeddings`, tokenizing it, and masking the indicator tokens. Indicator-token embeddings are calculated by calling `self.language_model.get_input_embeddings()` directly.
4. The tokenizer was added to `NemotronH_Nano_VL_V2` to allow tokenizing in `get_multimodal_embeddings()`.
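A minimal sketch of that interleaving step, assuming hypothetical names (`repl_text`, `indicator_token_ids`) and Hugging Face-style tokenizer and embedding APIs; it is not the actual `NemotronH_Nano_VL_V2` code:

```python
import torch

def interleave_video_and_text_embeddings(
    tokenizer,
    language_model,
    video_embeddings: torch.Tensor,  # (num_retained_video_tokens, hidden_size)
    repl_text: str,                  # recreated video replacement text
    indicator_token_ids: set[int],   # ids of the video placeholder tokens
) -> torch.Tensor:
    repl_ids = tokenizer.encode(repl_text, add_special_tokens=False)
    is_video = torch.tensor([tid in indicator_token_ids for tid in repl_ids])

    # Separator (text) embeddings come directly from the language model.
    embeds = language_model.get_input_embeddings()(torch.tensor(repl_ids))

    if int(is_video.sum()) != video_embeddings.shape[0]:
        raise ValueError("Mismatch between indicator tokens and video embeddings")

    # Scatter the (EVS-pruned) video embeddings into the indicator positions,
    # leaving the text-separator embeddings untouched.
    embeds = embeds.clone()
    embeds[is_video] = video_embeddings
    return embeds
```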
This pull request has merge conflicts that must be resolved before it can be merged.
Closed due to rebase havoc. Replaced by #26269.
Purpose
Add support for EVS (Efficient Video Sampling, introduced in #22980) to the Nano Nemotron VL model.
Contrary to other multimodal models (for example, Qwen2.5-VL), Nano Nemotron VL uses text frame separators. These are not placeholders and should not be replaced by video embeddings. For example, if a video has 2 frames, each with 5 tokens, the video text replacement interleaves separator text with each frame's placeholder tokens, as sketched below.
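A schematic illustration (the `Frame N:` separator strings and the `<video>` placeholder are stand-ins; the model's actual strings differ):

```
Frame 1: <video><video><video><video><video>Frame 2: <video><video><video><video><video>
```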
This poses a challenge for EVS, since it means we need to know the number of retained tokens per frame when creating the text replacement. In the standard vLLM flow this is impossible: the replacement is created before the forward pass through the vision encoder, while EVS selects which video tokens to prune based on the vision encoder outputs, which are known only after that forward pass.
We worked around this issue by signaling `get_input_embeddings` of the `SupportsMultiModal` protocol not to replace any of the video text replacement (done by setting `embed_text=seq` in `get_video_repl` when creating `PromptUpdateDetails`), and by handling embedding merging ourselves in `NemotronH_Nano_VL_V2.get_multimodal_embeddings`, where we already know how many tokens are retained per frame. This also means the number of tokens per frame set in `_get_prompt_updates`'s call to `get_video_repl` doesn't matter, as that text replacement is created only to reserve the correct number of tokens. Hence, all that matters in that call is the total number of retained video tokens, which is known at that stage from `video_pruning_rate` (see the sketch below).

Joint work with @BloodAxe
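A minimal sketch of why that total is computable before the vision encoder runs, assuming a hypothetical helper and simple rounding (the actual EVS code may round differently and handle per-video frame counts):

```python
import math

def total_retained_video_tokens(num_frames: int,
                                tokens_per_frame: int,
                                video_pruning_rate: float) -> int:
    # EVS decides *which* tokens survive only after the vision encoder runs,
    # but *how many* survive is fixed up front by the pruning rate.
    total = num_frames * tokens_per_frame
    return math.ceil(total * (1.0 - video_pruning_rate))
```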
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.