Add RADIO Vision Encoder Support to vLLM #24595

danielafrimi · 2025-09-10T15:25:28Z

This PR implements support for the C-RADIO vision encoder in vLLM (RADIO stands for "Reduce All Domains Into One"), enabling its use with multimodal models like Nano Nemotron VL.

Changes

New Radio Model Implementation (vllm/model_executor/models/radio.py)

RadioInternVisionModel: Core vision model using InternVision encoder architecture

Integration Updates (vllm/model_executor/models/nano_nemotron_vl.py)

Updated Nano Nemotron VL to use the new RadioModel for vision processing

Testing (tests/models/multimodal/pooling/test_radio.py)

Comprehensive tests for RADIO model with nvidia/C-RADIOv2-H
Validates output consistency between HuggingFace and vLLM implementations
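
One common way to check consistency between two implementations is to compare their embeddings with cosine similarity. The sketch below is only an illustration of that idea: the helper names and the 0.99 threshold are assumptions, not taken from test_radio.py, and plain Python lists stand in for the flattened output tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def outputs_match(reference, candidate, min_similarity=0.99):
    """Return True when the candidate output points the same way as the reference.

    min_similarity is an assumed tolerance; a real test would pick it
    based on dtype and the numerical behaviour of the two backends.
    """
    return cosine_similarity(reference, candidate) >= min_similarity
```

In practice the reference vector would come from the HuggingFace model and the candidate from the vLLM model, both run on the same preprocessed image.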

Technical Notes

Hardcoded Values: The implementation preserves hardcoded values from the original timm package implementation, including OpenAI CLIP normalization constants and predefined ViT model dimensions, ensuring compatibility and reproducibility.
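
For readers unfamiliar with the CLIP normalization constants mentioned above, here is a minimal sketch of how they are typically applied. The mean and standard deviation are the widely published OpenAI CLIP values; the function name and the per-pixel, pure-Python form are assumptions made for illustration, since the PR itself presumably operates on tensors.

```python
# OpenAI CLIP per-channel statistics, in RGB order.
OPENAI_CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
OPENAI_CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1].

    Hypothetical helper: subtracts the channel mean and divides by the
    channel standard deviation, which is what image preprocessing for
    CLIP-style encoders does across the whole image.
    """
    return tuple(
        (value - mean) / std
        for value, mean, std in zip(rgb, OPENAI_CLIP_MEAN, OPENAI_CLIP_STD)
    )
```

A pixel equal to the channel means maps to zeros, which is a quick sanity check that the constants are wired to the right channels.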

Configuration: Introduces a new configuration approach that instantiates the Radio model on top of the InternVision model architecture, with dynamic parameter mapping for different ViT variants.

Weight Loading: A custom weight loader maps HuggingFace parameter names to vLLM parameter names, supports checkpoints whose names carry the radio_model. prefix, and skips unused parameters.
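
The name mapping described here could look roughly like the sketch below. Only the radio_model. prefix comes from the PR description; the function name and the skipped prefix are hypothetical placeholders, and the values stand in for tensors.

```python
RADIO_PREFIX = "radio_model."  # prefix named in the PR description

# Hypothetical placeholder for checkpoint entries the vLLM module does not use.
SKIPPED_PREFIXES = ("example_unused.",)


def remap_weight_names(weights):
    """Yield (name, value) pairs with the checkpoint prefix removed.

    Entries whose (prefix-stripped) name starts with one of
    SKIPPED_PREFIXES are dropped; everything else is passed through.
    """
    for name, value in weights:
        if name.startswith(RADIO_PREFIX):
            name = name[len(RADIO_PREFIX):]
        if name.startswith(SKIPPED_PREFIXES):
            continue
        yield name, value
```

Keeping the remapping in a generator separate from the code that assigns parameters makes the renaming rules easy to test without constructing a model.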

gemini-code-assist

Code Review

This pull request adds support for the RADIO vision encoder, enabling its use in multimodal models like Nano Nemotron VL. The changes include a new RadioModel implementation, integration into NanoNemotronVL, and corresponding tests. While the implementation is comprehensive, there are a few critical issues that need to be addressed. A potential crash due to unsafe dictionary access in the configuration helper needs to be fixed. The vLLM implementation of RadioInternVisionModel is missing a final normalization layer present in the original model, which will lead to incorrect outputs. Additionally, a bug in the test file could lead to incorrect or inefficient test execution. There are also opportunities to make the weight loading logic more robust by handling unexpected weights.
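
To illustrate the "unsafe dictionary access" concern, here is a hedged sketch of a lookup helper that fails with a descriptive error instead of a bare KeyError. The table name, key names, and function name are hypothetical; the numbers are the standard ViT-H sizes and are shown only as an example entry, not as the PR's actual table.

```python
# Hypothetical table of predefined ViT dimensions, keyed by variant name.
VIT_DIMS = {
    "vit_huge": {"hidden_size": 1280, "num_layers": 32, "num_heads": 16},
}


def get_vit_dims(variant):
    """Look up ViT dimensions, raising a clear error for unknown variants."""
    try:
        return VIT_DIMS[variant]
    except KeyError:
        supported = ", ".join(sorted(VIT_DIMS))
        raise ValueError(
            f"Unsupported ViT variant {variant!r}; supported: {supported}"
        ) from None
```

The point of the explicit ValueError is that a misconfigured model name surfaces at configuration time with the list of supported variants, rather than as an opaque crash deeper in model construction.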

gemini-code-assist · 2025-09-10T15:34:06Z

vllm/model_executor/models/radio.py

The RadioInternVisionModel implementation is missing the final normalization layer that is present in the original HuggingFace RadioInternVisionModel. The original model applies a norm layer after the encoder. This omission will lead to incorrect model outputs.

Additionally, the load_weights method in RadioModel silently ignores weights that it doesn't recognize, including the weights for this missing normalization layer (model.norm.weight and model.norm.bias). This makes the issue harder to detect.

You should add the final normalization layer to RadioInternVisionModel and update RadioModel.load_weights to handle its weights.
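
A loader that reports unrecognized weights instead of ignoring them might be sketched as follows. This is not the PR's load_weights: a plain dict stands in for the module's named parameters, and the function name and the allowed_to_skip argument are assumptions for illustration.

```python
def load_weights_strict(params, weights, allowed_to_skip=()):
    """Copy checkpoint values into params by name and fail on unknown names.

    params: dict mapping parameter name to its current value (stand-in
        for a module's named parameters).
    weights: iterable of (name, value) pairs from the checkpoint.
    allowed_to_skip: name prefixes that may be ignored on purpose.
    Returns the set of parameter names that were loaded.
    """
    loaded = set()
    unexpected = []
    for name, value in weights:
        if name in params:
            params[name] = value
            loaded.add(name)
        elif not name.startswith(tuple(allowed_to_skip)):
            unexpected.append(name)
    if unexpected:
        raise ValueError(f"Unexpected weights: {sorted(unexpected)}")
    return loaded
```

With a loader like this, a model that lacks the final normalization layer would raise on model.norm.weight and model.norm.bias at load time, so the omission shows up immediately rather than as subtly wrong outputs.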

danielafrimi · 2025-09-15T08:16:18Z

@DarkLight1337 Fixed the comments.

mergify · 2025-09-16T12:42:46Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @danielafrimi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: root <root@cw-dfw-h100-001-305-026.cm.cluster> Signed-off-by: <> cr Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com> cr Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

DarkLight1337

LGTM now, thanks

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com> Co-authored-by: root <root@cw-dfw-h100-001-305-026.cm.cluster>

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com> Co-authored-by: root <root@cw-dfw-h100-001-305-026.cm.cluster> Signed-off-by: charlifu <charlifu@amd.com>

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com> Co-authored-by: root <root@cw-dfw-h100-001-305-026.cm.cluster> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

danielafrimi requested review from DarkLight1337 and ywang96 as code owners September 10, 2025 15:25

mergify bot added the multi-modality Related to multi-modality (#4194) label Sep 10, 2025

danielafrimi commented Sep 10, 2025

gemini-code-assist bot reviewed Sep 10, 2025

DarkLight1337 reviewed Sep 10, 2025
