[Bugfix] Standardize merging multimodal embeddings #26771

DarkLight1337 · 2025-10-14T07:01:15Z

Purpose

Standardize how multimodal embeddings from different modalities are merged in get_multimodal_embeddings:

Convert the embeddings to a tuple before adding them to the output, to handle the case where the embeddings are a tensor
Rename vision_embeddings to image_embeddings to avoid confusion with video_embeddings
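
As a rough illustration of the first point, here is a minimal sketch of the failure and the fix. It assumes the merge step appends each modality's embeddings to a running tuple. The `Tensor` class is a dependency-free stand-in for `torch.Tensor`, and the helper names `merge_without_tuple` and `merge_with_tuple` are hypothetical. None of this is the actual vLLM code.

```python
# Stand-in for torch.Tensor so the sketch runs without dependencies.
# Only iteration over the first dimension is modelled (an assumption).
class Tensor:
    def __init__(self, rows):
        self.rows = list(rows)

    def __iter__(self):
        return iter(self.rows)


def merge_without_tuple(image_embeddings):
    """Hypothetical pre-fix pattern: concatenate directly onto a tuple."""
    multimodal_embeddings = ()
    multimodal_embeddings += image_embeddings  # fails unless this is a tuple
    return multimodal_embeddings


def merge_with_tuple(image_embeddings):
    """Hypothetical post-fix pattern: convert to a tuple first."""
    multimodal_embeddings = ()
    multimodal_embeddings += tuple(image_embeddings)
    return multimodal_embeddings


# Hypothetical batch holding two per-item embeddings.
batched = Tensor(["item0", "item1"])

error_message = None
try:
    merge_without_tuple(batched)
except TypeError as exc:
    error_message = str(exc)

merged = merge_with_tuple(batched)
print(error_message)
print(merged)
```

With the stand-in class, the first call raises the same kind of TypeError as the one quoted in the linked issue title, while the second returns one tuple entry per item.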

FIX #26749

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

gemini-code-assist

Code Review

This pull request standardizes the merging of multimodal embeddings across various models. The changes involve renaming vision_embeddings to image_embeddings for better clarity and ensuring that embeddings are converted to a tuple before being concatenated. This improves code robustness by preventing potential TypeError exceptions when the underlying processing functions return lists instead of tuples. The changes are correct and improve the overall consistency and reliability of the codebase. No issues were found in this pull request.

BlueBlueFF · 2025-10-14T10:16:23Z

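Is there no validation of the type and dimensions of image_embeddings here? Has converting directly to a tuple been tested for compatibility across the different cases?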

DarkLight1337 · 2025-10-14T10:21:23Z

We check ndim inside MultiModalDataParser already. The logic should work for both list of ndim=2 tensors and a single ndim=3 tensor.
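
The claim about the two input layouts can be sketched as follows. This is an illustration only: the `Tensor` class is a nested-list stand-in for `torch.Tensor`, which likewise yields slices along the first dimension when iterated. The shapes are made up, and the ndim check performed by MultiModalDataParser is not reproduced here.

```python
class Tensor:
    """Stand-in for torch.Tensor backed by nested lists (an assumption)."""

    def __init__(self, data):
        self.data = data

    @property
    def ndim(self):
        # Count nesting depth; assumes non-empty, rectangular data.
        depth, node = 0, self.data
        while isinstance(node, list):
            depth += 1
            node = node[0]
        return depth

    def __iter__(self):
        # Iterating yields slices along dim 0, each one dimension lower.
        return (Tensor(row) for row in self.data)


# Case 1: a list of two ndim=2 tensors (3 x 2 and 1 x 2).
per_item = [
    Tensor([[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]),
    Tensor([[1.0, 1.1]]),
]

# Case 2: a single ndim=3 tensor (2 items x 1 x 2).
batched = Tensor([[[0.0, 0.1]], [[1.0, 1.1]]])

from_list = tuple(per_item)
from_batch = tuple(batched)

ndims_from_list = [t.ndim for t in from_list]
ndims_from_batch = [t.ndim for t in from_batch]
print(ndims_from_list, ndims_from_batch)
```

In both cases `tuple(...)` produces one ndim=2 entry per item, so the merged output has the same structure regardless of which layout was passed in.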

[Bugfix] Standardize merging multimodal embeddings

0b174cb

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

DarkLight1337 requested review from Isotr0py and ywang96 October 14, 2025 07:01

DarkLight1337 requested a review from sighingnow as a code owner October 14, 2025 07:01

DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 14, 2025

DarkLight1337 mentioned this pull request Oct 14, 2025

[Bug]: InternVL: passing image embeddings triggers TypeError: can only concatenate tuple (not "Tensor") to tuple in get_multimodal_embeddings, and v1 sanity check then expects a sequence of 2D tensors #26749

Closed

1 task

mergify bot added the qwen label (Related to Qwen models) Oct 14, 2025

gemini-code-assist bot reviewed Oct 14, 2025

Isotr0py approved these changes Oct 14, 2025

Isotr0py enabled auto-merge (squash) October 14, 2025 07:06

DarkLight1337 added the multi-modality label (Related to multi-modality (#4194)) Oct 14, 2025

DarkLight1337 disabled auto-merge October 14, 2025 07:52

DarkLight1337 enabled auto-merge (squash) October 14, 2025 07:52

DarkLight1337 merged commit d2f816d into vllm-project:main Oct 14, 2025
60 of 61 checks passed

DarkLight1337 deleted the std-embeds branch October 14, 2025 09:36

Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025

[Bugfix] Standardize merging multimodal embeddings (vllm-project#26771)

3bd0586

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>

bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025

[Bugfix] Standardize merging multimodal embeddings (vllm-project#26771)

1d81e12

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: bbartels <benjamin@bartels.dev>

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[Bugfix] Standardize merging multimodal embeddings (vllm-project#26771)

b2a01fd

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025

[Bugfix] Standardize merging multimodal embeddings (vllm-project#26771)

68e4c75

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[Bugfix] Standardize merging multimodal embeddings (vllm-project#26771)

35dcf9b

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[Bugfix] Standardize merging multimodal embeddings (vllm-project#26771)

287c8fa

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025

[Bugfix] Standardize merging multimodal embeddings (vllm-project#26771)

10affc3

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025

[Bugfix] Standardize merging multimodal embeddings (vllm-project#26771)

e763b24

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

[Bugfix] Standardize merging multimodal embeddings (vllm-project#26771)

215feb7

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025

[Bugfix] Standardize merging multimodal embeddings (vllm-project#26771)

9c5d068

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
