Commit c6b13e8

[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend

* support cache reuse.

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

1 parent c6f2b30, commit c6b13e8

File tree

2 files changed (+5, -1 lines)


docs/source/reference/multimodal-feature-support-matrix.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -8,6 +8,7 @@
 | LLaVA-NeXT | Yes | Yes | Yes | No |
 | Llama 4 | Yes | Yes | No | No |
 | Mistral-Small-3.1 | Yes | Yes | No | No |
+| Nano-v2-VLM | Yes | Yes | Yes | No |
 | Phi-4-multimodal | Yes | Yes | No | No |
 | Qwen2-VL | Yes | Yes | Yes | No |
 | Qwen2.5-VL | Yes | Yes | Yes | No |
```

tensorrt_llm/_torch/models/modeling_nanov2vlm.py

Lines changed: 4 additions & 1 deletion
```diff
@@ -20,7 +20,8 @@
 from ..attention_backend import AttentionMetadata
 from ..model_config import ModelConfig
 from .modeling_auto import AutoModelForCausalLM
-from .modeling_multimodal_utils import fuse_input_embeds
+from .modeling_multimodal_utils import (find_uncached_mm_embeds,
+                                        fuse_input_embeds)
 from .modeling_radio import RADIOVisionModel
 from .modeling_utils import register_auto_model

@@ -394,6 +395,8 @@ def forward(
             multimodal_param.multimodal_data["multimodal_embedding"]
             for multimodal_param in multimodal_params
         ]
+        mm_embedding = find_uncached_mm_embeds(
+            mm_embedding, multimodal_params[:num_context_requests])
         input_ids, input_embeds = fuse_input_embeds(
             self.llm.model.embed_tokens,
             input_ids,
```
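The inserted `find_uncached_mm_embeds` call implements the "cache reuse" mentioned in the commit message: when part of a context request's prompt is already covered by a reused KV cache, the multimodal embedding rows for that cached prefix do not need to be recomputed or fused again, so only the uncached tail is passed on to `fuse_input_embeds`. A minimal sketch of that idea is below; the `MMParam` dataclass and `find_uncached_tail` helper are hypothetical stand-ins, not the actual TensorRT-LLM API.

```python
# Hedged sketch of the cache-reuse filtering idea. MMParam and
# find_uncached_tail are illustrative names, not TensorRT-LLM APIs.
from dataclasses import dataclass
from typing import List

@dataclass
class MMParam:
    num_mm_tokens: int      # total multimodal tokens for this request
    num_cached_tokens: int  # tokens already covered by reused KV cache

def find_uncached_tail(embeds: List[List[float]],
                       params: List[MMParam]) -> List[List[float]]:
    """Keep only the uncached suffix of each request's embedding rows.

    embeds[i] holds one embedding row per multimodal token of request i;
    the first num_cached_tokens rows are already in the KV cache, so only
    the remaining rows need to be fused into the input embeddings.
    """
    return [rows[p.num_cached_tokens:] for rows, p in zip(embeds, params)]
```

For example, a request with 4 multimodal tokens of which 2 are cached keeps only its last 2 embedding rows, while a request with no cache hit keeps all of them.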
