Commit 879d1e2

[V1][Molmo] Fix get_multimodal_embeddings() in molmo.py
Expected: get_multimodal_embeddings() should return list[Tensor] for `GPUModelRunner` to iterate.
Actual: prior to this PR, molmo's _get_mm_embeds() returned a list, so get_multimodal_embeddings() returned a list of lists.

This is reproducible when all of the following are satisfied:
* more than one request
* the trailing part of each request is slightly different, to trigger a partial cache hit

This PR also updates vision_language.py to help reproduce.

Tested with:
```
VLLM_USE_V1=1 \
python examples/offline_inference/vision_language.py \
  --model molmo \
  --num-prompts=2 \
  --use-different-prompt-per-request
```

Signed-off-by: Linkun Chen <github@lkchen.net>
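For illustration only, the shape mismatch described above (a list of lists where a flat list of per-item embeddings is expected) can be sketched as follows. This is not the code changed in this commit: `FakeTensor` and `flatten_one_level` are hypothetical names, and a small stand-in class is used instead of `torch.Tensor` so the snippet runs without vLLM or PyTorch installed.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; only carries a label."""
    name: str


# An entry is either a single embedding or a nested list of embeddings.
Nested = Union[FakeTensor, Sequence[FakeTensor]]


def flatten_one_level(embeds: Sequence[Nested]) -> List[FakeTensor]:
    """Flatten a list that may contain per-request sublists into a flat list.

    Hypothetical helper: it mirrors the expected return shape
    (one entry per multimodal item), not the actual fix in molmo.py.
    """
    flat: List[FakeTensor] = []
    for item in embeds:
        if isinstance(item, (list, tuple)):
            # A nested per-request list: splice its items into the output.
            flat.extend(item)
        else:
            # Already a single embedding: keep it as-is.
            flat.append(item)
    return flat


# Two requests, each producing its own list of embeddings.
nested = [[FakeTensor("req0_img0"), FakeTensor("req0_img1")],
          [FakeTensor("req1_img0")]]
flat = flatten_one_level(nested)
```

A consumer that iterates over `flat` sees one embedding per item, whereas iterating over `nested` would yield lists, which is the kind of mismatch the commit message reports.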
1 parent a7f3731 commit 879d1e2

2 files changed

+190
-113
lines changed

0 commit comments
