[V1][Molmo] Fix get_multimodal_embeddings() in molmo.py
Expected: get_multimodal_embeddings() should return list[Tensor]
for `GPUModelRunner` to iterate.
Actual: prior to this PR, molmo's _get_mm_embeds() returns a list,
so get_multimodal_embeddings() returns a list of lists.
This is reproducible when all of the following hold:
* there is more than one request
* the trailing part of each request differs slightly, triggering a partial cache hit
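The shape mismatch can be sketched as follows (a minimal stand-alone illustration; the function names mirror the PR description but the bodies are hypothetical, not vLLM's actual code). Appending each per-request result produces the nested `list[list]` the runner cannot iterate; extending flattens it into the expected flat list:

```python
from typing import Any


def _get_mm_embeds(request: dict) -> list[Any]:
    # Hypothetical stand-in: returns the list of multimodal
    # embeddings for a single request.
    return request["embeds"]


def get_multimodal_embeddings_before(requests: list[dict]) -> list[Any]:
    # Before the fix: append() nests each per-request list,
    # yielding list[list[...]] instead of a flat list.
    out: list[Any] = []
    for req in requests:
        out.append(_get_mm_embeds(req))
    return out


def get_multimodal_embeddings_after(requests: list[dict]) -> list[Any]:
    # After the fix: extend() flattens, so the caller can
    # iterate over individual embeddings directly.
    out: list[Any] = []
    for req in requests:
        out.extend(_get_mm_embeds(req))
    return out
```

With two requests, the pre-fix version returns `[[e1, e2], [e3]]` while the fixed version returns `[e1, e2, e3]`, which is the flat shape `GPUModelRunner` expects.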
This PR also updates vision_language.py to help reproduce.
Tested with:
```
VLLM_USE_V1=1 \
python examples/offline_inference/vision_language.py \
--model molmo \
--num-prompts=2 \
--use-different-prompt-per-request
```
Signed-off-by: Linkun Chen <github@lkchen.net>