
[data][llm] - vLLMEngineStage has inconsistent field name for images (singular vs plural) #57978

@nrghosh


### What happened + What you expected to happen

**Description:**

`vLLMEngineStage` declares one optional input key for images, but its implementation reads a different key. As a result, multimodal data supplied under the declared key is silently dropped.

**Location:** `ray/llm/_internal/batch/stages/vllm_engine_stage.py`

**The inconsistency:**

- Line 693 (declared optional key): `"images"` (plural)
- Line 264 (implementation): reads `"image"` (singular)
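The failure mode can be reproduced in isolation. The sketch below is a simplified, hypothetical model of the lookup, not the actual code in `vllm_engine_stage.py`; `DECLARED_OPTIONAL_KEYS` and `read_image_data` are illustrative names. It shows how a row that matches the declared `"images"` key still yields empty image data when the implementation reads `"image"` with an empty-list default.

```python
# Hypothetical, simplified model of the mismatch; NOT Ray's actual implementation.
DECLARED_OPTIONAL_KEYS = {"images"}  # what the stage advertises (plural)


def read_image_data(row: dict) -> list:
    """Mimic the implementation: only the singular key is consulted."""
    return row.get("image", [])  # "images" is never looked at


row = {"prompt": "Describe the picture.", "images": ["<image-0>"]}

# The row passes a check against the declared optional keys...
assert set(row) - {"prompt"} <= DECLARED_OPTIONAL_KEYS
# ...yet the image data actually read is empty, with no warning raised.
print(read_image_data(row))  # prints []
```

Nothing fails at this point; the empty list only surfaces later, inside vLLM, as the `IndexError` described below.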

**Impact:**

When users return `images=images` from preprocessing (matching the documentation), the stage silently ignores it and sets `image = []`, so empty multimodal data is sent to vLLM. vLLM then raises an `IndexError` when it tries to access the image metadata.

**Expected:** Multimodal data is passed to vLLM.

**Actual:** The data is silently dropped and vLLM receives empty multimodal data.

**Proposed fix:**

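  1. Fix the documentation so the declared key matches the key the implementation reads.
  2. Safeguard the keys so that a mismatched image key cannot silently produce empty multimodal data.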
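Item 2 is open-ended. One possible shape for it is a normalization step that maps the plural key onto the singular one the implementation reads and fails loudly on ambiguous input instead of defaulting to an empty list. This is a sketch under that assumption, not the fix adopted upstream; `normalize_image_key` is a hypothetical name, not a Ray API.

```python
from typing import Any


def normalize_image_key(row: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical guard: map 'images' onto the 'image' key the stage reads.

    Raises instead of silently dropping data when the input is ambiguous.
    """
    has_singular = "image" in row
    has_plural = "images" in row
    if has_singular and has_plural:
        raise ValueError("Row provides both 'image' and 'images'; use only one.")
    if has_plural:
        row = dict(row)  # copy so the caller's dict is not mutated
        row["image"] = row.pop("images")
    return row


print(normalize_image_key({"prompt": "p", "images": ["<image-0>"]}))
```

Raising on the ambiguous case is a deliberate choice: silently preferring one key over the other would reintroduce the same kind of quiet data loss this issue describes.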


**Workaround:**

Use `image=images` (singular) instead of `images=images` in the preprocessing output.
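If the preprocessing function is shared or hard to edit, the same workaround can be applied from the outside. The sketch below uses a hypothetical decorator, `with_singular_image_key` (not part of Ray), that renames `images` to `image` in whatever dict the wrapped function returns; the wrapped `preprocess` is a stand-in for the one in the reproduction script, not the reporter's code.

```python
import functools
from typing import Any, Callable

Row = dict[str, Any]


def with_singular_image_key(preprocess: Callable[[Row], Row]) -> Callable[[Row], Row]:
    """Hypothetical decorator applying the workaround to a preprocess function.

    Renames an 'images' entry in the returned dict to 'image', the key the
    stage actually reads. All other keys pass through unchanged.
    """

    @functools.wraps(preprocess)
    def wrapped(row: Row) -> Row:
        out = preprocess(row)
        if "images" in out:
            out = dict(out)  # avoid mutating the original return value
            out["image"] = out.pop("images")
        return out

    return wrapped


@with_singular_image_key
def preprocess(row: Row) -> Row:
    # Stand-in for the reproduction's preprocess; it returns the plural key.
    return {"prompt": row["prompt"], "images": row["images"]}


print(preprocess({"prompt": "Describe the picture.", "images": ["<image-0>"]}))
# prints {'prompt': 'Describe the picture.', 'image': ['<image-0>']}
```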

### Versions / Dependencies

- Ray 2.48.0, 2.49.0, and nightly (`ray-llm`)
- Python 3.11, 3.12

### Reproduction script


**Reproduction:**

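```python
from qwen_vl_utils import process_vision_info  # Qwen-VL helper used in this report


def preprocess(row):
    # chat_messages and prompt are built from `row` earlier (omitted in the report).
    images, videos = process_vision_info(chat_messages)
    return dict(
        prompt=prompt,
        images=images,  # ← Matches the key declared at line 693
        sampling_params=dict(...),  # sampling parameters omitted in the report
    )

# Result: mm_inputs=[], mm_hashes=[], mm_positions=[] (empty)
# Error: IndexError: list index out of range at image_grid_thw[image_index][0]
```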

### Issue Severity

None

**Labels:** `bug`, `data`, `docs`, `llm`, `stability`, `triage`
