
[Bug]: IndexError: list index out of range on chunked prefill with speculative decoding #20531


Description

@saidrhs

Environment

Engine Configuration:

V1 LLM engine (v0.9.1) with config:
- model='/tmp/model/'
- speculative_config=SpeculativeConfig(method='eagle3', model='/tmp/model/eagle_head/', num_spec_tokens=5)
- tensor_parallel_size=8
- pipeline_parallel_size=1
- quantization=compressed-tensors
- max_seq_len=131072

Prefix caching and chunked prefill are enabled (the default V1 behavior). The issue also occurred on vLLM v0.8.5.post1 and has been hard to reproduce.

Model: Llama-3.3-70B-Instruct

Hardware: 8 H200 GPUs
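
For context, a roughly equivalent engine can be launched through the offline LLM API as sketched below. The keyword spellings (e.g. num_speculative_tokens) are an assumption and may differ slightly between vLLM versions; the model paths are the placeholders from the report.

# Sketch of an equivalent offline engine launch (assumption: these keyword
# spellings follow the vLLM >= 0.8 LLM API and may differ across versions;
# paths are the placeholders from the report). Chunked prefill and prefix
# caching are left at their V1 defaults (enabled).
from vllm import LLM

llm = LLM(
    model="/tmp/model/",
    tensor_parallel_size=8,
    max_model_len=131072,
    quantization="compressed-tensors",
    speculative_config={
        "method": "eagle3",
        "model": "/tmp/model/eagle_head/",
        "num_speculative_tokens": 5,
    },
)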

🐛 Describe the bug

An IndexError: list index out of range is raised intermittently when chunked prefill and speculative decoding are active together. The failing call is req_state.get_token_id(seq_len) in gpu_model_runner.execute_model, on the code path the source annotates with # Partial prefill (rare case).

Error Logs

Worker Stack Trace (across multiple ranks):

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
    output = func(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
    output = self.model_runner.execute_model(scheduler_output)
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1428, in execute_model
    next_token_id = req_state.get_token_id(seq_len)
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/worker/gpu_input_batch.py", line 53, in get_token_id
    return self.output_token_ids[idx - self.num_prompt_tokens]
IndexError: list index out of range
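
The two deepest frames show where the lookup overflows. A standalone toy that mimics that two-branch prompt/output lookup (names mirror the trace, but the class below is illustrative only and not vLLM code) raises the same exception once the requested index runs past the tokens the request actually holds:

# Toy reproduction of the failing lookup (illustrative only, not vLLM code).
# In the real path the index is a per-request sequence length; the trace
# suggests it can exceed num_prompt_tokens + len(output_token_ids) when
# chunked prefill and speculative decoding overlap.
from dataclasses import dataclass, field


@dataclass
class ToyRequestState:
    prompt_token_ids: list[int]
    output_token_ids: list[int] = field(default_factory=list)

    @property
    def num_prompt_tokens(self) -> int:
        return len(self.prompt_token_ids)

    def get_token_id(self, idx: int) -> int:
        if idx < self.num_prompt_tokens:
            return self.prompt_token_ids[idx]
        # Raises IndexError when idx >= num_prompt_tokens + len(output_token_ids).
        return self.output_token_ids[idx - self.num_prompt_tokens]


req = ToyRequestState(prompt_token_ids=list(range(8)), output_token_ids=[101])
print(req.get_token_id(7))   # last prompt token: ok
print(req.get_token_id(8))   # first output token: ok
try:
    req.get_token_id(9)      # one past the tokens the request holds
except IndexError as e:
    print(f"IndexError: {e}")  # "list index out of range", as in the report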

Engine Core Stack Trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/core.py", line 508, in run_engine_core
    engine_core.run_busy_loop()
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/core.py", line 535, in run_busy_loop
    self._process_engine_step()
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/core.py", line 560, in _process_engine_step
    outputs, model_executed = self.step_fn()
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/core.py", line 231, in step
    model_output = self.execute_model(scheduler_output)
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/core.py", line 217, in execute_model
    raise err
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/core.py", line 211, in execute_model
    return self.model_executor.execute_model(scheduler_output)
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/executor/multiproc_executor.py", line 163, in execute_model
    (output, ) = self.collective_rpc("execute_model")
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/executor/multiproc_executor.py", line 220, in collective_rpc
    result = get_response(w, dequeue_timeout)
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/executor/multiproc_executor.py", line 207, in get_response
    raise RuntimeError("Worker failed with error 'list index out of range'")
RuntimeError: Worker failed with error 'list index out of range'

AsyncLLM Stack Trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/async_llm.py", line 379, in output_handler
    outputs = await engine_core.get_output_async()
  File "/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/core_client.py", line 790, in get_output_async
    raise self._format_exception(outputs) from None

Final Error:

vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

Scheduler State at Time of Error:

SchedulerOutput(
    scheduled_new_reqs=[],
    scheduled_cached_reqs=[
        CachedRequestData(
            req_id='f1f4e063-786b-4cf9-8ebc-548f201ed419',
            resumed_from_preemption=False,
            new_token_ids=[t1],
            new_block_ids=[[]],
            num_computed_tokens=t2
        ),
        CachedRequestData(
            req_id='93ae9bbb-f390-4f02-9d16-ddcba4e3b6d2',
            resumed_from_preemption=False,
            new_token_ids=[t3],
            new_block_ids=[[t4]],
            num_computed_tokens=t5
        )
    ],
    num_scheduled_tokens={
        'f1f4e063-786b-4cf9-8ebc-548f201ed419': 6,
        '93ae9bbb-f390-4f02-9d16-ddcba4e3b6d2': 6
    },
    total_num_scheduled_tokens=12,
    scheduled_spec_decode_tokens={
        'f1f4e063-786b-4cf9-8ebc-548f201ed419': [t6, t7, t8, t9, t10],
        '93ae9bbb-f390-4f02-9d16-ddcba4e3b6d2': [t11, t12, t13, t14, t15]
    }
)
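
The concrete counts are redacted (t1-t15), but each request was scheduled 6 tokens (1 new token plus 5 speculative tokens). Assuming the index the model runner passes to get_token_id is seq_len = num_computed_tokens + num_scheduled_tokens for the request (an assumption based on the trace, not confirmed from source), the crash corresponds to the bounds check below failing. This is a hypothetical helper, not part of vLLM:

# Hypothetical sanity check (not vLLM code). Assumes the failing index is
# seq_len = num_computed_tokens + num_scheduled_tokens, which here would be
# num_computed_tokens + 6 (1 new token + 5 speculative tokens) per request.
def seq_len_in_bounds(num_computed_tokens: int,
                      num_scheduled_tokens: int,
                      num_prompt_tokens: int,
                      num_output_tokens: int) -> bool:
    # get_token_id(seq_len) is in range only if seq_len addresses a token the
    # request already holds (a prompt token or a previously sampled output).
    seq_len = num_computed_tokens + num_scheduled_tokens
    return seq_len < num_prompt_tokens + num_output_tokens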

