[Bug]: Stuck request and empty streaming for gemma3 serving with ^v0.8.5 #17658

### Your current environment

ok

### 🐛 Describe the bug

I'm running vLLM with gemma3 and have noticed that, starting with v0.8.5 (inclusive), this model no longer responds.

```bash
# V0 engine (VLLM_USE_V1=0), single GPU, OpenAI-compatible server published on ${MODEL_ID_PORT}
export MODEL_ID=ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g
export MODEL_ID_PORT=8000
export MODEL_ID_GPU=0

docker run \
  --runtime nvidia \
  -e VLLM_USE_V1=0 \
  --ipc=host \
  -p "${MODEL_ID_PORT}:8000" \
  --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
  --env "CUDA_VISIBLE_DEVICES=${MODEL_ID_GPU}" \
  --env "VLLM_LOGGING_LEVEL=${VLLM_LOGGING_LEVEL}" \
  -v "${HF_HOME}:/root/.cache/huggingface" \
  vllm/vllm-openai:v0.8.5 \
  --model "${MODEL_ID}" \
  --tokenizer google/gemma-3-27b-it \
  --gpu-memory-utilization 0.9 \
  --max-model-len 32000 \
  --max-num-seqs 8 \
  --served-model-name ista-daslab-gemma-3-27b-it-gptq-4b-128g
```

This is the example request:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "stream": false,
    "model": "ista-daslab-gemma-3-27b-it-gptq-4b-128g",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": [
        {"type": "text", "text": "hello"}
      ]}
    ]
  }'
```
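For scripted reproduction, here is a minimal Python sketch of the same request using only the standard library, with a client-side timeout so the hang surfaces as an exception instead of blocking forever. The helper names `build_payload` and `post_chat` and the 60-second default timeout are illustrative assumptions, not part of vLLM; the URL and served model name are taken from the command above.

```python
import json
import urllib.request

# Endpoint and served model name as used in the curl example.
URL = "http://localhost:8000/v1/chat/completions"
MODEL = "ista-daslab-gemma-3-27b-it-gptq-4b-128g"


def build_payload(stream: bool = False) -> dict:
    """Hypothetical helper: build the same chat request body as the curl example."""
    return {
        "stream": stream,
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": [{"type": "text", "text": "hello"}]},
        ],
    }


def post_chat(payload: dict, timeout: float = 60.0) -> dict:
    """Hypothetical helper: POST the payload and decode the JSON response.

    Raises an OSError subclass (for example TimeoutError or urllib.error.URLError)
    if the server does not answer within `timeout` seconds.
    """
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `post_chat(build_payload())` against the v0.8.5 container would be expected to raise a timeout, while the same call against v0.8.4 should return a normal completion.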

The response is never delivered. Interestingly, the request keeps generating tokens (about 38 tokens/s in the logs below) and GPU KV cache usage grows indefinitely:

```
INFO 05-05 07:16:33 [metrics.py:486] Avg prompt throughput: 1.9 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO 05-05 07:16:38 [metrics.py:486] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 38.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.6%, CPU KV cache usage: 0.0%.
INFO 05-05 07:16:43 [metrics.py:486] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 38.6 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 1.1%, CPU KV cache usage: 0.0%.
INFO 05-05 07:16:48 [metrics.py:486] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 38.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 1.6%, CPU KV cache usage: 0.0%.
```
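To check for this state without reading the logs by eye, a rough Python sketch can parse the periodic metrics lines and flag the pattern above: running requests keep generating while GPU KV cache usage only climbs. The regex is written against the exact V0 `metrics.py` lines quoted here and is an assumption about their format; `parse_metrics` and `looks_runaway` are hypothetical helpers. It is only a heuristic, since a legitimately long answer shows the same growth until it reaches `--max-model-len`.

```python
import re
from typing import List, Tuple

# Format assumed from the metrics lines quoted in this report only.
METRICS_RE = re.compile(
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs.*?"
    r"GPU KV cache usage: (?P<kv>[\d.]+)%"
)


def parse_metrics(lines: List[str]) -> List[Tuple[float, int, float]]:
    """Return (generation tokens/s, running requests, GPU KV cache %) per matching line."""
    samples = []
    for line in lines:
        m = METRICS_RE.search(line)
        if m:
            samples.append((float(m["gen"]), int(m["running"]), float(m["kv"])))
    return samples


def looks_runaway(samples: List[Tuple[float, int, float]]) -> bool:
    """Heuristic: requests keep generating while KV cache usage strictly increases."""
    if len(samples) < 2:
        return False
    kv = [s[2] for s in samples]
    growing = all(b > a for a, b in zip(kv, kv[1:]))
    still_generating = all(gen > 0 and running > 0 for gen, running, _ in samples)
    return growing and still_generating
```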

If `stream` is set to `true`, every chunk arrives with empty content:

```
data: {"id":"chatcmpl-b5b346954cdb4fe087d15231250fbee4","object":"chat.completion.chunk","created":1746454706,"model":"ista-daslab-gemma-3-27b-it-gptq-4b-128g","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-b5b346954cdb4fe087d15231250fbee4","object":"chat.completion.chunk","created":1746454706,"model":"ista-daslab-gemma-3-27b-it-gptq-4b-128g","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-b5b346954cdb4fe087d15231250fbee4","object":"chat.completion.chunk","created":1746454706,"model":"ista-daslab-gemma-3-27b-it-gptq-4b-128g","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":null}]}
```
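A small Python sketch for summarizing a captured stream: it reads the `data:` lines of the server-sent events, counts deltas whose `content` is empty, and collects any `finish_reason`. The helper name `summarize_stream` is hypothetical; it assumes the OpenAI-style chunk shape shown above and the `data: [DONE]` terminator used by OpenAI-compatible servers.

```python
import json
from typing import Iterable


def summarize_stream(lines: Iterable[str]) -> dict:
    """Count streamed choice deltas, how many have empty content, and finish reasons."""
    total = empty = 0
    finish_reasons = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            total += 1
            if not choice.get("delta", {}).get("content"):
                empty += 1
            if choice.get("finish_reason") is not None:
                finish_reasons.append(choice["finish_reason"])
    return {"deltas": total, "empty": empty, "finish_reasons": finish_reasons}
```

Fed the three chunks above, it reports 3 deltas, all empty, and no finish reason, which is consistent with the non-streaming case: the server keeps producing tokens that never show up as text and never finishes.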

When I downgrade to v0.8.4, it works normally. 

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
