Description
Your current environment
(The output of `python collect_env.py` was not provided.)
🐛 Describe the bug
I'm running the google/gemma-3-27b-it model with vLLM's OpenAI-compatible API server, started as follows:
CUDA_VISIBLE_DEVICES=0 VLLM_USE_V1=1 python /opt/VLLM/vllm/vllm/entrypoints/openai/api_server.py \
--model /opt/MODELS/gemma-3-27b-it/ \
--max-model-len 32000 \
--host 10.12.112.168 \
--port 9005 \
--tensor-parallel-size 1 \
--gpu_memory_utilization 0.9

Then, I send a standard request to the /v1/chat/completions endpoint using Python:
import requests
import json

url = "http://10.12.112.168:9005/v1/chat/completions"
data = {
    "model": "/opt/MODELS/gemma-3-27b-it/",
    "messages": [
        {"role": "user", "content": "hello"}
    ],
    "temperature": 0.1,
    "max_tokens": 500,
    "enable_thinking": False
}
headers = {
    "Content-Type": "application/json"
}

response = requests.post(url, headers=headers, data=json.dumps(data))
result = response.json()
print(result['choices'][0]['message']['content'])

The request is processed, but the model fails to produce a meaningful response. It either:
- outputs nothing,
- or keeps repeating certain tokens or parts of the input (e.g., repeating “selamlar brom”).
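For context, here is a small debugging sketch (illustrative only, not from my original test; it assumes the same endpoint and model path as above) that prints the finish_reason and token usage alongside the content, which helps tell an empty completion apart from one that was truncated at max_tokens:

import requests

url = "http://10.12.112.168:9005/v1/chat/completions"  # same server as above
payload = {
    "model": "/opt/MODELS/gemma-3-27b-it/",
    "messages": [{"role": "user", "content": "hello"}],
    "temperature": 0.1,
    "max_tokens": 500,
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
body = resp.json()

choice = body["choices"][0]
print("finish_reason:", choice.get("finish_reason"))  # "stop" vs. "length"
print("usage:", body.get("usage"))                    # prompt/completion token counts
print("content:", repr(choice["message"]["content"])) # repr() exposes empty/whitespace output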
This issue only happens with Gemma 3 IT models. I tested the exact same code and server setup with:
- Qwen models
- Mistral models

...and they work perfectly: no repetition, and the responses are coherent and aligned with the prompt.
So this looks like a Gemma-specific compatibility issue with /v1/chat/completions, possibly due to missing or misapplied prompt formatting (e.g., an incompatible or missing chat template?).
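For what it's worth, one way to check the chat-template theory locally (a sketch only, assuming the transformers library is installed and reading from the same model directory the server uses) is to render the prompt with the tokenizer's own template and inspect it:

from transformers import AutoTokenizer

# Load the tokenizer from the same local path passed to the server.
tokenizer = AutoTokenizer.from_pretrained("/opt/MODELS/gemma-3-27b-it/")

messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant-turn header
)
print(prompt)
# A Gemma-style template should wrap turns in
# <start_of_turn>user ... <end_of_turn> / <start_of_turn>model markers;
# if no chat template is defined, recent transformers versions raise an error here.

If the rendered prompt looks correct, the problem is presumably elsewhere in the serving path rather than in the template itself.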
Let me know if there’s a known workaround or proper configuration required for Gemma models.
Thanks!
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.