[Bug]: No output / Repeated outputs when using Gemma 3 on vLLM #20341

@syngokhan

Your current environment

The output of python collect_env.py
Your output of `python collect_env.py` here

🐛 Describe the bug

I'm running the google/gemma-3-27b-it model with vLLM using the OpenAI-compatible API server.

CUDA_VISIBLE_DEVICES=0 VLLM_USE_V1=1 python /opt/VLLM/vllm/vllm/entrypoints/openai/api_server.py \
--model /opt/MODELS/gemma-3-27b-it/ \
--max-model-len 32000 \
--host 10.12.112.168 \
--port 9005 \
--tensor-parallel-size 1 \
--gpu_memory_utilization 0.9

Then, I send a standard request to the /v1/chat/completions endpoint using Python:

import requests
import json

url = "http://10.12.112.168:9005/v1/chat/completions"

data = {
    "model": "/opt/MODELS/gemma-3-27b-it/",
    "messages": [
        {"role": "user", "content": "hello"}
    ],
    "temperature": 0.1,
    "max_tokens": 500,
    "enable_thinking": False
}

headers = {
    "Content-Type": "application/json"
}

response = requests.post(url, headers=headers, data=json.dumps(data))
result = response.json()
print(result['choices'][0]['message']['content'])

The request is processed, but the model fails to produce a meaningful response. It either outputs nothing or keeps repeating certain tokens or parts of the input (e.g., repeating “selamlar brom”).
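To make the two failure modes easier to spot when testing several models, here is a minimal sketch of a check for empty or heavily repeated output. The helper name `looks_degenerate`, the n-gram size, and the threshold are illustrative assumptions and not part of vLLM or of the original report.

```python
def looks_degenerate(text: str, ngram: int = 2, min_distinct_ratio: float = 0.3) -> bool:
    """Heuristic: True if the text is empty or mostly repeated word n-grams.

    Hypothetical helper; the n-gram size and threshold are arbitrary choices.
    """
    words = text.split()
    if not words:
        # Covers the "outputs nothing" case (empty or whitespace-only content).
        return True
    if len(words) < 2 * ngram:
        # Too short to judge repetition reliably.
        return False
    grams = [tuple(words[i:i + ngram]) for i in range(len(words) - ngram + 1)]
    # A low share of distinct n-grams means the same phrase keeps recurring.
    return len(set(grams)) / len(grams) < min_distinct_ratio


print(looks_degenerate(""))                                   # empty output
print(looks_degenerate("selamlar brom " * 20))                # repeated phrase
print(looks_degenerate("Hello! How can I help you today?"))   # normal reply
```

Applied to `result['choices'][0]['message']['content']` from the request above, this would flag both symptoms without reading each response by hand.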

This issue only happens with Gemma 3 IT models. I tested the exact same code and server setup with Qwen models and Mistral models, and they work perfectly: no repetition, and responses are coherent and aligned with the prompt.

So this looks like a Gemma-specific compatibility issue with /v1/chat/completions, possibly caused by missing or misapplied prompt formatting (e.g., no compatible chat template being used).
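For reference when checking the chat-template hypothesis, here is a rough sketch of the turn format Gemma instruction-tuned models expect, written by hand. The function `render_gemma_prompt` is hypothetical and only handles `user` and `assistant` roles. The chat template shipped with the model's tokenizer files is the authoritative version, so treat this as an approximation for comparison only.

```python
def render_gemma_prompt(messages, add_generation_prompt=True):
    """Render messages using Gemma-style turn markers (hand-written approximation).

    Assumes only "user" and "assistant" roles; system prompts are not handled.
    """
    parts = ["<bos>"]
    for msg in messages:
        # Gemma names the assistant role "model" in its turn markers.
        role = "model" if msg["role"] == "assistant" else msg["role"]
        parts.append(f"<start_of_turn>{role}\n{msg['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation starts with the reply.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = render_gemma_prompt([{"role": "user", "content": "hello"}])
print(repr(prompt))
```

Sending a string like this to the server's /v1/completions endpoint could help tell a template problem apart from a model or kernel problem. Note that the tokenizer may add its own `<bos>` token, in which case the leading `<bos>` here should be dropped.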

Let me know if there’s a known workaround or proper configuration required for Gemma models.

Thanks!
