Current environment
Kubernetes Cluster on Azure with A100 GPUs
Bug
Hello team,
After upgrading the Docker image from vllm-openai:v0.8.4 to v0.8.5, I observed an issue when running the google/gemma-3-27b-it model (Hugging Face Model Link).
The model successfully returns response metadata (e.g., finish reason and token usage), but the content field in the response is consistently an empty string. No changes were made to the Kubernetes deployment manifest apart from the image version bump.
After reverting to v0.8.4, the model responds with the expected text completions, confirming that the issue is specific to the new image version.
Steps to Reproduce:
- Deploy vllm-openai:v0.8.5 with the gemma-3-27b-it model.
- Send a chat completion request.
- Observe that the content field is empty in the response.
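For reference, here is a minimal sketch of the request payload and the empty-content check from the steps above. The endpoint URL and the exact response shape are assumptions based on the standard OpenAI-compatible API that vLLM serves; adjust them to match your deployment.

```python
import json

# Assumed default vLLM OpenAI-compatible endpoint; adjust for your cluster/service.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

# Minimal chat completion payload for the affected model.
payload = {
    "model": "google/gemma-3-27b-it",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 32,
}

def has_empty_content(response_json: dict) -> bool:
    """Return True when the response shows the reported symptom:
    metadata (finish_reason) present but an empty content string."""
    choice = response_json["choices"][0]
    return (
        choice.get("finish_reason") is not None
        and choice["message"]["content"] == ""
    )

# Illustrative response shape matching the symptom observed on v0.8.5
# (hypothetical values, not captured from a real run).
broken_response = {
    "choices": [
        {
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": ""},
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 0, "total_tokens": 5},
}

print(has_empty_content(broken_response))  # True
```

The payload can be POSTed to `VLLM_URL` with curl or any HTTP client; on v0.8.4 the same check returns False because content is populated.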