Closed as not planned
Labels: bug (Something isn't working), stale (Over 90 days of inactivity)
Description
Your current environment
Standard vLLM Docker container (v0.9.0.1) with the following Docker Compose setup:
services:
  vllm-qwen3-32b:
    image: vllm/vllm-openai:v0.9.0.1
    container_name: vllm-qwen3-32b
    environment:
      - HF_TOKEN=$HF_TOKEN
      - VLLM_NO_USAGE_STATS=1
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0', '1']
              capabilities: [ gpu ]
    network_mode: host
    volumes:
      - /mnt/sda/huggingface:/root/.cache/huggingface
      - .:/opt/vllm
    command:
      - --port=8000
      - --disable-log-requests
      - --model=Qwen/Qwen3-32B
      - --tensor-parallel-size=2
      - --gpu-memory-utilization=0.90
      - --swap-space=5
      - --reasoning-parser=deepseek_r1
    restart: unless-stopped
🐛 Describe the bug
In our chatbot, we decide dynamically, per request, whether Qwen3 should reason (think) or not.
To disable reasoning for a request, we set "chat_template_kwargs": {"enable_thinking": false}. We do not use the /no_think tag because it is not reliable.
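The per-request toggle described above can be sketched roughly as follows. This is only an illustration of how a client might assemble the request body: `build_chat_payload` is a hypothetical helper name, and the field names and default values are copied from the request shown in this report, not taken from any client library.

```python
import json


def build_chat_payload(messages, schema, enable_thinking, model="Qwen/Qwen3-32B"):
    """Assemble a /v1/chat/completions request body (hypothetical helper).

    Field names mirror the curl request in this report.
    """
    payload = {
        "model": model,
        "temperature": 0.6,
        "max_tokens": 500,
        "response_format": {"type": "json_object", "schema": schema},
        "messages": messages,
    }
    # Only send the template kwarg when thinking should be switched off,
    # which is the case that triggers the behaviour reported here.
    if not enable_thinking:
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload


if __name__ == "__main__":
    demo = build_chat_payload(
        messages=[{"role": "user", "content": "How many birds are left?"}],
        schema={"type": "object", "properties": {"answer": {"type": "integer"}}},
        enable_thinking=False,
    )
    print(json.dumps(demo, indent=2))
```

Omitting the key entirely when thinking is enabled matches the second request in this report, where the kwarg is left out.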
With enable_thinking=false and JSON guided decoding (response_format), message.content is null and the JSON output ends up, malformed, in message.reasoning_content. Example:
$ curl http://ai1.dev.init:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen/Qwen3-32B",
"temperature": 0.6,
"max_tokens": 500,
"response_format": {
"type": "json_object",
"schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "BirdPuzzleResponse",
"type": "object",
"properties": {
"answer": {
"type": "integer",
"description": "Number of birds remaining in the tree"
},
"explanation": {
"type": "string",
"description": "Brief explanation of the reasoning"
}
},
"required": ["answer", "explanation"],
"additionalProperties": false
}
},
"messages": [
{
"role": "system",
"content": "Reply ONLY with JSON that satisfies the provided schema."
},
{
"role": "user",
"content": "There are 9 birds in the tree; a hunter shoots one. How many birds are left?"
}
],
"chat_template_kwargs": {"enable_thinking": false}
}'
Result:
{
"id": "chatcmpl-5c085b2bd5c942169fa462a7db26a00d",
"object": "chat.completion",
"created": 1749142128,
"model": "Qwen/Qwen3-32B",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"reasoning_content": "```{\n \"answer\": 0\n }",
"content": null,
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": { "prompt_tokens": 47, "total_tokens": 59, "completion_tokens": 12, "prompt_tokens_details": null },
"prompt_logprobs": null,
"kv_transfer_params": null
}
Without "chat_template_kwargs": {"enable_thinking": false} (that is, with the kwarg omitted or set to true), we get the expected response format:
{
"id": "chatcmpl-091059fc030d44cebd2a82662b7e9295",
"object": "chat.completion",
"created": 1749142374,
"model": "Qwen/Qwen3-32B",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"reasoning_content": "\nOkay, let's see. The problem says there are 9 birds in a tree, and a hunter shoots one. The question is, how many birds are left? Hmm.\n\nFirst, I need to think about what happens when a bird is shot. If a bird is shot, it would likely die and fall out of the tree. So, the number of birds remaining in the tree would be 9 minus 1, which is 8. But wait, maybe there's a trick here. Sometimes these riddles play on assumptions. For example, maybe the other birds would fly away when the shot is fired. If the hunter shoots one bird, the loud noise might scare the others, so they all leave the tree. In that case, there would be zero birds left. \n\nWait, the problem doesn't specify whether the other birds stay or fly away. But in typical riddles like this, the answer often relies on the assumption that the remaining birds would flee. So, if the hunter shoots one, the others get scared and fly off. So the answer would be zero. But I should check if that's the common answer. Alternatively, maybe the question is straightforward and expects a simple subtraction. \n\nBut considering it's a riddle, the trick is probably that after shooting one, the others fly away. So the answer is zero. Let me confirm. If you have 9 birds in a tree and one is shot, the rest would be gone because of the noise. So the answer is 0. Yeah, that makes sense. So the JSON should have the answer as 0.\n",
"content": "{\"answer\": 0}",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": { "prompt_tokens": 43, "total_tokens": 378, "completion_tokens": 335, "prompt_tokens_details": null },
"prompt_logprobs": null,
"kv_transfer_params": null
}
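Until the server-side behaviour is addressed, a client could detect the misrouted output and recover it. The following is a minimal sketch of such a workaround, not a fix: `extract_json_answer` is a hypothetical helper, and it assumes the message dict has the shape shown in the two responses above. It only strips bare backticks, so a fence with a language tag would still fail to parse.

```python
import json


def extract_json_answer(message):
    """Return (parsed_json, source_field) from a chat completion message dict.

    Prefers message["content"]; falls back to message["reasoning_content"]
    when content is null or empty, as in the failing response above.
    """
    for field in ("content", "reasoning_content"):
        text = message.get(field)
        if not text:
            continue
        # Drop surrounding whitespace and stray Markdown fence backticks.
        cleaned = text.strip().strip("`").strip()
        try:
            return json.loads(cleaned), field
        except json.JSONDecodeError:
            continue
    raise ValueError("no parseable JSON in content or reasoning_content")


if __name__ == "__main__":
    failing = {
        "role": "assistant",
        "reasoning_content": "```{\n \"answer\": 0\n }",
        "content": None,
        "tool_calls": [],
    }
    print(extract_json_answer(failing))
```

The returned source field tells the caller whether the fallback was used. Note that the recovered object in both responses above contains only "answer", so the required "explanation" property would still need a separate schema check.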