
[Bug]: reasoning-parser=deepseek_r1 wrong output with enable_thinking=False #19222

@andrePankraz

Your current environment

Standard vLLM Docker container (vllm/vllm-openai:v0.9.0.1) with the following Docker Compose setup:

services:
  vllm-qwen3-32b:
    image: vllm/vllm-openai:v0.9.0.1
    container_name: vllm-qwen3-32b
    environment:
      - HF_TOKEN=$HF_TOKEN
      - VLLM_NO_USAGE_STATS=1
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0', '1']
              capabilities: [ gpu ]
    network_mode: host
    volumes:
      - /mnt/sda/huggingface:/root/.cache/huggingface
      - .:/opt/vllm
    command:
      - --port=8000
      - --disable-log-requests
      - --model=Qwen/Qwen3-32B
      - --tensor-parallel-size=2
      - --gpu-memory-utilization=0.90
      - --swap-space=5
      - --reasoning-parser=deepseek_r1
    restart: unless-stopped

🐛 Describe the bug

In our chatbot, we decide dynamically per request whether Qwen3 should reason (think) or not.

To deactivate reasoning for a request, we set "chat_template_kwargs": {"enable_thinking": false}. We don't use the /no_think tag because it is not reliable.

With enable_thinking=false and JSON guided sampling, message.content is empty (null) and the generated output ends up, malformed, in message.reasoning_content. Example:

$ curl http://ai1.dev.init:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "Qwen/Qwen3-32B",
    "temperature": 0.6,
    "max_tokens": 500,
    "response_format": {
      "type": "json_object",
      "schema": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": "BirdPuzzleResponse",
        "type": "object",
        "properties": {
          "answer": {
            "type": "integer",
            "description": "Number of birds remaining in the tree"
          },
          "explanation": {
            "type": "string",
            "description": "Brief explanation of the reasoning"
          }
        },
        "required": ["answer", "explanation"],
        "additionalProperties": false
      }
    },
    "messages": [
      {
        "role": "system",
        "content": "Reply ONLY with JSON that satisfies the provided schema."
      },
      {
        "role": "user",
        "content": "There are 9 birds in the tree; a hunter shoots one. How many birds are left?"
      }
    ],
    "chat_template_kwargs": {"enable_thinking": false}
  }'
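For anyone who prefers to reproduce this from Python, here is a rough stdlib-only equivalent of the curl call above. The helper names (build_payload, post_chat) are made up for illustration, and the URL is simply the host from the curl example; this is a sketch, not an official client.

```python
import json
import urllib.request

# Host and port copied from the curl example; adjust for your deployment.
BASE_URL = "http://ai1.dev.init:8000/v1/chat/completions"

# Same JSON schema as in the curl request.
SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "BirdPuzzleResponse",
    "type": "object",
    "properties": {
        "answer": {
            "type": "integer",
            "description": "Number of birds remaining in the tree",
        },
        "explanation": {
            "type": "string",
            "description": "Brief explanation of the reasoning",
        },
    },
    "required": ["answer", "explanation"],
    "additionalProperties": False,
}


def build_payload(enable_thinking: bool) -> dict:
    """Build the request body (hypothetical helper, mirrors the curl body)."""
    return {
        "model": "Qwen/Qwen3-32B",
        "temperature": 0.6,
        "max_tokens": 500,
        "response_format": {"type": "json_object", "schema": SCHEMA},
        "messages": [
            {
                "role": "system",
                "content": "Reply ONLY with JSON that satisfies the provided schema.",
            },
            {
                "role": "user",
                "content": "There are 9 birds in the tree; a hunter shoots one. "
                "How many birds are left?",
            },
        ],
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


def post_chat(payload: dict, url: str = BASE_URL) -> dict:
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling post_chat(build_payload(False)) should correspond to the failing case, and post_chat(build_payload(True)) to the working one.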

Result:

{
  "id": "chatcmpl-5c085b2bd5c942169fa462a7db26a00d",
  "object": "chat.completion",
  "created": 1749142128,
  "model": "Qwen/Qwen3-32B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "```{\n  \"answer\": 0\n  }",
        "content": null,
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": { "prompt_tokens": 47, "total_tokens": 59, "completion_tokens": 12, "prompt_tokens_details": null },
  "prompt_logprobs": null,
  "kv_transfer_params": null
}
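Until this is fixed server-side, a client-side fallback along the following lines might help. It is only a sketch under the assumption that, with thinking disabled, a null content plus a non-empty reasoning_content means the answer was misrouted; recover_json_content is a hypothetical helper, not part of vLLM or the OpenAI client.

```python
import json
from typing import Optional


def recover_json_content(message: dict, thinking_enabled: bool) -> Optional[str]:
    """Return message content, falling back to reasoning_content if misrouted.

    Assumption: when thinking is disabled and content is empty, any JSON
    object found in reasoning_content is the actual answer.
    """
    content = message.get("content")
    if content or thinking_enabled:
        # Normal case: trust the server's split between reasoning and content.
        return content

    misplaced = message.get("reasoning_content")
    if not misplaced:
        return content

    # Cut out the outermost {...} span, dropping stray code-fence backticks.
    start = misplaced.find("{")
    end = misplaced.rfind("}")
    if start == -1 or end < start:
        return None

    candidate = misplaced[start : end + 1]
    try:
        json.loads(candidate)  # only accept text that parses as JSON
    except json.JSONDecodeError:
        return None
    return candidate
```

Applied to the message above, this yields a string that parses to {"answer": 0}. That object still lacks the required explanation field, so validating the result against the schema is probably still needed.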

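Without {"enable_thinking": false} (that is, with the kwarg omitted or set to true), we get the expected response format, with the reasoning in reasoning_content and the JSON in content: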

{
  "id": "chatcmpl-091059fc030d44cebd2a82662b7e9295",
  "object": "chat.completion",
  "created": 1749142374,
  "model": "Qwen/Qwen3-32B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "\nOkay, let's see. The problem says there are 9 birds in a tree, and a hunter shoots one. The question is, how many birds are left? Hmm.\n\nFirst, I need to think about what happens when a bird is shot. If a bird is shot, it would likely die and fall out of the tree. So, the number of birds remaining in the tree would be 9 minus 1, which is 8. But wait, maybe there's a trick here. Sometimes these riddles play on assumptions. For example, maybe the other birds would fly away when the shot is fired. If the hunter shoots one bird, the loud noise might scare the others, so they all leave the tree. In that case, there would be zero birds left. \n\nWait, the problem doesn't specify whether the other birds stay or fly away. But in typical riddles like this, the answer often relies on the assumption that the remaining birds would flee. So, if the hunter shoots one, the others get scared and fly off. So the answer would be zero. But I should check if that's the common answer. Alternatively, maybe the question is straightforward and expects a simple subtraction. \n\nBut considering it's a riddle, the trick is probably that after shooting one, the others fly away. So the answer is zero. Let me confirm. If you have 9 birds in a tree and one is shot, the rest would be gone because of the noise. So the answer is 0. Yeah, that makes sense. So the JSON should have the answer as 0.\n",
        "content": "{\"answer\": 0}",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": { "prompt_tokens": 43, "total_tokens": 378, "completion_tokens": 335, "prompt_tokens_details": null },
  "prompt_logprobs": null,
  "kv_transfer_params": null
}
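A possible explanation for the difference between the two responses, stated as an assumption rather than a confirmed diagnosis: if the parser treats everything before a closing </think> tag as reasoning, and the Qwen3 chat template already places an empty think block in the prompt when enable_thinking=false, then the generated text never contains </think> and all of it is classified as reasoning. The toy function below only models that hypothesis; split_reasoning is a made-up name and this is not the actual vLLM parser code.

```python
from typing import Optional, Tuple

THINK_END = "</think>"


def split_reasoning(model_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Toy model: text before </think> is reasoning, text after it is content.

    Assumption under test: with no </think> in the generated text, the whole
    output is treated as reasoning and content stays empty.
    """
    if THINK_END not in model_output:
        return model_output, None
    reasoning, _, content = model_output.partition(THINK_END)
    return reasoning, (content or None)
```

Under this model, an output such as 'thoughts</think>{"answer": 0}' splits as expected, while an output that consists only of the JSON answer is returned entirely as reasoning with content set to None, which matches the response reported above.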


Labels: bug, stale

Status: Done