docs/source/features/reasoning_outputs.md (+35, −8)
@@ -76,7 +76,13 @@ Streaming chat completions are also supported for reasoning models. The `reasoni
 }
 ```
 
-Please note that it is not compatible with the OpenAI Python client library. You can use the `requests` library to make streaming requests.
+Please note that it is not compatible with the OpenAI Python client library. You can use the `requests` library to make streaming requests. You can check out the [example](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning_streaming.py).
+
+## Limitations
+
+- The reasoning content is only available for online serving's chat completion endpoint (`/v1/chat/completions`).
+- It is not compatible with [`tool_calling`](#tool_calling).
+- The reasoning content is not available for all models. Check the model's documentation to see if it supports reasoning.
 
 ## How to support a new reasoning model
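The linked script is the canonical reference; purely as an illustration of the pattern the new paragraph describes, a minimal `requests`-based streaming sketch might look like the following. The server address, model tag, and prompt are assumptions, and `reasoning_content` is the streamed delta field this page documents:

```python
import json

import requests

# Minimal sketch, assuming a local `vllm serve` instance on port 8000 serving
# a reasoning-capable model. This is not the bundled example script.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        "messages": [{"role": "user", "content": "What is 9.11 minus 9.8?"}],
        "stream": True,
    },
    stream=True,
)
for line in response.iter_lines():
    # The endpoint streams Server-Sent Events: `data: {...}` lines, ending
    # with a literal `data: [DONE]` sentinel.
    if not line or not line.startswith(b"data: "):
        continue
    payload = line.removeprefix(b"data: ")
    if payload == b"[DONE]":
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    # Reasoning tokens and answer tokens arrive in separate delta fields.
    print(delta.get("reasoning_content") or delta.get("content") or "", end="")
```

The client reads both fields because reasoning tokens and final-answer tokens are streamed separately rather than interleaved in `content`.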
@@ -137,15 +143,36 @@ class ExampleParser(ReasoningParser):
 """
 ```
 
-After defining the reasoning parser, you can use it by specifying the `--reasoning-parser` flag when making a request to the chat completion endpoint.
+Additionally, to enable structured output, you'll need to create a new `Reasoner` similar to the one in `vllm/model_executor/guided_decoding/reasoner/deepseek_reasoner.py`.
+A structured output engine such as xgrammar will use `end_token_id` to check whether reasoning content is present in the model output and skip structured output if it is.
+
+Finally, you can enable reasoning for the model by using the `--enable-reasoning` and `--reasoning-parser` flags.
 
 ```bash
 vllm serve <model_tag> \
   --enable-reasoning --reasoning-parser example
 ```
-
-## Limitations
-
-- The reasoning content is only available for online serving's chat completion endpoint (`/v1/chat/completions`).
-- It is not compatible with the [`structured_outputs`](#structured_outputs) and [`tool_calling`](#tool_calling) features.
-- The reasoning content is not available for all models. Check the model's documentation to see if it supports reasoning.
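For orientation, here is a hypothetical sketch of the kind of `Reasoner` the added paragraph describes. The class name, method names, and `<think>`/`</think>` markers are assumptions for illustration, not vLLM's actual interface; the real base class lives in the `guided_decoding` package referenced above:

```python
from dataclasses import dataclass

from transformers import PreTrainedTokenizer


@dataclass
class ExampleReasoner:
    """Hypothetical reasoner sketch; the real base class in vLLM's
    guided_decoding package may expose a different interface."""

    start_token_id: int  # id of the token that opens reasoning, e.g. <think>
    end_token_id: int    # id of the token that closes reasoning, e.g. </think>

    @classmethod
    def from_tokenizer(cls, tokenizer: PreTrainedTokenizer) -> "ExampleReasoner":
        return cls(
            start_token_id=tokenizer.convert_tokens_to_ids("<think>"),
            end_token_id=tokenizer.convert_tokens_to_ids("</think>"),
        )

    def is_reasoning_end(self, input_ids: list[int]) -> bool:
        # The structured output engine only begins enforcing its grammar once
        # the end-of-reasoning token has appeared in the generated ids.
        return self.end_token_id in input_ids
```

The load-bearing detail from the paragraph is `end_token_id`: until that token appears in the output, the engine treats the tokens as free-form reasoning and skips grammar enforcement.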