docs/features/reasoning_outputs.md: 8 additions & 4 deletions
@@ -1,3 +1,7 @@
+---
+title: reasoning_outputs
+---
+
 # Reasoning Outputs

 vLLM offers support for reasoning models like [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), which are designed to generate outputs containing both reasoning steps and final conclusions.
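As a rough illustration of the reasoning/answer split described above, here is a minimal client-side sketch against the OpenAI-compatible API. Assumptions (not part of this diff): the server was launched with a reasoning parser, e.g. `vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --reasoning-parser deepseek_r1` (flag names vary across vLLM versions), and the parsed reasoning is surfaced as a `reasoning_content` field on the message.

```python
from openai import OpenAI

# Assumes a vLLM server launched with a reasoning parser enabled, e.g.:
#   vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --reasoning-parser deepseek_r1
# (exact flag names differ between vLLM versions)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

model = client.models.list().data[0].id

completion = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.8?"}],
)

msg = completion.choices[0].message
# With a reasoning parser active, the reasoning trace and the final answer are
# returned as separate fields (field name assumed here to be `reasoning_content`).
print("reasoning:", msg.reasoning_content)
print("answer:", msg.content)
```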
@@ -10,11 +14,11 @@ vLLM currently supports the following reasoning models:

 | Model Series | Parser Name | Structured Output Support | Tool Calling |

-The next example shows how to use the `guided_regex`. The idea is to generate an email address, given a simple regex template:
+The next example shows how to use the `regex`. The idea is to generate an email address, given a simple regex template:

 ??? code

@@ -63,18 +63,18 @@ The next example shows how to use the `guided_regex`. The idea is to generate an
         "content": "Generate an example email address for Alan Turing, who works in Enigma. End in .com and new line. Example result: alan.turing@enigma.com\n",
docs/features/tool_calling.md: 5 additions & 6 deletions
@@ -71,7 +71,7 @@ This example demonstrates:

 * Making a request with `tool_choice="auto"`
 * Handling the structured response and executing the corresponding function

-You can also specify a particular function using named function calling by setting `tool_choice={"type": "function", "function": {"name": "get_weather"}}`. Note that this will use the guided decoding backend - so the first time this is used, there will be several seconds of latency (or more) as the FSM is compiled for the first time before it is cached for subsequent requests.
+You can also specify a particular function using named function calling by setting `tool_choice={"type": "function", "function": {"name": "get_weather"}}`. Note that this will use the structured outputs backend - so the first time this is used, there will be several seconds of latency (or more) as the FSM is compiled for the first time before it is cached for subsequent requests.

 Remember that it's the caller's responsibility to:
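A sketch of the forced `get_weather` call described in the hunk above. The tool definition, model name, and server URL are placeholders; the request shape follows the standard OpenAI chat-completions tool schema.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Placeholder tool definition following the standard OpenAI tool schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string", "description": "City name"}},
                "required": ["city"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Vienna?"}],
    tools=tools,
    # Force this specific function; the first such request pays the FSM
    # compilation latency mentioned above, later ones hit the cache.
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
)

tool_call = completion.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
```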
@@ -83,19 +83,18 @@ For more advanced usage, including parallel tool calls and different model-speci

 ## Named Function Calling

-vLLM supports named function calling in the chat completion API by default. It does so using Outlines through guided decoding, so this is
-enabled by default and will work with any supported model. You are guaranteed a validly-parsable function call - not a
+vLLM supports named function calling in the chat completion API by default. This should work with most structured output backends supported by vLLM. You are guaranteed a validly-parsable function call - not a
 high-quality one.

-vLLM will use guided decoding to ensure the response matches the tool parameter object defined by the JSON schema in the `tools` parameter.
-For best results, we recommend ensuring that the expected output format / schema is specified in the prompt to ensure that the model's intended generation is aligned with the schema that it's being forced to generate by the guided decoding backend.
+vLLM will use structured outputs to ensure the response matches the tool parameter object defined by the JSON schema in the `tools` parameter.
+For best results, we recommend ensuring that the expected output format / schema is specified in the prompt to ensure that the model's intended generation is aligned with the schema that it's being forced to generate by the structured outputs backend.

 To use a named function, you need to define the functions in the `tools` parameter of the chat completion request, and
 specify the `name` of one of the tools in the `tool_choice` parameter of the chat completion request.

 ## Required Function Calling

-vLLM supports the `tool_choice='required'` option in the chat completion API. Similar to the named function calling, it also uses guided decoding, so this is enabled by default and will work with any supported model. The guided decoding features for `tool_choice='required'` (such as JSON schema with `anyOf`) are currently only supported in the V0 engine with the guided decoding backend `outlines`. However, support for alternative decoding backends are on the [roadmap](../usage/v1_guide.md#features) for the V1 engine.
+vLLM supports the `tool_choice='required'` option in the chat completion API. Similar to named function calling, it also uses structured outputs, so this is enabled by default and will work with any supported model. However, support for alternative decoding backends is on the [roadmap](../usage/v1_guide.md#features) for the V1 engine.

 When tool_choice='required' is set, the model is guaranteed to generate one or more tool calls based on the specified tool list in the `tools` parameter. The number of tool calls depends on the user's query. The output format strictly follows the schema defined in the `tools` parameter.
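And a matching sketch for `tool_choice='required'`, reusing the same placeholder `get_weather` tool; the model may return one or more calls, each constrained to the tool's JSON schema.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Same placeholder `get_weather` tool as in the previous sketch.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Compare the weather in Vienna and Graz."}],
    tools=tools,
    tool_choice="required",  # guarantees at least one schema-conforming tool call
)

# The model may emit one or more calls; each `arguments` string should be valid
# JSON for the tool's parameter schema.
for call in completion.choices[0].message.tool_calls:
    print(call.function.name, call.function.arguments)
```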