Merged

Commits (21):
- `784c170` Implement granite reasoning parser for non streaming (alex-jw-brooks, Mar 2, 2025)
- `4438a38` Add granite reasoning parser to init pkg (alex-jw-brooks, Mar 2, 2025)
- `3278ca7` Add preliminary test for non streaming granite rparser (alex-jw-brooks, Mar 2, 2025)
- `07e58a8` Implement granite reasoning parser streaming (alex-jw-brooks, Mar 4, 2025)
- `6980ea8` Add additional granite reasoning parser tests (alex-jw-brooks, Mar 4, 2025)
- `f6ff0bc` Add docstrings for granite reasoning parser (alex-jw-brooks, Mar 4, 2025)
- `1f2f690` Add more streaming tests & cleanup (alex-jw-brooks, Mar 4, 2025)
- `2c9251c` Refactoring and code formatting (alex-jw-brooks, Mar 4, 2025)
- `1604ca9` Pass response seq length through message parsing (alex-jw-brooks, Mar 4, 2025)
- `118a051` Track parsed content as a bool (alex-jw-brooks, Mar 4, 2025)
- `5ac1c11` Add IBM 3.2 lang models to reasoning models (alex-jw-brooks, Mar 4, 2025)
- `2b871f1` Add note on thinking kwarg for granite reasoning (alex-jw-brooks, Mar 4, 2025)
- `721ab9f` Fix formatting (alex-jw-brooks, Mar 4, 2025)
- `6b79586` Add reasoning parser arg for granite (alex-jw-brooks, Mar 6, 2025)
- `e460b5e` Fix granite reasoning parser doc formatting (alex-jw-brooks, Mar 6, 2025)
- `658bf0a` Warn for unimplemented structured outputs reasoner (alex-jw-brooks, Mar 6, 2025)
- `75dcdd2` Add granite thinking note to reasoning docs (alex-jw-brooks, Mar 10, 2025)
- `ab83ec1` Update docs/source/features/reasoning_outputs.md (alex-jw-brooks, Mar 10, 2025)
- `a73c789` Revert precommit bullet formatting (alex-jw-brooks, Mar 12, 2025)
- `0c3dfa8` Merge branch 'main' into granite_reasoning (alex-jw-brooks, Mar 26, 2025)
- `da09717` Fix precommit (alex-jw-brooks, Mar 26, 2025)
docs/source/features/reasoning_outputs.md (6 additions, 1 deletion)

@@ -4,7 +4,7 @@

vLLM offers support for reasoning models like [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), which are designed to generate outputs containing both reasoning steps and final conclusions.

- Reasoning models return a additional `reasoning_content` field in their outputs, which contains the reasoning steps that led to the final conclusion. This field is not present in the outputs of other models.
+ Reasoning models return an additional `reasoning_content` field in their outputs, which contains the reasoning steps that led to the final conclusion. This field is not present in the outputs of other models.

## Supported Models

@@ -14,6 +14,9 @@ vLLM currently supports the following reasoning models:
|--------------|-------------|------------------|-------------|
| [DeepSeek R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) | `deepseek_r1` | `guided_json`, `guided_regex` | ❌ |
| [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | `deepseek_r1` | `guided_json`, `guided_regex` | ✅ |
+ | [IBM Granite 3.2 language models](https://huggingface.co/collections/ibm-granite/granite-32-language-models-67b3bc8c13508f6d064cff9a) | `granite` | ❌ | ❌ |

+ - IBM Granite 3.2 reasoning is disabled by default; to enable it, you must also pass `thinking=True` in your `chat_template_kwargs`.

## Quickstart

@@ -43,6 +46,7 @@ model = models.data[0].id

# Round 1
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
+ # For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}`
response = client.chat.completions.create(model=model, messages=messages)

reasoning_content = response.choices[0].message.reasoning_content
@@ -97,6 +101,7 @@ models = client.models.list()
model = models.data[0].id

messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
+ # For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}`
stream = client.chat.completions.create(model=model,
messages=messages,
stream=True)
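The `thinking=True` comments in the diff above can be sketched end to end. The helper below is a minimal illustration (not part of the PR) of the request body the OpenAI client ends up sending once `extra_body={"chat_template_kwargs": {"thinking": True}}` is supplied, since `extra_body` fields merge into the top level of the JSON body; the model id is a placeholder.

```python
import json


def build_chat_payload(model: str, messages: list, thinking: bool = False) -> dict:
    """Build a /v1/chat/completions request body.

    When `thinking` is set, add the `chat_template_kwargs` field that the
    Granite 3.2 chat template reads to switch reasoning output on.
    """
    payload = {"model": model, "messages": messages}
    if thinking:
        # Same effect as extra_body={"chat_template_kwargs": {"thinking": True}}
        # on the OpenAI client: extra_body merges into the top-level body.
        payload["chat_template_kwargs"] = {"thinking": True}
    return payload


body = build_chat_payload(
    "ibm-granite/granite-3.2-8b-instruct",  # placeholder model id
    [{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
    thinking=True,
)
print(json.dumps(body, indent=2))
```

Without `thinking=True` the payload carries no `chat_template_kwargs` at all, matching the default-disabled behavior noted in the table above.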
(next file; name not shown)

@@ -31,6 +31,7 @@

# Round 1
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
+ # For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}`
response = client.chat.completions.create(model=model, messages=messages)

reasoning_content = response.choices[0].message.reasoning_content
Expand Down
(next file; name not shown)

@@ -38,6 +38,7 @@
model = models.data[0].id

messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
+ # For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}`
stream = client.chat.completions.create(model=model,
messages=messages,
stream=True)
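On the streaming side, the chunks carry the same split the non-streaming response does: reasoning text and answer text arrive in separate delta fields. The sketch below (an illustration, not code from the PR) folds a stream of delta payloads into the two strings; the `reasoning_content`/`content` field names are assumed to mirror the non-streaming `message` fields shown above.

```python
def collect_stream(deltas):
    """Fold streamed chat deltas into (reasoning, answer) strings.

    Assumes each delta dict may carry `reasoning_content` (emitted while the
    model is still reasoning) or `content` (the final answer), mirroring the
    fields on a non-streaming response message.
    """
    reasoning_parts, content_parts = [], []
    for delta in deltas:
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            content_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(content_parts)


# Simulated deltas, shaped like chunk.choices[0].delta from the stream above.
deltas = [
    {"reasoning_content": "9.8 has a larger tenths digit "},
    {"reasoning_content": "than 9.11."},
    {"content": "9.8 is greater."},
]
reasoning, answer = collect_stream(deltas)
print(reasoning)
print(answer)
```

In a real client loop the same accumulation happens per chunk, printing reasoning tokens as they arrive and switching to the answer once `content` deltas begin.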