[Frontend] Support reasoning content for deepseek r1 #12473
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Force-pushed from 54b910e to 946179e.
To make sure this is working correctly, can you add some tests to the codebase? Would also be great to add a section to the docs explaining its usage!
cc @mgoin @K-Mistele
vllm/entrypoints/openai/reasoning_parsers/deepseek_r1_reasoning_parser.py
Yeah, I think we can just use |
Will work on this.
How does this work together with Guided/Structured Outputs?
@cksac Hi, please check out #12468 (comment). I don’t think they can work together. Also, the DeepSeek API doesn’t do function calls or structured output for DeepSeek R1, just FYI. If you enable this feature together with structured output, the output will contain only the structured output.
    expected_reasoning: str,
    expected_content: str,
):
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
I'm not sure whether we can accept this: I used a tokenizer to mimic the streaming output process, which may require some memory.
The DeepSeek tokenizer can be instantiated locally from the zip file they provide here: https://api-docs.deepseek.com/quick_start/token_usage
Though maybe this is worth upstreaming to Transformers?
I think we can use this:
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1")
output = tokenizer.tokenize(param_dict["output"])
I used opt-125m to conserve memory during testing because I was unsure about the resources available on the test machine. If we can tolerate around 500MB of memory usage, we can use the R1 tokenizer directly.
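For context, a minimal sketch of how a tokenizer can be used to mimic streaming deltas in a test; the helper name and the exact way the parser consumes these pairs are assumptions, not the code in this PR:

from transformers import AutoTokenizer

def iter_stream_deltas(full_output: str, model_name: str = "facebook/opt-125m"):
    """Yield (previous_text, delta_text) pairs, replaying the output token by token."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokens = tokenizer.tokenize(full_output)
    previous = ""
    for token in tokens:
        delta = tokenizer.convert_tokens_to_string([token])
        yield previous, delta
        previous += delta

Each (previous, delta) pair can then be fed to the streaming reasoning parser, so the test never needs a running model.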
@DarkLight1337 I added some unit tests, but I did not find how to add e2e tests. Do we need e2e cases for this feature? Could you please point me in the right direction?
Thanks for your time.
Yes, OpenAI hides the reasoning content.
I hadn’t realized they support reasoning. Thanks for bringing that to my attention! I will take a look too.
examples/online_serving/openai_chat_completion_with_reasoning.py
examples/online_serving/openai_chat_completion_with_reasoning_streaming.py
Comments are addressed; please take a look.
Excellent work incorporating feedback and producing nicely documented code, LGTM! I'd like @DarkLight1337 or @K-Mistele to sign off as well before landing
parser.add_argument(
    "--reasoning-parser",
    type=str,
    metavar="{" + ",".join(valid_reasoning_parsers) + "}",
Why couldn't this just be choices=valid_reasoning_parsers?
Here’s the trick adapted from the tool parser argument.
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/cli_args.py#L216
I think it is to keep consistency with the other CLI arguments:
$ vllm serve --help
...
--scheduling-policy {fcfs,priority}
...
--task {auto,generate,embedding,embed,classify,score,reward}
...
--tokenizer-mode {auto,slow,mistral}
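For illustration, a minimal sketch of that metavar pattern; the registry contents here are an assumption:

import argparse

# Sketch: metavar only controls how the option is rendered in --help,
# matching the "{a,b,c}" style of the other vLLM CLI arguments shown above.
valid_reasoning_parsers = ["deepseek_r1"]  # assumed registry contents

parser = argparse.ArgumentParser()
parser.add_argument(
    "--reasoning-parser",
    type=str,
    metavar="{" + ",".join(valid_reasoning_parsers) + "}",
    default=None,
    help="Select the reasoning parser depending on the reasoning model.")
print(parser.format_help())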
The Vercel AI SDK has a design akin to this PR. It employs a regex to parse the entire output for synchronous generation requests, and we use the same method:

const regexp = new RegExp(`${openingTag}(.*?)${closingTag}`, 'gs');
const matches = Array.from(text.matchAll(regexp));
if (!matches.length) {
  return { text, ...rest };
}
const reasoning = matches.map(match => match[1]).join(separator);

For streaming requests, we also have a similar design that checks for the start and end tokens in both the previous text and the delta text.
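For illustration, a rough Python sketch of that streaming-side check; the tag strings and the return shape are assumptions, not the exact parser code in this PR:

THINK_START = "<think>"
THINK_END = "</think>"

def split_delta(previous_text: str, delta_text: str):
    """Classify a streamed delta as reasoning content vs. final content."""
    if THINK_END in previous_text:
        # Reasoning finished in an earlier chunk; everything new is content.
        return None, delta_text
    if THINK_END in delta_text:
        # The closing tag arrives inside this delta: split around it.
        reasoning, _, content = delta_text.partition(THINK_END)
        return reasoning or None, content or None
    if THINK_START in previous_text + delta_text:
        # Still inside the reasoning block.
        return delta_text, None
    return None, delta_text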
Thanks for your review 🥰
Hey guys, just now getting to this; glad to see it's been merged. For what it's worth, there have already been some discussions (ref. #11522) about refactoring tool parsers (possibly to use FSMs, which seems like a good approach for managing tool and reasoning parsing) instead of the current implementation, which tends to be buggy. It seems like the direction we want to move towards is having a single parser module for any given (supported) model that provides a unified interface for both tool parsing and reasoning parsing, if one or both are supported by the model. This would be more straightforward from a UX and a DX standpoint, and should result in cleaner code as well.
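For illustration only, a unified per-model parser interface along those lines might look roughly like this; all names here are hypothetical, not an existing vLLM API:

from abc import ABC, abstractmethod
from typing import Optional

class ModelOutputParser(ABC):
    """Hypothetical single entry point for both reasoning and tool parsing."""

    @abstractmethod
    def extract_reasoning(self, model_output: str) -> tuple[Optional[str], str]:
        """Return (reasoning_content, remaining_content)."""

    @abstractmethod
    def extract_tool_calls(self, content: str) -> list[dict]:
        """Return any tool calls found in the non-reasoning content."""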
I think it's reasonable, after a glance at the issue. Reasoning can be understood as a special tool, and they can share the implementation. During my implementation, I did find the tool parser code to be quite messy.
I was wondering whether JSON schema for the answer only (without suppressing the thinking tokens) is added in this PR?
It is not supported, see #12468 (comment).
Fix #12468
Two CLI arguments are introduced for extensibility: --enable-reasoning and --reasoning-parser.

Test for full request:
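The exact script isn't reproduced here, but a minimal sketch of a full (non-streaming) request against the OpenAI-compatible server would look roughly like this, assuming the server was started with --enable-reasoning --reasoning-parser deepseek_r1 on localhost:8000:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
)

message = response.choices[0].message
# reasoning_content is an extra field on the message; read it defensively.
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)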
Test for stream request:
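Similarly, a hedged sketch of the streaming variant; as noted below, the stock openai client may need patching to surface reasoning_content on the delta:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)
    elif delta.content:
        print(delta.content, end="", flush=True)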
These scripts are from https://api-docs.deepseek.com/guides/reasoning_model#api-example. However, I encountered an issue with stream requests:
I had to hack the openai Python package to get it to pass. The RESTful API works well.