
[Frontend] Support reasoning content for deepseek r1 #12473

Merged
merged 19 commits into vllm-project:main from gaocegege:reason
Jan 29, 2025

Conversation

gaocegege
Contributor

@gaocegege gaocegege commented Jan 27, 2025

Fix #12468

Two CLI arguments are introduced for extensibility: --enable-reasoning and --reasoning-parser.

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --enable-reasoning --reasoning-parser deepseek_r1

Test for a full (non-streaming) request:

from openai import OpenAI
client = OpenAI(
    base_url="http://100.104.240.69:8000/v1",
    api_key="token-abc123",
)

completion = client.chat.completions.create(
  model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
  messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
)

print("reasoning_content:", completion.choices[0].message.reasoning_content)
print("content", completion.choices[0].message.content)

Test for a streaming request:

from openai import OpenAI
client = OpenAI(api_key="<DeepSeek API Key>", base_url="http://100.104.240.69:8000/v1")

# Round 1
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=messages,
    stream=True
)

reasoning_content = ""
content = ""

for chunk in response:
    if chunk.choices[0].delta.reasoning_content:
        reasoning_content += chunk.choices[0].delta.reasoning_content
    else:
        content += chunk.choices[0].delta.content
    print(chunk)

print(f"Content: {content}")
print(f"Reasoning Content: {reasoning_content}")

These scripts are from https://api-docs.deepseek.com/guides/reasoning_model#api-example. However, I encountered an issue with stream requests:

Traceback (most recent call last):
  File "main.py", line 16, in <module>
    if chunk.choices[0].delta.reasoning_content:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/pydantic/main.py", line 828, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
AttributeError: 'ChoiceDelta' object has no attribute 'reasoning_content'

I had to hack the openai Python package to get past this. The RESTful API itself works well.
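Until the openai client's pydantic models include the field, one hedged workaround (the helper names here are illustrative, not part of this PR) is to read the delta attributes defensively with getattr, which returns a default instead of raising:

```python
from types import SimpleNamespace


def delta_reasoning(delta) -> str:
    # getattr with a default never raises, even when the pydantic
    # model has no such attribute on a given chunk.
    return getattr(delta, "reasoning_content", None) or ""


def delta_content(delta) -> str:
    return getattr(delta, "content", None) or ""


# Illustrative stand-ins for streamed ChoiceDelta objects: the first
# chunk carries reasoning only, the second has no reasoning_content at all.
chunks = [
    SimpleNamespace(reasoning_content="compare the decimals", content=None),
    SimpleNamespace(content="9.8 is greater."),
]

reasoning = "".join(delta_reasoning(c) for c in chunks)
content = "".join(delta_content(c) for c in chunks)
```

With this accessor the loop over `response` chunks never touches a missing attribute, so the AttributeError above does not occur.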


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, a small and essential subset of CI tests that quickly catches errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the frontend label Jan 27, 2025
@gaocegege gaocegege changed the title from "feat(entrypoints): Support reasoning content for deepseek r1" to "feat(entrypoints): Support reasoning content for deepseek r1 WIP" Jan 28, 2025
@gaocegege gaocegege marked this pull request as draft January 28, 2025 00:51
@gaocegege gaocegege changed the title from "feat(entrypoints): Support reasoning content for deepseek r1 WIP" to "feat(entrypoints): Support reasoning content for deepseek r1" Jan 28, 2025
@gaocegege gaocegege force-pushed the reason branch 2 times, most recently from 54b910e to 946179e Compare January 28, 2025 01:23

Member

@DarkLight1337 DarkLight1337 left a comment


To make sure this is working correctly, can you add some tests to the codebase? Would also be great to add a section to the docs explaining its usage!

cc @mgoin @K-Mistele

@DarkLight1337
Member

I had to hack the openai Python package to get past this. The RESTful API itself works well.

Yeah, I think we can just use the requests library in cases where the API is incompatible with the OpenAI spec.
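A sketch of that fallback, assuming the server emits standard OpenAI-style SSE lines (`data: {...}` terminated by `data: [DONE]`); the helper name is illustrative, and the live request is left commented out since it needs a running server:

```python
import json


def parse_sse_line(line: str):
    """Return the decoded delta dict from one SSE line, or None."""
    if not line.startswith("data: "):
        return None  # comments, keep-alives, blank lines
    payload = line[len("data: "):]
    if payload.strip() == "[DONE]":
        return None
    return json.loads(payload)["choices"][0]["delta"]


# Live usage would look roughly like this (requires a running server):
# import requests
# resp = requests.post(
#     "http://localhost:8000/v1/chat/completions",
#     json={"model": "...", "messages": [...], "stream": True},
#     stream=True,
# )
# for raw in resp.iter_lines(decode_unicode=True):
#     delta = parse_sse_line(raw)
#     if delta:
#         print(delta.get("reasoning_content"), delta.get("content"))

sample = 'data: {"choices": [{"delta": {"reasoning_content": "thinking..."}}]}'
delta = parse_sse_line(sample)
```

Because the deltas are plain dicts here, `delta.get("reasoning_content")` simply returns None when the key is absent, sidestepping the client-model issue entirely.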

@gaocegege
Contributor Author

To make sure this is working correctly, can you add some tests to the codebase? Would also be great to add a section to the docs explaining its usage!

cc @mgoin @K-Mistele

Will work on this.

@gaocegege gaocegege changed the title from "feat(entrypoints): Support reasoning content for deepseek r1" to "feat(entrypoints): Support reasoning content for deepseek r1 WIP" Jan 28, 2025
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
@cksac

cksac commented Jan 28, 2025

How does this work together with Guided/Structured Outputs?
Can we apply guided generation after the reasoning content, i.e. apply the grammar after a specific token such as </think>?

@gaocegege
Contributor Author

@cksac Hi, please check out #12468 (comment)

I don't think they can work together. Also, FYI: the DeepSeek API doesn't support function calling or structured output for DeepSeek R1.

If you enable this feature together with structured output, the output will contain only the structured part.

Signed-off-by: Ce Gao <cegao@tensorchord.ai>
expected_reasoning: str,
expected_content: str,
):
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
Contributor Author


I'm not certain we can accept this. I used a tokenizer to mimic the streaming output process, which may require some memory.


@GeorgelPreput GeorgelPreput Jan 28, 2025


The DeepSeek tokenizer can be instantiated locally from the zip file they provide here: https://api-docs.deepseek.com/quick_start/token_usage

Though maybe this is worth upstreaming to Transformers?

Contributor Author


I think we can use this:

    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1")
    output = tokenizer.tokenize(param_dict["output"])

I used opt-125m to conserve memory during testing because I was unsure about the resources available on the test machine. If we can tolerate around 500MB of memory usage, we can use the R1 tokenizer directly.
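As a sketch of that test strategy, streaming can also be simulated without loading any tokenizer at all, by replaying hand-made deltas through the classification logic (all names here are illustrative stand-ins, not the PR's actual code):

```python
START, END = "<think>", "</think>"


def classify(previous: str, delta: str):
    """Classify one streamed delta as reasoning, content, or mixed,
    based on whether the closing tag has been seen yet."""
    if END in previous:
        return "content", delta
    if END in delta:
        before, _, after = delta.partition(END)
        return "mixed", (before.replace(START, ""), after)
    return "reasoning", delta.replace(START, "")


# Replay fake token deltas, the way a tokenizer-driven test would.
deltas = ["<think>", "9.11 < 9.8? ", "No.", "</think>", "9.8 is greater."]
previous, reasoning, content = "", "", ""
for d in deltas:
    kind, payload = classify(previous, d)
    if kind == "reasoning":
        reasoning += payload
    elif kind == "content":
        content += payload
    else:  # the delta straddled the closing tag
        reasoning += payload[0]
        content += payload[1]
    previous += d
```

A real tokenizer only changes how the deltas are produced; the classification step is the part the unit tests need to exercise.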

@gaocegege gaocegege changed the title from "feat(entrypoints): Support reasoning content for deepseek r1 WIP" to "feat(entrypoints): Support reasoning content for deepseek r1" Jan 28, 2025
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
@gaocegege
Contributor Author

gaocegege commented Jan 28, 2025

@DarkLight1337 I added some unit tests but could not find how to add e2e tests. Do we need e2e cases for this feature? Could you please guide me?

gaocegege and others added 2 commits January 28, 2025 23:55
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
@gaocegege
Contributor Author

gaocegege commented Jan 28, 2025

Thanks for your time.

I think OpenAI still hides reasoning?

Yes, OpenAI hides the reasoning content.

The Vercel AI SDK just extracts reasoning from the streamed tokens by looking for the XML tags.

I hadn’t realized they support reasoning. Thanks for bringing that to my attention!

https://github.com/vercel/ai/blob/main/packages/ai/core/middleware/extract-reasoning-middleware.ts

I will take a look too.

Signed-off-by: Ce Gao <cegao@tensorchord.ai>
@DarkLight1337 DarkLight1337 changed the title from "feat(entrypoints): Support reasoning content for deepseek r1" to "[Frontend] Support reasoning content for deepseek r1" Jan 28, 2025
gaocegege and others added 2 commits January 29, 2025 00:49
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
gaocegege and others added 2 commits January 29, 2025 08:15
Co-authored-by: Michael Goin <mgoin@redhat.com>
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
@gaocegege
Contributor Author

Comments are addressed, please take a look.
Thanks for your time 😄

Member

@mgoin mgoin left a comment


Excellent work incorporating feedback and producing nicely documented code, LGTM! I'd like @DarkLight1337 or @K-Mistele to sign off as well before landing

parser.add_argument(
"--reasoning-parser",
type=str,
metavar="{" + ",".join(valid_reasoning_parsers) + "}",
Member


Why couldn't this just be choices=valid_reasoning_parsers?

Contributor Author

@gaocegege gaocegege Jan 29, 2025


Here’s the trick adapted from the tool parser argument.

https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/cli_args.py#L216

I think it is to keep consistency with the other CLI arguments:

$ vllm serve --help
...
  --scheduling-policy {fcfs,priority}
...
--task {auto,generate,embedding,embed,classify,score,reward}
...
--tokenizer-mode {auto,slow,mistral}
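A minimal sketch of the pattern (the parser list is a hypothetical stand-in for vLLM's registry), showing that metavar only shapes how --help renders the value slot and that, unlike choices=, it performs no validation:

```python
import argparse

# Hypothetical list; vLLM populates this from its parser registry.
valid_reasoning_parsers = ["deepseek_r1"]

parser = argparse.ArgumentParser(prog="vllm serve")
parser.add_argument(
    "--reasoning-parser",
    type=str,
    # metavar controls the display in --help ({deepseek_r1} here),
    # but argparse does not check the supplied value against it.
    metavar="{" + ",".join(valid_reasoning_parsers) + "}",
    default=None,
)

args = parser.parse_args(["--reasoning-parser", "deepseek_r1"])
# An unknown value is also accepted, since metavar does not validate:
loose = parser.parse_args(["--reasoning-parser", "not_a_parser"])
```

With choices=valid_reasoning_parsers instead, the second call would exit with an error; the metavar form leaves validation to the application, which is the trade-off the tool-parser argument also makes.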

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 29, 2025
@gaocegege
Contributor Author

The Vercel AI SDK has a design akin to this PR's: it employs a regex to parse the entire output for synchronous generation requests, and we use the same method.

const regexp = new RegExp(`${openingTag}(.*?)${closingTag}`, 'gs');
const matches = Array.from(text.matchAll(regexp));

if (!matches.length) {
  return { text, ...rest };
}

const reasoning = matches.map(match => match[1]).join(separator);

For streaming requests, we also have a similar design that checks the start and end tokens in both the previous and delta text.
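A Python sketch of the non-streaming path, assuming DeepSeek R1's <think>...</think> markers (the function and pattern names are illustrative; re.DOTALL plays the role of the JS 's' flag):

```python
import re

# Mirror of the Vercel regex for a fixed tag pair.
THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def parse_full(text: str):
    """Non-streaming path: extract reasoning from the complete output.

    Returns (reasoning, content); reasoning is None when no tags are found.
    """
    matches = THINK.findall(text)
    if not matches:
        return None, text
    content = THINK.sub("", text).strip()
    return "\n".join(matches), content


reasoning, content = parse_full("<think>compare the decimals</think>9.8 is greater.")
```

The streaming path cannot use this, since a tag may be split across deltas; that is why it instead checks the start and end tokens against the previous and delta text.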

@DarkLight1337 DarkLight1337 merged commit a7e3eba into vllm-project:main Jan 29, 2025
58 checks passed
@gaocegege gaocegege deleted the reason branch January 29, 2025 03:50
@gaocegege
Contributor Author

Thanks for your review 🥰

@K-Mistele
Contributor

Hey guys, just now getting to this; glad to see it's been merged. For what it's worth, there have already been some discussions (ref. #11522) about refactoring tool parsers (possibly to use FSMs, which seems like a good approach for managing tool and reasoning parsing) instead of the current implementation, which tends to be buggy.

It seems like the direction we will want to move towards is having a single parser module for any given (supported) model that provides a unified interface for both tool parsing and reasoning parsing if one or both are supported by the model.

This would be more straightforward from a UX and a DX standpoint, and should result in cleaner code as well.

@gaocegege
Contributor Author

I think it's reasonable after a glance at the issue. Reasoning can be understood as a special kind of tool, so they can share the implementation. During my implementation I did find the tool parser code to be quite messy.

rasmith pushed a commit to rasmith/vllm that referenced this pull request Jan 30, 2025

Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Isotr0py pushed a commit to Isotr0py/vllm that referenced this pull request Feb 2, 2025

Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
@arunpatala

I was wondering whether applying a JSON schema to only the answer (and not suppressing the thinking tokens) was added in this PR?

@gaocegege
Contributor Author

I was wondering whether applying a JSON schema to only the answer (and not suppressing the thinking tokens) was added in this PR?

It is not supported. Ref: #12468 (comment)

NickLucche pushed a commit to NickLucche/vllm that referenced this pull request Feb 7, 2025

Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
ShangmingCai pushed a commit to ShangmingCai/vllm that referenced this pull request Feb 10, 2025

Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
GWS0428 pushed a commit to GWS0428/VARserve that referenced this pull request Feb 12, 2025

Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Labels: documentation, frontend, ready
Successfully merging this pull request may close these issues.

[Feature] reasoning_content in API for reasoning models like DeepSeek R1
8 participants