Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Oct 15, 2025

Purpose

Rebase of #20874.

Adds Tool/Function calling support to the OpenAI Responses API.

Follow-up to #20504.
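
To illustrate what this enables, here is a minimal sketch of a Responses API request with a function tool against a local vLLM server. This is not from the PR itself; the model name, port, and `get_weather` tool are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool definition, using the Responses API's flat tool schema.
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = client.responses.create(
    model="Qwen/Qwen3-8B",  # assumption: any non-gpt-oss model with a tool parser
    input="What is the weather in Paris?",
    tools=tools,
)

# With this change, non-harmony models can emit function_call output items.
for item in response.output:
    if item.type == "function_call":
        print(item.name, item.arguments)
```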

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) Necessary documentation updates, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft in the Google Doc.

@mergify

mergify bot commented Oct 15, 2025

Documentation preview: https://vllm--26874.org.readthedocs.build/en/26874/

@mergify mergify bot added documentation Improvements or additions to documentation frontend v1 labels Oct 15, 2025
@mergify mergify bot added the tool-calling label Oct 15, 2025
@chaunceyjiang chaunceyjiang changed the title [Frontend] OpenAI Responses API supports Tool/Function calling [Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony Oct 15, 2025
@mergify mergify bot added the gpt-oss Related to GPT-OSS models label Oct 15, 2025
@chaunceyjiang chaunceyjiang marked this pull request as ready for review October 15, 2025 10:17

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@yeqcharlotte
Collaborator

I think we're quite aligned that we don't want the Responses API to only work for gpt-oss (#26703) ;)

Could you share an e2e vllm serve command showing how you use this with other models?

@chaunceyjiang
Collaborator Author

Could you share an e2e vllm serve command showing how you use this with other models?

Hi @yeqcharlotte, I've provided an example in examples/online_serving/openai_responses_client_with_tools.py; vllm serve does not require any additional arguments.
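
For concreteness, the end-to-end flow looks roughly like this. A hedged sketch: the model name and tool are illustrative assumptions, and the follow-up turn assumes `previous_response_id` round-trips behave as in the upstream Responses API.

```python
# Terminal 1 (no extra flags needed):  vllm serve Qwen/Qwen3-8B
# Terminal 2: run this script.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.responses.create(
    model="Qwen/Qwen3-8B",
    input="What's the weather in Paris?",
    tools=[{
        "type": "function",
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
)
call = next(o for o in resp.output if o.type == "function_call")

# Execute the tool client-side, then feed the result back to finish the turn.
followup = client.responses.create(
    model="Qwen/Qwen3-8B",
    previous_response_id=resp.id,
    input=[{
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": '{"temp_c": 21}',
    }],
)
print(followup.output_text)
```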

Contributor

@alecsolder alecsolder left a comment


I think this is great; it really gets us started on supporting other models with the Responses API.

Comparing this to gpt-oss: for gpt-oss we handle the conversion of output tokens to "messages" in entrypoints/context.py. The Harmony library happens to be a tool_parser, reasoning_parser, and tokenizer all at the same time, but it is nice that it all happens in one place. That matches the need here to convert from Responses types to Completions types so we can use the parsers for other models.

I think the implementation in this PR works as-is, but once we want to support server-side tool calling for models besides gpt-oss, we will likely need to move the parsing logic into entrypoints/context.py as well, so it can happen in the tool-calling "loop" driven by _generate_with_builtin_tools() in serving_engine, as sketched below.

Ideally we'd be able to start pulling conversion logic out of serving_responses into its own files, as with harmony_utils.py, and continue standardizing around something like creating the right context object for serving_engine.
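
Not from the PR: a hypothetical sketch of that direction, where the context object owns output parsing so the serving loop stays model-agnostic. Every name except those mentioned above is invented, and the extract_tool_calls call signature is an assumption.

```python
# Hypothetical illustration only: a context that owns tool-call parsing,
# so the tool-calling loop never touches model-specific output formats.
class ToolParsingContext:
    def __init__(self, tool_parser, messages):
        self.tool_parser = tool_parser  # a model-specific tool parser instance
        self.messages = messages        # chat-completion-style history

    def append_output(self, output_text: str):
        # The entrypoints/context.py role: turn raw model output into
        # structured messages, here via the tool parser (assumed call shape).
        parsed = self.tool_parser.extract_tool_calls(output_text)
        self.messages.append({
            "role": "assistant",
            "content": parsed.content,
            "tool_calls": parsed.tool_calls,
        })
        return parsed.tool_calls

# A loop like _generate_with_builtin_tools() could then simply alternate:
# generate -> ctx.append_output -> execute tools -> append results -> repeat.
```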

@chaunceyjiang
Collaborator Author

chaunceyjiang commented Oct 22, 2025

I think the implementation in this PR as-is works, but once we want to support server side tool calling with models besides gpt-oss, we will likely need to move the parsing logic to entrypoints/context.py as well so it can happen in the tool calling "loop" from _generate_with_builtin_tools() in serving_engine.

Ideally we'd be able to start pulling conversion logic out of serving_responses into their own files like with harmony_utils.py, and continue to standardize more around something like creating the right context object for serving_engine.

Hi, @alecsolder

Your suggestions are excellent, and I completely agree. However, the main goal of this PR is to enable non-gpt-oss models to use tool calling with the Responses API.

Essentially, this PR constructs chat-completion-style messages that carry tool calls, and then extracts tool calls from the model output using the tool_parser.

You can think of it as parallel to how the Responses API currently handles Harmony:

The new _construct_chat_message_with_tool_call function is analogous to the existing _construct_input_messages_with_harmony function; both are responsible for constructing input messages.

Similarly, the new _parse_tool_calls_from_content function is analogous to the existing parse_output_message function; both are responsible for parsing output messages.
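
In rough pseudo-Python, the flow described above for non-harmony models composes like this. This is a sketch of the described flow, not the PR's actual code; the signatures and the generate/make_response_output helpers are invented for illustration.

```python
def handle_responses_request(request, tool_parser):
    # Analogue of _construct_input_messages_with_harmony for non-harmony
    # models: turn Responses API input items (including prior tool calls
    # and their outputs) into chat-completion-style messages.
    messages = _construct_chat_message_with_tool_call(request)

    # Ordinary generation, with the tools rendered into the chat template.
    output_text = generate(messages, tools=request.tools)

    # Analogue of parse_output_message: recover structured tool calls from
    # the raw output using the model's tool_parser.
    content, tool_calls = _parse_tool_calls_from_content(output_text, tool_parser)
    return make_response_output(content, tool_calls)
```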

Regarding what you mentioned about server-side tool calling and MCP: I believe that should be handled by the Context (today, SimpleContext). Therefore, for non-gpt-oss models to support MCP, we'll likely need a new Context, and I plan to submit a separate PR to introduce it.

However, I still think the _construct_chat_message_with_tool_call and _parse_tool_calls_from_content functions are necessary for constructing input messages and parsing output messages. They do not conflict with the Context, since the Context is currently not responsible for message construction or parsing; see the sketch below.
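
To make the proposed division of labor concrete, a hypothetical sketch: no such class exists in this PR, and the mcp_client and method names are invented. The Context decides what happens to a parsed FunctionCall, while message construction and parsing stay in the helper functions above.

```python
from vllm.entrypoints.context import SimpleContext  # existing base class

# Hypothetical future Context for server-side tool calling with
# non-gpt-oss models. It only decides how to act on parsed tool calls;
# it does not construct or parse messages.
class McpToolCallingContext(SimpleContext):
    def __init__(self, mcp_client, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.mcp_client = mcp_client  # invented server-side tool executor

    async def run_tool(self, call):
        # Execute server-side instead of returning the call to the client.
        return await self.mcp_client.call_tool(call.name, call.arguments)
```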

Contributor

@alecsolder alecsolder left a comment


Hey @chaunceyjiang, I completely agree with everything in your comment. I'm fully aligned on moving things into the context classes, and none of it needs to happen in this PR. I was mostly saying these things to make sure we are aligned, which we are!

Other models can definitely support MCP, and from here the implementation is basically just deciding when to handle the parsed FunctionCall within vLLM versus when to return it to the client. So we are right on track!

I'll try to put together a list of what still needs to be done to support the same arbitrary MCP integration as in #26704 for other models :)

@chaunceyjiang
Collaborator Author

@yeqcharlotte Ready for review.

Collaborator

@yeqcharlotte yeqcharlotte left a comment


Thanks for the change. Let's keep iterating on this to also support minimax-m2. cc @qandrew

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Nov 6, 2025
@yeqcharlotte yeqcharlotte enabled auto-merge (squash) November 6, 2025 07:20
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 6, 2025
@yeqcharlotte yeqcharlotte merged commit 59a50af into vllm-project:main Nov 6, 2025
48 checks passed
@chaunceyjiang chaunceyjiang deleted the func_call_2 branch November 6, 2025 10:45
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025: [Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony (vllm-project#26874)
