[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony #26874
Conversation
Documentation preview: https://vllm--26874.org.readthedocs.build/en/26874/
I think we are quite aligned that we don't want the Responses API to only work for gpt-oss (#26703) ;) Could you share an e2e vllm serve command showing how you use this with other models?
Hi, @yeqcharlotte, I've provided an example in the examples directory.
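For readers following along, here is a minimal end-to-end sketch of the flow under discussion. It is not the PR's bundled example; the model choice and the hermes parser flag are assumptions and should be adjusted for whichever tool-parser-supported model you serve:

```python
# Assumed serve command (adjust model and parser for your setup):
#   vllm serve Qwen/Qwen3-8B --enable-auto-tool-choice --tool-call-parser hermes
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Responses-API function tools are flat: "name" sits at the top level,
# unlike the nested Chat Completions tool schema.
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = client.responses.create(
    model="Qwen/Qwen3-8B",
    input="What's the weather in Paris?",
    tools=tools,
)

# Tool calls come back as function_call items in response.output.
for item in response.output:
    if item.type == "function_call":
        print(item.name, item.arguments)
```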
alecsolder left a comment:
I think this is great; it really gets us started with supporting other models on the Responses API.
Comparing this to gpt-oss: for gpt-oss we handle the conversion of the output tokens to "messages" in entrypoints/context.py. The Harmony library happens to be a tool_parser, reasoning_parser, and tokenizer all at the same time, but it is nice that it all happens in one location. This matches the need here to convert from Responses types to Completions types so we can use the parsers for other models.
I think the implementation in this PR works as-is, but once we want to support server-side tool calling with models besides gpt-oss, we will likely need to move the parsing logic to entrypoints/context.py as well, so it can happen in the tool-calling "loop" from _generate_with_builtin_tools() in serving_engine.
Ideally we'd be able to start pulling conversion logic out of serving_responses into its own files, as was done with harmony_utils.py, and continue to standardize around something like creating the right context object for serving_engine.
Hi, @alecsolder, your suggestions are excellent, and I completely agree. However, the main goal of this PR is to enable non-gpt-oss models to use tool calling with the Responses API. Essentially, this PR constructs chat-completion-style messages with tool calling, and then extracts tool calls from the output using the existing tool parser. You can think of it as similar to how the Responses API currently handles Harmony, with the new conversion and parsing code playing the analogous role for non-harmony models. Regarding what you mentioned about server-side tool calling and MCP, I believe that should be handled in follow-up work along the lines you describe. However, I still think the parsing approach in this PR is the right starting point.
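As a rough illustration of the conversion described above (a hypothetical helper sketch, not the PR's actual code), the flat Responses-API tool shape maps onto the nested Chat Completions shape that the existing tool parsers expect:

```python
def responses_tool_to_chat_tool(tool: dict) -> dict:
    """Hypothetical sketch: map a flat Responses-API function tool onto
    the nested Chat Completions shape the existing tool parsers expect."""
    assert tool.get("type") == "function"
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get("parameters", {}),
        },
    }

# Example: {"type": "function", "name": "get_weather", ...} becomes
# {"type": "function", "function": {"name": "get_weather", ...}}
```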
alecsolder left a comment:
Hey @chaunceyjiang, completely agree with everything in your comment. I am fully aligned on moving things to the context classes, and none of it needs to happen in this PR. I was mostly saying these things to make sure we are aligned, which we are!
Other models can definitely support MCP, and the implementation from here is basically just us deciding when to handle the parsed FunctionCall within vLLM vs. when to return it to the client (see the sketch below for the idea). So we are right on track!
I'll try to put together the list of things that still need to be done to support the same arbitrary MCP integration as in #26704 for other models :)
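A sketch of the dispatch decision described above. The names here are illustrative only, not vLLM's actual internals:

```python
# Illustrative only: once a FunctionCall has been parsed from model output,
# the serving loop either executes it server-side (built-in / MCP tools)
# or yields it back to the client.
def dispatch_function_call(call, server_side_tools: dict):
    handler = server_side_tools.get(call.name)
    if handler is not None:
        # Built-in or MCP tool: run it inside vLLM and feed the result
        # back into the conversation for another generation turn.
        return "continue_loop", handler(call.arguments)
    # Unknown to the server: surface it to the client, which executes it
    # and replies with a function_call_output item.
    return "return_to_client", call
```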
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@yeqcharlotte Ready for review.
yeqcharlotte left a comment:
Thanks for the change. Let's keep iterating on this to also support minimax-m2. cc: @qandrew
Purpose
Rebase of #20874.
OpenAI Responses API supports Tool/Function calling.
Follow-up to #20504.
Test Plan
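A minimal round-trip sketch of what a test here could exercise (not the PR's actual test plan; the model name, endpoint, and weather payload are placeholder assumptions):

```python
# Hedged sketch; assumes a server started with a tool-parser-enabled model,
# e.g.: vllm serve Qwen/Qwen3-8B --enable-auto-tool-choice --tool-call-parser hermes
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "Qwen/Qwen3-8B"  # placeholder model name

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

question = "What's the weather in Paris?"
first = client.responses.create(model=MODEL, input=question, tools=tools)

# Expect at least one function_call item in the output.
call = next(item for item in first.output if item.type == "function_call")

# Execute the tool client-side (stubbed here) and return the result via a
# function_call_output item so the model can produce a final answer.
second = client.responses.create(
    model=MODEL,
    tools=tools,
    input=[
        {"role": "user", "content": question},
        {"type": "function_call", "call_id": call.call_id,
         "name": call.name, "arguments": call.arguments},
        {"type": "function_call_output", "call_id": call.call_id,
         "output": '{"temperature_c": 14, "condition": "cloudy"}'},
    ],
)
print(second.output_text)  # final natural-language answer
```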
Test Result
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.