[Frontend][gpt-oss] Support all MCP servers for gpt-oss #26704
Conversation
Signed-off-by: Alec Solder <alecs@fb.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request introduces a significant and well-executed refactoring to support arbitrary MCP (Model Context Protocol) servers, moving away from hardcoded tool logic. The changes generalize tool handling by introducing concepts like tool namespaces, normalization of legacy tool types, and a distinction between elevated and custom tools. The addition of comprehensive tests, including a standalone MCP server, ensures the new system is robust. Overall, this is a great improvement for the tool integration capabilities of vLLM. I have one piece of feedback regarding some debugging code that should be removed.
| print("INPUT MESSAGES \n\n", input_messages) | ||
| print("\n\nOUTPUT MESSAGES \n\n", output_messages) |
thank you gemini :)
💡 Codex Review
Here are some automated review suggestions for this pull request.
from contextlib import AbstractAsyncContextManager, asynccontextmanager
from typing import TYPE_CHECKING, Any, Optional

from mcp import ClientSession
from mcp.client.sse import sse_client
Importing MCP client unconditionally breaks deployments without MCP
The module now imports ClientSession from mcp and sse_client from mcp.client.sse at top level. Any process that imports vllm.entrypoints.tool_server (e.g., the default API server) will immediately raise ImportError when the optional mcp package is not installed, even if MCP tooling is never used. The previous implementation delayed the import inside MCPToolServer.__init__ and only failed when the optional feature was exercised. This regression forces all installations to ship the MCP dependency, which will crash existing configurations that do not provide it.
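A minimal sketch of the deferred-import pattern the previous implementation used, so that merely importing the module never requires the optional mcp package (names follow the description above; the actual vLLM code may differ):

```python
class MCPToolServer:
    def __init__(self, url: str):
        try:
            # Import lazily: the optional `mcp` dependency is only required
            # when an MCP tool server is actually configured.
            from mcp import ClientSession
            from mcp.client.sse import sse_client
        except ImportError as e:
            raise ImportError(
                "MCP tool support requires the optional 'mcp' package; "
                "install it with `pip install mcp`."
            ) from e
        self.url = url
        self._client_session_cls = ClientSession
        self._sse_client = sse_client
```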
Signed-off-by: Alec Solder <alecs@fb.com>
Resolved conflicts in:
- vllm/entrypoints/context.py: Updated type annotations to modern syntax (str | None), kept 'enabled_tool_namespaces' parameter and 'called_namespaces' set
- vllm/entrypoints/harmony_utils.py: Updated type annotations to modern syntax, kept tool_namespaces parameter in get_developer_message
- vllm/entrypoints/tool_server.py: Kept 'namespace' parameter naming throughout for consistency
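For context, a hypothetical before/after of the annotation modernization mentioned above; the real vLLM signatures may differ:

```python
from typing import Optional  # needed only for the old style

# Old style:
def get_developer_message_old(instructions: Optional[str] = None): ...

# Modern PEP 604 style kept in the resolution, including the
# tool_namespaces parameter this PR adds (signature is illustrative):
def get_developer_message(
    instructions: str | None = None,
    tool_namespaces: list[str] | None = None,
): ...
```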
qandrew left a comment
Thanks for putting this together! However, the PR is a bit large to review thoroughly; can you split it into smaller pieces? If you have 3 main motivations, can you split them into 3 separate PRs? A 4th PR could add the memory_mcp_server and test_memory_mcp_server logic.
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
Standalone Memory MCP Server
nit: this file should go in the mcp folder
- Tool calls should be on analysis channel
- Tool responses should be on analysis channel
"""
response = await memory_elevated_client.responses.create(
I think in general we shouldn't need E2E tests for all functionality testing; ideally we can mock the client/server and reduce CI load. You might run into CI memory issues (I had some here: eb6dd74) by spinning up more servers within the same CI step.
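A minimal sketch of the mocking approach this suggests, using pytest-asyncio and unittest.mock; the session API here is illustrative rather than the exact vLLM test surface:

```python
from unittest.mock import AsyncMock

import pytest


@pytest.mark.asyncio
async def test_tool_call_with_mocked_mcp_session():
    # Stand in for an MCP ClientSession so no real server process is spawned.
    session = AsyncMock()
    session.call_tool.return_value = {"content": [{"type": "text", "text": "ok"}]}

    # Exercise the code path that would normally talk to the MCP server,
    # reduced here to the session call itself for illustration.
    result = await session.call_tool("store_memory", {"key": "color", "value": "blue"})

    assert result["content"][0]["text"] == "ok"
    session.call_tool.assert_awaited_once()
```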
This pull request has merge conflicts that must be resolved before it can be merged.
When memory is NOT in GPT_OSS_SYSTEM_TOOL_MCP_LABELS:
- Tool should be in developer message (not system message)
- Tool calls should be on commentary channel
- Tool responses should be on commentary channel
"""
These comments (and the similar ones for the elevated test case) are the most concise description of custom tools vs elevated/system/builtin (naming is hard) tools that I've seen, and they help me really understand the difference. As someone who has studied the Harmony format, I immediately understand the implications of this.
It feels like we need some user-facing documentation about this as well, something to help users decide whether their tools belong in GPT_OSS_SYSTEM_TOOL_MCP_LABELS and what the implications are for the generated tool calls and their execution.
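A hypothetical sketch of the dispatch rule those docstrings describe; the names are modeled on the tests, not on the exact vLLM implementation:

```python
import os

# Comma-separated namespaces that should be treated as elevated/system tools.
GPT_OSS_SYSTEM_TOOL_MCP_LABELS = set(
    filter(None, os.environ.get("GPT_OSS_SYSTEM_TOOL_MCP_LABELS", "").split(","))
)


def placement_for(namespace: str) -> tuple[str, str]:
    """Return (message_section, channel) for a tool namespace."""
    if namespace in GPT_OSS_SYSTEM_TOOL_MCP_LABELS:
        # Elevated tool: advertised in the system message; calls and
        # responses travel on the analysis channel.
        return ("system", "analysis")
    # Custom tool: advertised in the developer message; calls and
    # responses travel on the commentary channel.
    return ("developer", "commentary")
```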
Purpose
This PR is an implementation of #26703.
Note: This PR is really big, but a lot had to be cleaned up at the same time. A large amount of the code comes from adding tests for existing code, plus a new MCP server to test against and new MCP tests.
There are three main motivations:
This feature is similar to the concept of “connectors”: we still don't support users specifying arbitrary MCP servers by URL, so the list is static and owned by the server, much like OpenAI MCP Connectors.
Test Plan
Run the provided simple Memory MCP server using:
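For example (the file path and port are assumptions, not taken from the PR):

```bash
python tests/mcp/memory_mcp_server.py --host 0.0.0.0 --port 8765
```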
Then register it with your vLLM responses API server with:
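For example, assuming vllm serve's --tool-server flag and an example gpt-oss model; setting GPT_OSS_SYSTEM_TOOL_MCP_LABELS is only needed if the memory tool should be treated as elevated:

```bash
# host:port must point at the MCP server started above.
GPT_OSS_SYSTEM_TOOL_MCP_LABELS=memory \
  vllm serve openai/gpt-oss-20b --tool-server localhost:8765
```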
And then try it out with something like:
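For example, via the OpenAI Python client; the model name, port, and tool specification are illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.responses.create(
    model="openai/gpt-oss-20b",
    input="Remember that my favorite color is blue, then tell me what it is.",
    # Assumed shape for referencing a server-registered MCP tool by label.
    tools=[{"type": "mcp", "server_label": "memory"}],
)
print(response.output_text)
```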
Test Result
Passed