
Conversation

Contributor

@alecsolder alecsolder commented Oct 13, 2025

Purpose

This PR is an implementation of #26703

Note: This PR is really big, but a lot had to be cleaned up at the same time. Much of the added code comes from adding tests for existing code, an MCP server to test with, and new MCP tests.

There are three main motivations:

  • vLLM should support integrating with all MCP servers
    • TODO: Test with enterprise ones like GitHub
  • Clean up gpt-oss tool specific code paths in vLLM
    • Tools and models change, but MCP is a protocol meant to support all models
    • Sets the stage for full MCP integration for all tool calling models
  • Avoid losing information due to lossy OpenAI types for output
    • An example below
# What the Responses API can return to the user for this tool call
# when limited to the OpenAI types:
class ActionOpenPage(TypedDict, total=False):
    type: Required[Literal["open_page"]]
    url: Required[str]

# The interface of the actual tool call the model makes:
open_link(ctx: Context,
    id: Union[int, str] = -1,
    cursor: int = -1,
    loc: int = -1,
    num_lines: int = -1,
    view_source: bool = False,
    source: Optional[str] = None)

This feature is closer to OpenAI's MCP Connectors than to fully user-specified MCP servers: users cannot yet pass arbitrary MCP servers by URL, so the list of available servers is static and owned by the vLLM server.

Test Plan

Run the provided simple Memory MCP server using:

python tests/entrypoints/openai/memory_mcp_server.py 
# 8765 is the default port
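
For reference, a memory-style MCP server can be written with the mcp Python SDK's FastMCP helper. The sketch below is illustrative only; the tool names (store_memory, recall_memory) and the in-memory dict are assumptions, not the actual contents of memory_mcp_server.py:

# Minimal sketch of a memory-style MCP server using the mcp SDK's FastMCP.
# Tool names and storage are illustrative, not the real test server.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("memory", port=8765)
_store: dict[str, str] = {}

@mcp.tool()
def store_memory(key: str, value: str) -> str:
    """Save a value under a key."""
    _store[key] = value
    return f"stored {key}"

@mcp.tool()
def recall_memory(key: str) -> str:
    """Read back a previously stored value."""
    return _store.get(key, "not found")

if __name__ == "__main__":
    mcp.run(transport="sse")  # expose an SSE endpoint for the vLLM tool server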

Then register it with your vLLM responses API server with:

CUDA_VISIBLE_DEVICES=6 vllm serve openai/gpt-oss-20b --tool-server=localhost:8765

And then try it out with something like:

curl http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" -N \
  -d '{
        "model": "openai/gpt-oss-20b",
        "input": "Show me that the memory tool works.",
        "tools": [
          {
            "type": "mcp",
            "server_label": "memory", 
            "server_url": "ignored"
          }
        ],
        "enable_response_messages": true
      }'
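
The same request can also be sent with the OpenAI Python client pointed at the local server. This is a sketch assuming the server listens on localhost:8000 and needs no real API key; enable_response_messages is a vLLM extension, so it goes through extra_body:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.responses.create(
    model="openai/gpt-oss-20b",
    input="Show me that the memory tool works.",
    tools=[{"type": "mcp", "server_label": "memory", "server_url": "ignored"}],
    extra_body={"enable_response_messages": True},  # vLLM-specific field
)
print(response.output)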

Test Result

Passed

Signed-off-by: Alec Solder <alecs@fb.com>
mergify bot commented Oct 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alecsolder.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-executed refactoring to support arbitrary MCP (Model Context Protocol) servers, moving away from hardcoded tool logic. The changes generalize tool handling by introducing concepts like tool namespaces, normalization of legacy tool types, and a distinction between elevated and custom tools. The addition of comprehensive tests, including a standalone MCP server, ensures the new system is robust. Overall, this is a great improvement for the tool integration capabilities of vLLM. I have one piece of feedback regarding some debugging code that should be removed.

Comment on lines 602 to 603
print("INPUT MESSAGES \n\n", input_messages)
print("\n\nOUTPUT MESSAGES \n\n", output_messages)
Contributor

high

These print statements appear to be for debugging purposes. They should be removed before merging to avoid leaking potentially sensitive information from user requests into the server logs and to reduce log noise in production environments.
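
If the message dumps are still useful for troubleshooting, one option is to downgrade them to debug-level logging instead of printing. A minimal sketch, assuming the module uses vLLM's init_logger:

from vllm.logger import init_logger

logger = init_logger(__name__)

# Debug-level output is off by default, so request contents do not
# end up in production logs unless explicitly enabled.
logger.debug("input messages: %s", input_messages)
logger.debug("output messages: %s", output_messages)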

Contributor Author

thank you gemini :)

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 4 to 8
from contextlib import AbstractAsyncContextManager, asynccontextmanager
from typing import TYPE_CHECKING, Any, Optional

from mcp import ClientSession
from mcp.client.sse import sse_client


P0: Importing the MCP client unconditionally breaks deployments without MCP

The module now imports from mcp import ClientSession and sse_client at top level. Any process that imports vllm.entrypoints.tool_server (e.g., the default API server) will immediately raise ImportError when the optional mcp package is not installed, even if MCP tooling is never used. The previous implementation delayed the import inside MCPToolServer.__init__ and only failed when the optional feature was exercised. This regression forces all installations to ship the MCP dependency, which will crash existing configurations that do not provide it.
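
A sketch of the deferred-import pattern described above, so the optional dependency is only resolved when MCP tooling is actually used (illustrative, not the exact vLLM code):

# Illustrative only: defer the optional mcp dependency until an MCP tool
# server is actually constructed, instead of importing it at module load.
class MCPToolServer:
    def __init__(self, url: str):
        try:
            from mcp import ClientSession
            from mcp.client.sse import sse_client
        except ImportError as e:
            raise ImportError(
                "MCP tool servers require the optional 'mcp' package"
            ) from e
        self._session_cls = ClientSession
        self._sse_client = sse_client
        self._url = url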


Signed-off-by: Alec Solder <alecs@fb.com>
@alecsolder alecsolder changed the title Support all MCP servers for gpt-oss [Frontend] Support all MCP servers for gpt-oss Oct 13, 2025
@alecsolder alecsolder changed the title [Frontend] Support all MCP servers for gpt-oss [Frontend][gpt-oss] Support all MCP servers for gpt-oss Oct 13, 2025
Resolved conflicts in:
- vllm/entrypoints/context.py: Updated type annotations to modern syntax (str | None), kept 'enabled_tool_namespaces' parameter and 'called_namespaces' set
- vllm/entrypoints/harmony_utils.py: Updated type annotations to modern syntax, kept tool_namespaces parameter in get_developer_message
- vllm/entrypoints/tool_server.py: Kept 'namespace' parameter naming throughout for consistency
@mergify mergify bot removed the needs-rebase label Oct 13, 2025
Contributor

@qandrew qandrew left a comment


Thanks for putting this together! However, the PR is a bit large to review thoroughly; can you split it into smaller pieces? If you have 3 main motivations, can you split them into 3 separate PRs? A 4th PR could add the memory_mcp_server and test_memory_mcp_server logic too.

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
Standalone Memory MCP Server
Contributor

nit: this file should go in the mcp folder

- Tool calls should be on analysis channel
- Tool responses should be on analysis channel
"""
response = await memory_elevated_client.responses.create(
Contributor

I think in general we shouldn't need E2E tests for all functionality testing; ideally we can mock the client/server and reduce CI load. You might run into CI memory issues (I had some here eb6dd74) by spinning up more servers within the same CI step.
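
For example, the tool server could be replaced by an async mock in unit tests so no extra process is spawned in CI. A rough sketch; the list_tools/call_tool method names are stand-ins for whatever interface the handler under test actually calls:

# Rough sketch: stub out the MCP tool server instead of spawning one in CI.
from unittest.mock import AsyncMock

mock_tool_server = AsyncMock()
mock_tool_server.list_tools.return_value = [
    {"name": "store_memory", "description": "Save a value under a key"},
]
mock_tool_server.call_tool.return_value = {"content": "stored greeting"}

# The handler under test can then be driven without any network traffic:
# result = await handler.invoke_tool(mock_tool_server, "store_memory", {...})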

mergify bot commented Oct 15, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alecsolder.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Comment on lines +72 to +76
When memory is NOT in GPT_OSS_SYSTEM_TOOL_MCP_LABELS:
- Tool should be in developer message (not system message)
- Tool calls should be on commentary channel
- Tool responses should be on commentary channel
"""
Contributor

These comments (and the similar ones for the elevated test case) are the most concise description of custom tools vs elevated/system/builtin (naming is hard) tools that I've seen, and help me really understand the difference. And as someone that has studied the Harmony format, I immediately understand the implications of this.

It feels like we need some user-facing documentation about this as well, with something to help the user know if they want their tools to be in GPT_OSS_SYSTEM_TOOL_MCP_LABELS or not and what the implications of that are on the tool calls generated or on the execution of those tool calls.


Labels

frontend, gpt-oss (Related to GPT-OSS models), needs-rebase

Projects

Status: To Triage
