
Conversation

Contributor

@alecsolder alecsolder commented Oct 13, 2025

Purpose

This PR is an implementation of #26703

Note: This PR is really big, but a lot had to be cleaned up at the same time. Much of the added code comes from adding tests for existing code, an MCP server to test with, and new MCP tests.

There are three main motivations:

  • vLLM should support integrating with all MCP servers
    • TODO: Test with enterprise ones like GitHub
  • Clean up gpt-oss tool specific code paths in vLLM
    • Tools and models change, but MCP is a protocol meant to support all models
    • Sets the stage for full MCP integration for all tool calling models
  • Avoid losing information due to lossy OpenAI types for output
    • An example below
# What the Responses API can return to the user for this tool call
# when limited to the OpenAI types:
class ActionOpenPage(TypedDict, total=False):
    type: Required[Literal["open_page"]]
    url: Required[str]

# The interface of the actual tool call the model makes:
open_link(ctx: Context,
    id: Union[int, str] = -1,
    cursor: int = -1,
    loc: int = -1,
    num_lines: int = -1,
    view_source: bool = False,
    source: Optional[str] = None)

This feature is closer to OpenAI's MCP Connectors than to fully user-specified MCP servers: users cannot yet pass arbitrary MCP servers by URL, so the list of available servers is static and owned by the vLLM server.

Test Plan

Run the provided simple Memory MCP server using:

python tests/entrypoints/openai/memory_mcp_server.py 
# 8765 is the default port
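
For reference, a memory-style MCP server can be written with the mcp Python SDK's FastMCP helper. The sketch below is illustrative only; the tool names (store_memory, recall_memory) and the in-memory dict are assumptions, not the actual contents of memory_mcp_server.py:

# Minimal sketch of a memory-style MCP server using the mcp SDK's FastMCP.
# Tool names and storage are illustrative, not the real test server.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("memory", port=8765)
_store: dict[str, str] = {}

@mcp.tool()
def store_memory(key: str, value: str) -> str:
    """Save a value under a key."""
    _store[key] = value
    return f"stored {key}"

@mcp.tool()
def recall_memory(key: str) -> str:
    """Read back a previously stored value."""
    return _store.get(key, "not found")

if __name__ == "__main__":
    mcp.run(transport="sse")  # expose an SSE endpoint for the vLLM tool server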

Then register it with your vLLM responses API server with:

CUDA_VISIBLE_DEVICES=6 vllm serve openai/gpt-oss-20b --tool-server=localhost:8765

And then try it out with something like:

curl http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" -N \
  -d '{
        "model": "openai/gpt-oss-20b",
        "input": "Show me that the memory tool works.",
        "tools": [
          {
            "type": "mcp",
            "server_label": "memory", 
            "server_url": "ignored"
          }
        ],
        "enable_response_messages": true
      }'
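
The same request can also be sent with the OpenAI Python client pointed at the local server. This is a sketch assuming the server listens on localhost:8000 and needs no real API key; enable_response_messages is a vLLM extension, so it goes through extra_body:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.responses.create(
    model="openai/gpt-oss-20b",
    input="Show me that the memory tool works.",
    tools=[{"type": "mcp", "server_label": "memory", "server_url": "ignored"}],
    extra_body={"enable_response_messages": True},  # vLLM-specific field
)
print(response.output)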

Test Result

Passed

Signed-off-by: Alec Solder <alecs@fb.com>
mergify bot commented Oct 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alecsolder.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-executed refactoring to support arbitrary MCP (Model Context Protocol) servers, moving away from hardcoded tool logic. The changes generalize tool handling by introducing concepts like tool namespaces, normalization of legacy tool types, and a distinction between elevated and custom tools. The addition of comprehensive tests, including a standalone MCP server, ensures the new system is robust. Overall, this is a great improvement for the tool integration capabilities of vLLM. I have one piece of feedback regarding some debugging code that should be removed.

Comment on lines 602 to 603
print("INPUT MESSAGES \n\n", input_messages)
print("\n\nOUTPUT MESSAGES \n\n", output_messages)
Contributor

high

These print statements appear to be for debugging purposes. They should be removed before merging to avoid leaking potentially sensitive information from user requests into the server logs and to reduce log noise in production environments.
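
If the message dumps are still useful for troubleshooting, one option is to downgrade them to debug-level logging instead of printing. A minimal sketch, assuming the module uses vLLM's init_logger:

from vllm.logger import init_logger

logger = init_logger(__name__)

# Debug-level output is off by default, so request contents do not
# end up in production logs unless explicitly enabled.
logger.debug("input messages: %s", input_messages)
logger.debug("output messages: %s", output_messages)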

Contributor Author

thank you gemini :)

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 4 to 8
from contextlib import AbstractAsyncContextManager, asynccontextmanager
from typing import TYPE_CHECKING, Any, Optional

from mcp import ClientSession
from mcp.client.sse import sse_client


P0: Importing the MCP client unconditionally breaks deployments without MCP

The module now imports from mcp import ClientSession and sse_client at top level. Any process that imports vllm.entrypoints.tool_server (e.g., the default API server) will immediately raise ImportError when the optional mcp package is not installed, even if MCP tooling is never used. The previous implementation delayed the import inside MCPToolServer.__init__ and only failed when the optional feature was exercised. This regression forces all installations to ship the MCP dependency, which will crash existing configurations that do not provide it.
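
A sketch of the deferred-import pattern described above, so the optional dependency is only resolved when MCP tooling is actually used (illustrative, not the exact vLLM code):

# Illustrative only: defer the optional mcp dependency until an MCP tool
# server is actually constructed, instead of importing it at module load.
class MCPToolServer:
    def __init__(self, url: str):
        try:
            from mcp import ClientSession
            from mcp.client.sse import sse_client
        except ImportError as e:
            raise ImportError(
                "MCP tool servers require the optional 'mcp' package"
            ) from e
        self._session_cls = ClientSession
        self._sse_client = sse_client
        self._url = url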


Signed-off-by: Alec Solder <alecs@fb.com>
@alecsolder alecsolder changed the title Support all MCP servers for gpt-oss [Frontend] Support all MCP servers for gpt-oss Oct 13, 2025
@alecsolder alecsolder changed the title [Frontend] Support all MCP servers for gpt-oss [Frontend][gpt-oss] Support all MCP servers for gpt-oss Oct 13, 2025
Resolved conflicts in:
- vllm/entrypoints/context.py: Updated type annotations to modern syntax (str | None), kept 'enabled_tool_namespaces' parameter and 'called_namespaces' set
- vllm/entrypoints/harmony_utils.py: Updated type annotations to modern syntax, kept tool_namespaces parameter in get_developer_message
- vllm/entrypoints/tool_server.py: Kept 'namespace' parameter naming throughout for consistency
@mergify mergify bot removed the needs-rebase label Oct 13, 2025
Contributor

@qandrew qandrew left a comment


Thanks for putting this together! However, the PR is a bit large to review thoroughly; can you split it into smaller pieces? If you have 3 main motivations, can you split them into 3 separate PRs? A 4th PR could add the memory_mcp_server and test_memory_mcp_server logic too.

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
Standalone Memory MCP Server
Contributor

nit: this file should go in the mcp folder

- Tool calls should be on analysis channel
- Tool responses should be on analysis channel
"""
response = await memory_elevated_client.responses.create(
Contributor

I think in general we shouldn't need E2E tests for all functionality testing; ideally we can mock the client/server and reduce CI load. You might run into CI memory issues (I had some here eb6dd74) by spinning up more servers within the same CI step.
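
For example, the tool server could be replaced by an async mock in unit tests so no extra process is spawned in CI. A rough sketch; the list_tools/call_tool method names are stand-ins for whatever interface the handler under test actually calls:

# Rough sketch: stub out the MCP tool server instead of spawning one in CI.
from unittest.mock import AsyncMock

mock_tool_server = AsyncMock()
mock_tool_server.list_tools.return_value = [
    {"name": "store_memory", "description": "Save a value under a key"},
]
mock_tool_server.call_tool.return_value = {"content": "stored greeting"}

# The handler under test can then be driven without any network traffic:
# result = await handler.invoke_tool(mock_tool_server, "store_memory", {...})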

mergify bot commented Oct 15, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alecsolder.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Comment on lines +72 to +76
When memory is NOT in GPT_OSS_SYSTEM_TOOL_MCP_LABELS:
- Tool should be in developer message (not system message)
- Tool calls should be on commentary channel
- Tool responses should be on commentary channel
"""
Contributor

These comments (and the similar ones for the elevated test case) are the most concise description of custom tools vs elevated/system/builtin (naming is hard) tools that I've seen, and help me really understand the difference. And as someone that has studied the Harmony format, I immediately understand the implications of this.

It feels like we need some user-facing documentation about this as well, with something to help the user know if they want their tools to be in GPT_OSS_SYSTEM_TOOL_MCP_LABELS or not and what the implications of that are on the tool calls generated or on the execution of those tool calls.


Labels

frontend, gpt-oss (Related to GPT-OSS models), needs-rebase

Projects

Status: To Triage
