@qandrew qandrew commented Oct 31, 2025

Purpose

server (proper M2 way, using the minimax_m2 tool-call parser)

vllm serve MiniMaxAI/MiniMax-M2 \
    --tensor-parallel-size 4 \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --enable-auto-tool-choice \
    --tool-server=localhost:8081/container,localhost:8081/browser,localhost:8081/python \
    --port 8000

server (hacky way, using the minimax tool-call parser with the MiniMax-M1 chat template)

vllm serve MiniMaxAI/MiniMax-M2 \
    --tensor-parallel-size 4 \
    --tool-call-parser minimax \
    --reasoning-parser minimax_m2_append_think \
    --enable-auto-tool-choice \
    --chat-template examples/tool_chat_template_minimax_m1.jinja \
    --tool-server=localhost:8081/container,localhost:8081/browser,localhost:8081/python \
    --port 8000

vllm/fb/scripts/gptoss/run_mcp_server.sh

client example

curl -X POST "http://localhost:8000/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-api-key" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2",
    "input": "Multiply 64548*15151 using the python tool.",
    "tools": [
      {
        "type": "mcp",
        "server_label": "code_interpreter",
        "headers": {"test": "test"},
        "server_url": "IGNORED"
      }
    ]
  }'
'<think>\nLet me think about this problem. The user wants me to multiply 64548 by 15151. This is a basic arithmetic operation, but since the user specifically asked me to use the "python tool," I need to use the code interpreter tool to calculate this.\n\nI need to use the code_interpreter tool with Python code to perform this multiplication. The calculation is straightforward: 64548 × 15151.\n\nLet me write a simple Python script that will perform this multiplication. I\'ll use the print function to output the result. The code will be something like:\n```\nresult = 64548 * 15151\nprint(result)\n```\n\nThis simple multiplication should give us the correct answer. The code interpreter tool will execute this Python code and return the result.\n\nI\'ll format my response using the required XML tags for tool calls, specifying the code_interpreter as the tool name and providing the Python code as the argument. The system should then execute this code and return the multiplication result to the user.\n\nI\'m not going to calculate this manually since the user specifically requested using the Python tool, and the numbers are large enough that using a tool is more reliable than mental calculation.\n</think>\n\n<tool_calls>\n{"name": "code_interpreter", "arguments": {"code": "result = 64548 * 15151\\nprint(result)"}}\n</tool_calls>'
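The raw output above wraps the reasoning in `<think>` tags and the tool call in `<tool_calls>` tags. Below is a minimal sketch of how a client could separate the two, assuming one JSON object per line inside `<tool_calls>` (as in the sample). `split_raw_output` is a hypothetical helper written for illustration; it is not the parser vLLM uses, and the sample string is shortened.

```python
import json
import re


def split_raw_output(raw: str):
    """Split raw model text into (reasoning, tool_calls).

    Assumption: tool calls appear as one JSON object per line
    between <tool_calls> and </tool_calls>.
    """
    think = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
    calls = re.search(r"<tool_calls>(.*?)</tool_calls>", raw, re.DOTALL)

    reasoning = think.group(1).strip() if think else ""
    tool_calls = []
    if calls:
        for line in calls.group(1).strip().splitlines():
            if line.strip():
                tool_calls.append(json.loads(line))
    return reasoning, tool_calls


# Shortened stand-in for the raw output shown above.
raw = (
    "<think>\nUse the python tool.\n</think>\n\n"
    "<tool_calls>\n"
    '{"name": "code_interpreter", "arguments": '
    '{"code": "result = 64548 * 15151\\nprint(result)"}}\n'
    "</tool_calls>"
)

reasoning, tool_calls = split_raw_output(raw)
print(tool_calls[0]["name"])
print(tool_calls[0]["arguments"]["code"])
```

The `\\n` in the sample is a JSON escape, so the decoded `code` argument contains a real newline between the two Python statements.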

Test Plan

Test Result



@qandrew qandrew changed the title mcp for minimax [2] mcp for minimax Oct 31, 2025
Andrew Xia added 4 commits November 3, 2025 11:38
Signed-off-by: Andrew Xia <axia@fb.com>
qandrew commented Nov 4, 2025

function call round trip

curl http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2",
    "input": [
      {
        "role": "user",
        "content": "What is the weather in Paris in Celsius today?"
      },
      {
        "arguments": "{\"location\": \"Paris\", \"unit\": \"celsius\"}",
        "call_id": "call_5f7b38f3b81e4b8380fd0ba74f3ca3ab",
        "name": "get_weather",
        "type": "function_call",
        "id": "fc_4fe5d6fc5b6c4d6fa5f24cc80aa27f78",
        "status": "completed"
      },
      {
        "call_id": "call_5f7b38f3b81e4b8380fd0ba74f3ca3ab",
        "id": "fc_4fe5d6fc5b6c4d6fa5f24cc80aa27f78",
        "output": "The weather in Paris is 20 Celsius",
        "status": "completed",
        "type": "function_call_output"
      }
    ],
    "tools": [{
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
        }
    }],
    "temperature": 0.7,
    "max_output_tokens": 2000
  }'
{
    "id": "resp_633f8a32cc594e18a30146dfca8ab948",
    "created_at": 1762229718,
    "incomplete_details": null,
    "instructions": null,
    "metadata": null,
    "model": "MiniMaxAI/MiniMax-M2",
    "object": "response",
    "output": [
        {
            "id": "rs_d7b0a3afb53443db88e68da2a2853fd1",
            "summary": [],
            "type": "reasoning",
            "content": [
                {
                    "text": "I need to provide the user with the current weather in Paris in Celsius. The tool has returned a response indicating that it's 20°C. This is a straightforward answer that directly addresses the user's question. I should respond clearly and succinctly, ensuring to include both the numeric value and the unit for clarity. Since no additional details were requested, I’ll keep it simple and direct. So, I’ll say: \"The weather in Paris today is 20°C.\"\n",
                    "type": "reasoning_text"
                }
            ],
            "encrypted_content": null,
            "status": null
        },
        {
            "id": "msg_38d74c191f8245d4a5de34cde1ae1c40",
            "content": [
                {
                    "annotations": [],
                    "text": "\n\nThe weather in Paris today is 20°C.",
                    "type": "output_text",
                    "logprobs": null
                }
            ],
            "role": "assistant",
            "status": "completed",
            "type": "message"
        }
    ],
    "parallel_tool_calls": true,
    "temperature": 0.7,
    "tool_choice": "auto",
    "tools": [],
    "top_p": 0.95,
    "background": false,
    "max_output_tokens": 2000,
    "max_tool_calls": null,
    "previous_response_id": null,
    "prompt": null,
    "reasoning": null,
    "service_tier": "auto",
    "status": "completed",
    "text": null,
    "top_logprobs": null,
    "truncation": "disabled",
    "usage": {
        "input_tokens": 91,
        "input_tokens_details": {
            "cached_tokens": 80,
            "input_tokens_per_turn": [],
            "cached_tokens_per_turn": []
        },
        "output_tokens": 105,
        "output_tokens_details": {
            "reasoning_tokens": 0,
            "tool_output_tokens": 0,
            "output_tokens_per_turn": [],
            "tool_output_tokens_per_turn": []
        },
        "total_tokens": 196
    },
    "user": null,
    "input_messages": null,
    "output_messages": null
}
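The round trip above depends on the `function_call` and `function_call_output` items sharing one `call_id`. Below is a minimal sketch of building that `input` list and of pulling the final assistant text out of a response with the shape shown. The helper names `function_call_round_trip` and `final_text` are hypothetical, and the sample response is trimmed to the fields the helper reads.

```python
import json


def function_call_round_trip(question, name, arguments, call_id, item_id, output):
    """Build the `input` list: user turn, function call, function output."""
    return [
        {"role": "user", "content": question},
        {
            "type": "function_call",
            "id": item_id,
            "call_id": call_id,
            "name": name,
            # Arguments are sent as a JSON-encoded string, as in the curl example.
            "arguments": json.dumps(arguments),
            "status": "completed",
        },
        {
            "type": "function_call_output",
            "id": item_id,
            "call_id": call_id,  # must match the function_call item
            "output": output,
            "status": "completed",
        },
    ]


def final_text(response):
    """Concatenate every output_text part from message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
    return "".join(parts).strip()


items = function_call_round_trip(
    question="What is the weather in Paris in Celsius today?",
    name="get_weather",
    arguments={"location": "Paris", "unit": "celsius"},
    call_id="call_5f7b38f3b81e4b8380fd0ba74f3ca3ab",
    item_id="fc_4fe5d6fc5b6c4d6fa5f24cc80aa27f78",
    output="The weather in Paris is 20 Celsius",
)

# Trimmed copy of the response above.
sample_response = {
    "output": [
        {"type": "reasoning", "content": [{"type": "reasoning_text", "text": "..."}]},
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "\n\nThe weather in Paris today is 20°C."}
            ],
        },
    ]
}

print(final_text(sample_response))
```

The `items` list can be passed as the `"input"` field of the request body. Reasoning items are skipped by `final_text`, so only the assistant message text is returned.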

@chaunceyjiang
vllm/fb/scripts/gptoss/run_mcp_server.sh

Hi @qandrew, would you be willing to share this script?
