Bug Description
I am trying to follow the MCP example notebook. I am using OpenAILike to connect to a Hermes 3 tool-calling model hosted on vLLM. When I use the old synchronous tool method, it works:
./reproducer-sync.py
what is 6554 * 933?
The result of 6554 * 933 is 61148.
But I'm trying to use MCP, for which I defined a multiply tool. The tool can be pasted into the code of the server.py example:
@mcp.tool()
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the resulting integer.

    Args:
        a: the first integer to multiply
        b: the second integer to multiply

    Returns:
        int: the result of the multiplication

    Raises:
        None
    """
    return a * b
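For completeness, here is a minimal sketch of what the resulting server.py could look like. This is an assumption on my part based on the FastMCP API used in the llama-index MCP example; the server name and SSE port are illustrative and only chosen to match the reproducer below.
# Hypothetical minimal server.py for this reproduction (name and port are
# assumptions; the actual example file may differ).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("math", port=3000)

@mcp.tool()
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the resulting integer."""
    return a * b

if __name__ == "__main__":
    # Serve over SSE so BasicMCPClient can connect to http://127.0.0.1:3000/sse
    mcp.run(transport="sse")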
When I do this asynchronously, as the MCP tutorial shows, I get this:
./reproducer-async.py
what is 6554 * 933?
<tool_call>
{"name": "multiply", "arguments": {"a": 6554, "b": 933}}
</tool_call>
This is the code for reproducer-async.py:
#!/bin/env python3
from llama_index.llms.openai_like import OpenAILike
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec
from llama_index.core.agent.workflow import FunctionAgent

llm = OpenAILike(
    model="hermes",
    api_base="http://localhost:8000/v1",
    api_key="fake",
    is_chat_model=True,
    context_window=122880,
    is_function_calling_model=True,
)

SYSTEM_PROMPT = """You are an AI assistant for Tool Calling. \
Before you help a user, you need to work with tools to do math."""

async def main():
    mcp_client = BasicMCPClient("http://127.0.0.1:3000/sse")
    mcp_tools = McpToolSpec(client=mcp_client)
    tool_list = await mcp_tools.to_tool_list_async()

    agent = FunctionAgent(
        name="Agent",
        description="An agent that can use tools.",
        tools=tool_list,
        llm=llm,
        system_prompt=SYSTEM_PROMPT,
    )

    question = "what is 6554 * 933?"
    print(question)
    response = await agent.run(user_msg=question)
    print(str(response))

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
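One way to narrow this down (a suggested check, not part of the original reproducer) is to print the metadata of the tools that McpToolSpec produced, to confirm the multiply tool and its description actually reached the agent:
# Suggested debugging addition inside main(), after to_tool_list_async():
for tool in tool_list:
    # Each entry should be a FunctionTool wrapping the MCP tool definition.
    print(tool.metadata.name, "-", tool.metadata.description)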
The code for the old method that works is this:
#!/bin/env python3
from llama_index.llms.openai_like import OpenAILike
from llama_index.agent.openai import OpenAIAgent
from llama_index.core.tools import FunctionTool

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the resulting integer."""
    return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)

llm = OpenAILike(
    model="hermes",
    api_base="http://localhost:8000/v1",
    api_key="fake",
    is_chat_model=True,
    context_window=122880,
    is_function_calling_model=True,
)

agent = OpenAIAgent.from_tools([multiply_tool], llm=llm, verbose=False)

question = "what is 6554 * 933?"
print(question)
print(str(agent.chat(question)))
The synchronous method handles Hermes-3-Llama-3.1-8B.Q5_K_M fine, but the asynchronous agent workflow somehow doesn't recognize and intercept the tool call request; instead it treats the raw <tool_call> block as the final answer.
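If it helps, here is a rough sketch of how one could inspect the agent workflow's event stream to see whether a tool call event is ever emitted or only raw text comes back. It assumes the ToolCall/ToolCallResult events exported by llama_index.core.agent.workflow in this version; adjust as needed.
# Hypothetical debugging variant of the agent call in reproducer-async.py.
from llama_index.core.agent.workflow import ToolCall, ToolCallResult

handler = agent.run(user_msg=question)
async for event in handler.stream_events():
    # If the framework parsed the tool call, ToolCall/ToolCallResult events
    # should show up here; if not, only the raw text response comes back.
    if isinstance(event, (ToolCall, ToolCallResult)):
        print(type(event).__name__, getattr(event, "tool_name", ""))
response = await handler
print(str(response))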
Version
llama-index-agent-openai 0.4.6, llama-index-core 0.12.28, llama-index-llms-openai-like 0.3.4, llama-index-tools-mcp 0.1.1
Steps to Reproduce
See above. Paste the multiply tool into the server.py MCP example, then serve a Hermes-3-Llama-3.1-8B model on vLLM:
/bin/env python3 -m vllm.entrypoints.openai.api_server \
--host 127.0.0.1 --port 8000 \
--dtype=half \
--chat-template templates/tool_chat_template_hermes.jinja \
--model /home/llm/models/Hermes-3-Llama-3.1-8B.Q5_K_M.gguf \
--load-format gguf --max-model-len 122880 \
--gpu_memory_utilization 0.95 \
--served-model-name hermes \
--enable-auto-tool-choice \
--tool-call-parser hermes
Then try both reproducers.
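A possible extra isolation step (my own suggestion, not something the notebook asks for): query the vLLM endpoint directly with the OpenAI client and a tools payload, and check whether it returns structured tool_calls or raw <tool_call> text. The model name, URL, and schema below simply mirror the reproducers above.
# Hypothetical sanity check, independent of llama-index, to see whether the
# vLLM hermes tool-call parser emits structured tool_calls.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake")
resp = client.chat.completions.create(
    model="hermes",
    messages=[{"role": "user", "content": "what is 6554 * 933?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "multiply",
            "description": "Multiply two integers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "integer"},
                    "b": {"type": "integer"},
                },
                "required": ["a", "b"],
            },
        },
    }],
)
msg = resp.choices[0].message
print("tool_calls:", msg.tool_calls)  # structured calls if the parser worked
print("content:", msg.content)        # raw <tool_call> text otherwise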