Bug Description
I am trying to follow the MCP example notebook. I am using OpenAILike to connect to a Hermes 3 tool-calling model hosted on vLLM. When I use the old synchronous tool method, it works:
./reproducer-sync.py
what is 6554 * 933?
The result of 6554 * 933 is 61148.
But I'm trying to use MCP, for which I defined a multiply tool. The tool can be pasted into the code of the server.py example:
@mcp.tool()
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the resulting integer.

    Args:
        a: the first integer to multiply
        b: the second integer to multiply

    Returns:
        int: the result of the multiplication

    Raises:
        None
    """
    return a * b
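For completeness, here is a minimal sketch of what the resulting server.py could look like. This is an assumption on my part based on the FastMCP API used in the llama-index MCP example; the server name and SSE port are illustrative and only chosen to match the reproducer below.
# Hypothetical minimal server.py for this reproduction (name and port are
# assumptions; the actual example file may differ).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("math", port=3000)

@mcp.tool()
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the resulting integer."""
    return a * b

if __name__ == "__main__":
    # Serve over SSE so BasicMCPClient can connect to http://127.0.0.1:3000/sse
    mcp.run(transport="sse")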
When I do this asynchronously, as the MCP tutorial shows, I get this:
./reproducer-async.py
what is 6554 * 933?
<tool_call>
{"name": "multiply", "arguments": {"a": 6554, "b": 933}}
</tool_call>
This is the code for reproducer-async.py:
#!/bin/env python3
from llama_index.llms.openai_like import OpenAILike
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec
from llama_index.core.agent.workflow import FunctionAgent

llm = OpenAILike(
    model="hermes",
    api_base="http://localhost:8000/v1",
    api_key="fake",
    is_chat_model=True,
    context_window=122880,
    is_function_calling_model=True,
)

SYSTEM_PROMPT = """You are an AI assistant for Tool Calling. \
Before you help a user, you need to work with tools to do math."""

async def main():
    mcp_client = BasicMCPClient("http://127.0.0.1:3000/sse")
    mcp_tools = McpToolSpec(client=mcp_client)
    tool_list = await mcp_tools.to_tool_list_async()

    agent = FunctionAgent(
        name="Agent",
        description="An agent that can use tools.",
        tools=tool_list,
        llm=llm,
        system_prompt=SYSTEM_PROMPT,
    )

    question = "what is 6554 * 933?"
    print(question)
    response = await agent.run(user_msg=question)
    print(str(response))

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
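One way to narrow this down (a suggested check, not part of the original reproducer) is to print the metadata of the tools that McpToolSpec produced, to confirm the multiply tool and its description actually reached the agent:
# Suggested debugging addition inside main(), after to_tool_list_async():
for tool in tool_list:
    # Each entry should be a FunctionTool wrapping the MCP tool definition.
    print(tool.metadata.name, "-", tool.metadata.description)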
The code for the old method that works is this:
#!/bin/env python3
from llama_index.llms.openai_like import OpenAILike
from llama_index.agent.openai import OpenAIAgent
from llama_index.core.tools import FunctionTool

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the resulting integer."""
    return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)

llm = OpenAILike(
    model="hermes",
    api_base="http://localhost:8000/v1",
    api_key="fake",
    is_chat_model=True,
    context_window=122880,
    is_function_calling_model=True,
)

agent = OpenAIAgent.from_tools([multiply_tool], llm=llm, verbose=False)

question = "what is 6554 * 933?"
print(question)
print(str(agent.chat(question)))
The synchronous method handles Hermes-3-Llama-3.1-8B.Q5_K_M fine, but the asynchronous agent workflow somehow doesn't recognize and intercept the tool call request; instead it treats the raw <tool_call> block as the final answer.
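If it helps, here is a rough sketch of how one could inspect the agent workflow's event stream to see whether a tool call event is ever emitted or only raw text comes back. It assumes the ToolCall/ToolCallResult events exported by llama_index.core.agent.workflow in this version; adjust as needed.
# Hypothetical debugging variant of the agent call in reproducer-async.py.
from llama_index.core.agent.workflow import ToolCall, ToolCallResult

handler = agent.run(user_msg=question)
async for event in handler.stream_events():
    # If the framework parsed the tool call, ToolCall/ToolCallResult events
    # should show up here; if not, only the raw text response comes back.
    if isinstance(event, (ToolCall, ToolCallResult)):
        print(type(event).__name__, getattr(event, "tool_name", ""))
response = await handler
print(str(response))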
Version
llama-index-agent-openai 0.4.6, llama-index-core 0.12.28, llama-index-llms-openai-like 0.3.4, llama-index-tools-mcp 0.1.1
Steps to Reproduce
See above. Paste the multiply tool into the server.py MCP example, then serve a Hermes-3-Llama-3.1-8B model on vLLM:
/bin/env python3 -m vllm.entrypoints.openai.api_server \
--host 127.0.0.1 --port 8000 \
--dtype=half \
--chat-template templates/tool_chat_template_hermes.jinja \
--model /home/llm/models/Hermes-3-Llama-3.1-8B.Q5_K_M.gguf \
--load-format gguf --max-model-len 122880 \
--gpu_memory_utilization 0.95 \
--served-model-name hermes \
--enable-auto-tool-choice \
--tool-call-parser hermes
Then try both reproducers.
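A possible extra isolation step (my own suggestion, not something the notebook asks for): query the vLLM endpoint directly with the OpenAI client and a tools payload, and check whether it returns structured tool_calls or raw <tool_call> text. The model name, URL, and schema below simply mirror the reproducers above.
# Hypothetical sanity check, independent of llama-index, to see whether the
# vLLM hermes tool-call parser emits structured tool_calls.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake")
resp = client.chat.completions.create(
    model="hermes",
    messages=[{"role": "user", "content": "what is 6554 * 933?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "multiply",
            "description": "Multiply two integers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "integer"},
                    "b": {"type": "integer"},
                },
                "required": ["a", "b"],
            },
        },
    }],
)
msg = resp.choices[0].message
print("tool_calls:", msg.tool_calls)  # structured calls if the parser worked
print("content:", msg.content)        # raw <tool_call> text otherwise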