Conversation

@SongChiYoung (Contributor) commented Apr 12, 2025

Why are these changes needed?

This PR introduces a reusable, event-driven McpSessionActor component for AutoGen’s MCP tool adapters.
Previously, each tool instance created its own isolated ClientSession, making it difficult to share a single session context across tools. This change solves that by:

  • Introducing a McpSessionActor that manages a single session per Agent via an async task.
  • Allowing safe concurrent usage of the same session by multiple tools.
  • Supporting serialization/deserialization of the session actor for config-based workflows.
  • Ensuring graceful shutdown of the session, even in atexit or test teardown scenarios.

This addresses hanging issues during test teardown and resolves `RuntimeError: Attempted to exit cancel scope in a different task` errors in anyio.
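
A rough usage sketch under this PR's design (exact signatures and method placement may differ from the final code; per the notes below, close() must be called explicitly when the tools are used outside an Agent):

```python
import asyncio

from autogen_core import CancellationToken
from autogen_ext.tools.mcp import StdioServerParams, mcp_server_tools


async def main() -> None:
    params = StdioServerParams(
        command="uvx",
        args=["mcp-server-fetch"],
        read_timeout_seconds=60,
    )
    # Under this PR's design, all tools returned here share one actor-managed session.
    tools = await mcp_server_tools(server_params=params)
    result = await tools[0].run_json({"url": "https://example.com"}, CancellationToken())
    print(result)
    # Outside an Agent, the shared session must be shut down explicitly.
    await tools[0].close()


asyncio.run(main())
```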

Related issue number

Closes #6198

Checks

Notes on Lifecycle Management and Design Decisions

  • This PR does not include a routine to reinitialize the session when an Agent or Team stops for various reasons. Instead, the actor shuts down gracefully when the process terminates.
    It's unclear whether the on_reset method is expected to be invoked in such cases, so I took a conservative approach. I'm happy to adjust this if there's guidance from the maintainers.

  • As seen in the test cases, when the session is used outside of an Agent (i.e., called directly), users must explicitly call the close() method.
    This design assumes lazy initialization and does not cause hangs as long as the session is constructed but not used.
    Support for use as an async context manager (with statement) was not included in this PR to keep things simple; it can be added in a separate PR if the community sees value in it.

  • Currently, the session is not reset inside on_reset() since doing so would require another initialize() on the next call, adding more complexity.
    If needed, I’m open to revisiting this, either in this PR or a follow-up, depending on guidance from domain experts or maintainers.

In summary: there may be edge cases that lead to a hang only if the session is used without an Agent and not closed explicitly.
This is considered a usage-side responsibility. Adding more complexity to guard against such nonstandard usage might lead to over-engineering.
As for on_reset, I tried to follow the “simple is best” philosophy by relying on graceful shutdown rather than adding extra lifecycle management.

If there’s a better idiomatic approach for AutoGen or MCP usage patterns, I’d greatly appreciate your advice.

@ekzhu (Contributor) commented Apr 13, 2025

> Currently, the session is not reset inside on_reset() since doing so would require another initialize() on the next call, adding more complexity.

Makes sense.

@codecov codecov bot commented Apr 15, 2025

Codecov Report

Attention: Patch coverage is 80.70175% with 22 lines in your changes missing coverage. Please review.

Project coverage is 77.27%. Comparing base (7e8472f) to head (646e75f).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../autogen-ext/src/autogen_ext/tools/mcp/_session.py | 81.57% | 14 Missing ⚠️ |
| .../autogen-ext/src/autogen_ext/tools/mcp/_factory.py | 14.28% | 6 Missing ⚠️ |
| ...ges/autogen-ext/src/autogen_ext/tools/mcp/_base.py | 87.50% | 2 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #6284      +/-   ##
==========================================
+ Coverage   77.24%   77.27%   +0.02%     
==========================================
  Files         200      200              
  Lines       14473    14556      +83     
==========================================
+ Hits        11180    11248      +68     
- Misses       3293     3308      +15     
```

| Flag | Coverage Δ |
|---|---|
| unittests | 77.27% <80.70%> (+0.02%) ⬆️ |

Flags with carried forward coverage won't be shown.


@ekzhu (Contributor) left a review comment

I have changed my mind regarding the overall design I approved earlier.

I tried this with some examples and found that when there is an error from the tool, the tools are not properly closed, causing event loop closed errors -- as I expected, because I forgot to include a try/finally to close the tools.

Try this simple example:

```python
# Run inside an async context (e.g. wrap in async def main() and call asyncio.run(main())).
from autogen_core import CancellationToken
from autogen_ext.tools.mcp import StdioServerParams, mcp_server_tools

params = StdioServerParams(
    command="uvx",
    args=["mcp-server-fetch"],
    read_timeout_seconds=60,
)
tools = await mcp_server_tools(server_params=params)
assert tools is not None
assert tools[0].name == "fetch"
# Fetching a non-existent page makes the tool return an error; this should not break the session.
result = await tools[0].run_json({"url": "https://github.com/microsoft/not_exist"}, CancellationToken())
assert result is not None
```

Another weird issue: when one of the tools is closed, all of the tools are closed -- this can be counter-intuitive.

I think we still need to use a context manager for the session. We can still use the McpSessionActor wrapper; this way the implementation will be much easier and avoid confusion.

One more note: I think McpSession instead of McpSessionActor is more succinct.

@ekzhu (Contributor) left a review comment
Let's update the API doc to use a context manager for the McpSessionActor.

```python
server_params: McpServerParams


class McpSessionActor(ComponentBase[BaseModel], Component[McpSessionConfig]):
```

@ekzhu (Contributor) left a review comment

The McpSessionActor should be used as an async context manager.

It is very easy for users to accidentally forget to call close().

I think the new usage should be like this:

```python
async with McpSessionActor(server_params) as session:
    tools = await mcp_server_tools(session)
    # rest of code.
```

Otherwise we will need to call close() on every tool directly, which is fine in some cases but really easy to forget. So we should support the async context manager usage.

@ekzhu (Contributor) commented Apr 15, 2025

Also, let's add some unit tests that test mcp_server_tools directly to improve the code coverage.

@SongChiYoung (Contributor, Author) commented
@ekzhu

Thanks for the detailed suggestions! I think I now better understand your preference for using async with on McpSessionActor directly — it’s a clean way to enforce graceful shutdown and avoid forgetting close().

However, I’d like to share one constraint from real-world usage that makes that pattern difficult in practice:

🧩 Tools are often injected into Agents — not used inline

In most AutoGen use cases, we don’t call tools immediately after creation.
Instead, tools are passed into Agents or Teams as dependencies.

In this pattern, the actor must outlive the entire agent execution, which makes it very difficult to scope the McpSessionActor within an async with block before injecting it.

If we did, the session would be closed the moment the with block ends, even before the agent starts running.
This breaks composability.

🛠️ Why the current design favors robustness

To solve that, I moved session lifecycle outside of tool logic, and into the McpSessionActor, which:
• Shares a single session across multiple tools
• Allows lazy initialization
• Can be reused across agents
• Can be explicitly closed via await close() at safe points (e.g. at the end of a test or at app shutdown)

And yes, I absolutely agree that we should add __aenter__/__aexit__ support as optional sugar, so advanced users can still scope the actor explicitly (a sketch follows below).
But I believe the actor-based pattern is necessary to make tool injection work safely.
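
A minimal sketch of what that optional sugar could look like (hypothetical, not part of this PR's code; only the dunder methods are new here):

```python
class McpSessionActor:
    # ... existing actor state and methods (sketched) ...

    async def close(self) -> None:
        ...  # shut down the session task, as in this PR

    async def __aenter__(self) -> "McpSessionActor":
        # Entering the context does not force a connection (initialization stays lazy).
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Graceful shutdown when the block exits, even on error.
        await self.close()
```

With that in place, async with McpSessionActor(params) as actor: ... would close the session automatically, while the injection pattern above keeps working unchanged.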

Would love to hear your thoughts, and happy to iterate on this further!

@ekzhu (Contributor) commented Apr 15, 2025

> In most AutoGen use cases, we don’t call tools immediately after creation. Instead, tools are passed into Agents or Teams as dependencies.

Then all of the running of agents and teams should be within the session's context. I don't see why this is an issue. The session context clearly indicates the lifecycle of the session, and anything that depends on it should live within it.

If the session is not created as a context, then we can lazily initialize it. But how do you solve the issue that closing one tool closes all the tools?

@SongChiYoung (Contributor, Author) commented Apr 16, 2025

@ekzhu
Thanks for the suggestion.
I tried testing with a with block, but hit a few issues - so I wanted to first explain where things stand.

TL;DR

  1. To fix #6198 (McpToolAdapter Connects to the MCP Server in a Stateless Mode), all MCP tools must share the same session.
    --> That’s why closing one tool closes all others.
  2. Using with solves that, but it breaks existing user code and makes serialization hard.
    --> Found this thanks to your comment - turns out even my version has deserialization issues.
  3. The current Actor pattern avoids both problems.
    --> Runs in its own task and owns the session.
    --> Can inject the same session into multiple tools.

I’ll still try to support with in a clean way if possible.


1. #6198 root issue

The original issue was that the session closed too early when tools used a with block internally inside an Agent.

Later I found a bigger problem:
Tools in the same MCP server need to share the same session.

For example, web_surf and save_pdf are separate tools — but the PDF tool can’t access browsed pages unless they share a session.
That’s why I had to inject the same session into all tools when using mcp_server_tools().

2. Why `with` was tricky in practice

Yes, wrapping the whole thing in a with block technically works, but it breaks two things:

  • It changes the public API, so existing user code has to change
  • It breaks serialization/deserialization
    (Sessions are created outside the Agent/Team, so they can’t be deserialized)

Right now I’m working on making this session sharing work even in deserialization — your comment helped me catch this. Appreciate it.

3. Why I went with Actor

MCP SDK raises errors when the session is closed from a different task:

```
RuntimeError: athrow(): asynchronous generator is already running
RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
```

That’s why I put the session inside a dedicated task - the Actor.
That way it’s always closed from the right place. Tools just talk to the Actor, and it owns the session.
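
A minimal sketch of the idea, assuming a generic async session factory (illustrative only, not this PR's actual implementation):

```python
import asyncio
from typing import Any


class SessionActor:
    """One dedicated task owns the session, so the session is entered and exited from
    the same task, which avoids the anyio cancel-scope errors quoted above."""

    def __init__(self, make_session) -> None:
        self._make_session = make_session  # async-context-manager factory for the session
        self._commands: asyncio.Queue = asyncio.Queue()
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        async with self._make_session() as session:  # entered and exited in this task only
            while True:
                name, args, fut = await self._commands.get()
                if name == "close":
                    fut.set_result(None)
                    return
                try:
                    fut.set_result(await getattr(session, name)(*args))
                except Exception as e:  # report tool errors without killing the actor
                    fut.set_exception(e)

    async def call(self, name: str, *args: Any) -> Any:
        fut: asyncio.Future = asyncio.get_running_loop().create_future()
        await self._commands.put((name, args, fut))
        return await fut

    async def close(self) -> None:
        await self.call("close")
        await self._task  # the session is torn down inside its own task
```

Tools then delegate their calls to the actor (e.g. await actor.call("call_tool", name, arguments)) instead of holding the session themselves.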

4. About “closing one tool closes all”

This is expected - because tools share the same session.

I thought about tracking tool lifetimes and only closing the session when all are done, but it adds a lot of complexity and probably isn’t worth it.
If there’s a cleaner idea I’m open though.

Let me know if you think anything here’s off.
Happy to adjust direction.

@SongChiYoung (Contributor, Author) commented
@ekzhu
I think the current implementation is now in a pretty good place, and I’d love your thoughts.

✅ Summary of changes:

  • The session now lives inside McpToolAdapter, but is not closed at the end of with — this ensures it stays alive for as long as the Agent is running.
  • I updated the atexit handler to avoid event loop closed errors — it should now exit more gracefully.
  • I also fixed the issue where closing one tool would shut down the entire session: I'm now tracking a simple ref_count so the session only shuts down when all tools are done (see the sketch after this list).
  • Serialization and deserialization are working correctly again!
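
A minimal sketch of the ref-counting idea (names are illustrative and do not mirror the PR's exact code):

```python
class SharedMcpSession:
    """Reference-counted wrapper around the shared session actor: each tool acquires it
    when created and releases it when closed; the actor shuts down only after the last
    tool has released it."""

    def __init__(self, actor) -> None:
        self._actor = actor
        self._ref_count = 0

    def acquire(self) -> None:
        self._ref_count += 1

    async def release(self) -> None:
        self._ref_count -= 1
        if self._ref_count == 0:
            await self._actor.close()
```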

One note on the async with usage pattern:
As I mentioned in an earlier comment, I’m hesitant to require users to wrap tools in a with block at a higher level — mainly because it introduces real complications with serialization/deserialization.
That’s why I’m keeping the session lifecycle managed independently for now.

🤔 A quick question about tests:

From what I saw in the previous codecov results, the coverage didn’t drop significantly after the changes.
That said, if there are specific test cases you’d still like to see, I’d be happy to add them!
Many of the new raise statements were added for safety, and might be tricky to trigger intentionally.

I’ve also documented more details, but figured I’d hold off unless it’s helpful now (since it might be too much information all at once). Just let me know if you’d like me to share more!

@ekzhu (Contributor) commented Apr 16, 2025

> It breaks serialization/deserialization
> (Sessions are created outside the Agent/Team - so they can’t be deserialized)

I think the fundamental issue is that MCP server is not compatible with how AssistantAgent uses tools, which assumes tools are separate and stateless.

It's not something we can solve with this PR -- it will require changes to the AssistantAgent. We could pass the MCP server parameters as a constructor argument to AssistantAgent, which then sets up the session internally and manages it as part of its own lifecycle. This way, it will work perfectly with serialization and deserialization.
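
A rough sketch of that idea (purely hypothetical; mcp_server_params is not an existing AssistantAgent parameter):

```python
# Hypothetical: the agent would own the MCP session as part of its own lifecycle,
# so dumping and loading the agent's config could recreate the session from the params.
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    mcp_server_params=params,  # hypothetical constructor argument
)
```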

@SongChiYoung (Contributor, Author) commented Apr 16, 2025

@ekzhu
You're absolutely right — I totally agree with your point.

But since passing MCP server parameters directly into Agent would introduce a breaking change,
how about making stateful tools optional instead?

(That way, the current PR wouldn't need to change much at all.)

Also, we could add close() hooks to Agent:

  • close() would terminate the session
  • on_reset() would re-initialize it

BTW even with your design, I think we’ll still need the Agent.close() hook for cleaning up the session,
and on_reset() must re-initialize it anyway.

So this might work either way.


Just one question:
Is there an existing hook when the Team terminates that could be used to trigger agent-level shutdowns?
I feel like there must be one based on State, but maybe I just don’t know the right place to look yet.

@ekzhu (Contributor) commented Apr 16, 2025

I think we can do that in a different PR.

For this PR, I want to avoid the complexity in the current implementation: reference counts, actors ... too many moving parts and potential for bugs.

Let's just focus on addressing the original issue with minimal fix, without worrying about serialization, which will be addressed in a separate PR.

So, for tools that require sharing a persistent session:

```python
async with create_mcp_server_session(server_params) as session:
    tools = await mcp_server_tools(server_params, session=session)
    agent = AssistantAgent(..., tools=tools)
```

This won't work with serialization but that's not the point. It addresses the immediate issue.

For tools that can use new session on each invocation:

```python
tools = await mcp_server_tools(server_params)
agent = AssistantAgent(..., tools=tools)
```

Same thing as before.

I believe my issue description in #6198 already described this.

@ekzhu (Contributor) commented Apr 16, 2025

I'd rather have a quick fix to this problem so I can make a release today :D

I can take on this if it's too late for you.

@SongChiYoung (Contributor, Author) commented Apr 16, 2025

@ekzhu
Totally fair — I understand the goal is to keep this PR minimal and focused. 👍

That said, I’ve been working on a use case where MCP + serialization is actually super important 😅
I’m building an internal tool, CompanyDeepResearch (it researches data and produces reports from the company's own data), on top of AutoGen + MCP, using the autogen-oaiapi wrapper I shared earlier on Discord.
For concurrent access and persistence, team (de)serialization (for copying teams) is pretty much essential.

So… just wanted to flag that there’s at least one user (me 😅) really hoping to see serialization support without introducing breaking changes to the API.

I’ll hold off for now and follow up in a separate PR as suggested!

By the way — it’s currently 4AM KST here 😅
I’m worried I might break something if I try to fix this right now — would you mind taking care of it for this issue? I’d really appreciate it! 🙏

@ekzhu (Contributor) commented Apr 16, 2025

> That said, I’ve been working on a use case where MCP + serialization is actually super important 😅

Understood. Let's get this issue over with and work on a more complete fix as a separate PR.

@ekzhu ekzhu closed this Apr 16, 2025
@SongChiYoung SongChiYoung deleted the FIX/McpToolAdapter_Connects_to_the_MCP_Server_in_a_StatelessMode branch April 16, 2025 22:38
ekzhu added a commit that referenced this pull request Apr 17, 2025
Resolves #6232, #6198

This PR introduces an optional parameter `session` to `mcp_server_tools`
to support reuse of the same session.

```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.tools.mcp import StdioServerParams, create_mcp_server_session, mcp_server_tools


async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o", parallel_tool_calls=False)  # type: ignore
    params = StdioServerParams(
        command="npx",
        args=["@playwright/mcp@latest"],
        read_timeout_seconds=60,
    )
    async with create_mcp_server_session(params) as session:
        await session.initialize()
        tools = await mcp_server_tools(server_params=params, session=session)
        print(f"Tools: {[tool.name for tool in tools]}")

        agent = AssistantAgent(
            name="Assistant",
            model_client=model_client,
            tools=tools,  # type: ignore
        )

        termination = TextMentionTermination("TERMINATE")
        team = RoundRobinGroupChat([agent], termination_condition=termination)
        await Console(
            team.run_stream(
                task="Go to https://ekzhu.com/, visit the first link in the page, then tell me about the linked page."
            )
        )


asyncio.run(main())
``` 

Based on discussion in this thread: #6284, we will consider
serialization and deserialization of MCP server tools when used in this
manner in a separate issue.

This PR also replaces the `json_schema_to_pydantic` dependency with
built-in utils.
