fix: prevent MCP cancel scope errors from crashing agent-to-agent calls #1314

Open
jsonmp-k8 wants to merge 1 commit into kagent-dev:main from jsonmp-k8:fix/mcp-session-cleanup-cancel-scope-1276
@jsonmp-k8
Summary

  • Fixes MCP session cleanup crash when orchestrator agents call sub-agents that use MCP tools
  • Isolates runner.close() in a separate asyncio task using asyncio.gather(return_exceptions=True) to prevent cancel scope corruption from propagating to the A2A event queue teardown
  • Overrides KAgentMcpToolset.close() to catch BaseException (including CancelledError) since the upstream McpToolset.close() only catches Exception

Root Cause

When an orchestrator agent delegates to a sub-agent with MCP tools:

  1. MCP sessions are created in Task P (the asyncio.create_task from DefaultRequestHandler), entering anyio.CancelScope contexts
  2. During cleanup, _cleanup_toolsets() in google-adk wraps toolset.close() in asyncio.wait_for(), which can create a new internal task
  3. The AsyncExitStack.aclose() in that new task tries to exit the CancelScope from Task P — a different task
  4. anyio raises: Attempted to exit a cancel scope that isn't the current task's current cancel scope
  5. The resulting CancelledError propagates and crashes the event queue teardown (queue.close() and queue.join()), causing a 500 Internal Server Error
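
The cross-task failure in steps 1 to 4 can be illustrated with a small, standard-library-only sketch. `TaskBoundScope` is a hypothetical toy class that only imitates the "must exit in the task that entered it" rule; it is not anyio's CancelScope, and the explicit `asyncio.create_task` stands in for whatever internal task the cleanup path ends up running in.

```python
import asyncio
from contextlib import AsyncExitStack


class TaskBoundScope:
    """Toy stand-in for a task-affine scope (hypothetical; NOT anyio.CancelScope)."""

    async def __aenter__(self):
        # Remember which task entered the scope.
        self._owner = asyncio.current_task()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Refuse to exit from any task other than the one that entered.
        if asyncio.current_task() is not self._owner:
            raise RuntimeError("scope exited in a different task than it was entered in")
        return False


async def demo() -> str:
    stack = AsyncExitStack()
    # Entered here, in the current task (playing the role of "Task P").
    await stack.enter_async_context(TaskBoundScope())
    try:
        # Closing the stack from a new task mimics cleanup running elsewhere.
        await asyncio.create_task(stack.aclose())
    except RuntimeError as err:
        return str(err)
    return "closed cleanly"


result = asyncio.run(demo())
print(result)
```

Closing the stack with a plain `await stack.aclose()` in the entering task would succeed; only the hop to another task triggers the error.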

Fix

Two layers of defense:

  1. _agent_executor.py: _safe_close_runner() runs runner.close() in an isolated asyncio task via asyncio.gather(return_exceptions=True). Any CancelledError or cancel scope corruption is collected as a result rather than raised, keeping the event queue teardown safe.

  2. _mcp_toolset.py: Override close() to catch BaseException (not just Exception). Since Python 3.8, asyncio.CancelledError derives from BaseException rather than Exception, so the upstream McpToolset.close(), which only catches Exception, lets it escape.
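
A minimal sketch of both layers, using only the standard library. `FakeRunner`, `UpstreamLikeToolset`, `GuardedToolset`, and `safe_close_runner` are hypothetical stand-ins written for illustration; they are not the actual kagent or google-adk classes, and the failure is simulated by raising CancelledError directly.

```python
import asyncio
import logging

logger = logging.getLogger("cleanup-sketch")


class FakeRunner:
    """Hypothetical runner whose close() fails the way a corrupted cleanup might."""

    async def close(self) -> None:
        raise asyncio.CancelledError("simulated cancel scope corruption")


async def safe_close_runner(runner) -> list:
    # Layer 1: run close() in its own task. With return_exceptions=True,
    # gather() hands back the CancelledError as a value instead of raising it.
    task = asyncio.create_task(runner.close())
    results = await asyncio.gather(task, return_exceptions=True)
    for res in results:
        if isinstance(res, BaseException):
            logger.warning("non-fatal error during runner cleanup: %r", res)
    return results


class UpstreamLikeToolset:
    """Hypothetical toolset that, like the upstream one, only catches Exception."""

    async def _teardown(self) -> None:
        raise asyncio.CancelledError("simulated")

    async def close(self) -> None:
        try:
            await self._teardown()
        except Exception as err:  # CancelledError is not an Exception, so it escapes
            logger.warning("cleanup error: %r", err)


class GuardedToolset(UpstreamLikeToolset):
    async def close(self) -> None:
        # Layer 2: widen the catch to BaseException so CancelledError stays here.
        try:
            await super().close()
        except BaseException as err:
            logger.warning("suppressed error during toolset cleanup: %r", err)


async def demo():
    leaked = False
    try:
        await UpstreamLikeToolset().close()
    except asyncio.CancelledError:
        leaked = True
    await GuardedToolset().close()  # returns normally
    results = await safe_close_runner(FakeRunner())
    return leaked, results


leaked, results = asyncio.run(demo())
```

Note that a blanket `except BaseException` also swallows KeyboardInterrupt and SystemExit, so a real implementation would likely want to re-raise those.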

Test plan

  • Deploy with a multi-agent setup (orchestrator → sub-agent with MCP tools)
  • Issue requests that trigger MCP tool calls through the sub-agent
  • Verify no 500 errors and clean log output (warnings instead of crashes)
  • Verify single-agent MCP tool calls still work correctly

Fixes #1276

Copilot AI review requested due to automatic review settings February 16, 2026 04:50
Copilot AI left a comment

Pull request overview

This PR fixes a critical bug where MCP session cleanup crashes during agent-to-agent calls when sub-agents use MCP tools. The issue stems from anyio cancel scope violations when cleanup occurs across different asyncio tasks, causing 500 Internal Server Errors.

Changes:

  • Added defensive exception handling in KAgentMcpToolset.close() to catch BaseException (including CancelledError) during MCP cleanup
  • Introduced _safe_close_runner() method that isolates runner cleanup in a separate task using asyncio.gather(return_exceptions=True) to prevent cancel scope corruption from propagating
  • Added logging for non-fatal cleanup errors to aid debugging without crashing the system

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
python/packages/kagent-adk/src/kagent/adk/_mcp_toolset.py | Overrides close() method to catch BaseException during MCP session cleanup, preventing CancelledError from escaping and corrupting cancel state
python/packages/kagent-adk/src/kagent/adk/_agent_executor.py | Adds _safe_close_runner() method that isolates runner cleanup in a separate task to prevent cancel scope errors from propagating to event queue teardown

jmhbh previously approved these changes Feb 16, 2026
@jmhbh jmhbh left a comment

LGTM

When an orchestrator agent calls a sub-agent that uses MCP tools, the
MCP session cleanup can trigger anyio CancelScope violations. This
happens because cancel scopes entered in one asyncio task context get
exited in a different task (created by asyncio.wait_for in the upstream
google-adk _cleanup_toolsets). The resulting CancelledError propagates
upward and crashes the A2A event queue teardown, causing 500 errors.

This fix applies multiple layers of defense:

1. _agent_executor.py: Wrap execute() with a top-level CancelledError
   guard that clears all pending task cancellations via Task.uncancel()
   (Python 3.11+) and publishes a failed status event instead of letting
   the error propagate to _run_event_stream in the A2A SDK.

2. _agent_executor.py: Run runner.close() in an isolated asyncio task
   via asyncio.gather(return_exceptions=True), so any CancelledError or
   cancel scope corruption stays contained. Only suppress the specific
   cross-task cancel scope error ("cancel scope" + "different task"),
   re-raise everything else.

3. _mcp_toolset.py: Override close() to catch BaseException (not just
   Exception), since CancelledError is a BaseException in Python 3.8+
   and the upstream McpToolset.close() only catches Exception. Only
   suppress known anyio cross-task cancel scope errors.

4. _agent_executor.py: Widen _publish_failed_status_event catch from
   Exception to BaseException (re-raising KeyboardInterrupt/SystemExit)
   so residual CancelledError cannot escape the failure event publisher.
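
A rough sketch of the guard in item 1 and the selective suppression in items 2 and 3, standard library only. All function names here are hypothetical, the events list stands in for the real status event publisher, and the error text in `bad_close` is a simulated message chosen to contain both matched substrings.

```python
import asyncio


def is_cross_task_scope_error(err: BaseException) -> bool:
    """Match only errors mentioning both "cancel scope" and "different task"."""
    text = str(err).lower()
    return "cancel scope" in text and "different task" in text


async def close_suppressing_known_errors(close) -> bool:
    """Return True if a known cross-task error was suppressed; re-raise the rest."""
    try:
        await close()
    except BaseException as err:
        if is_cross_task_scope_error(err):
            return True
        raise
    return False


async def guarded_execute(work, events: list) -> None:
    """Turn a stray CancelledError into a recorded failure instead of propagating it."""
    try:
        await work()
        events.append("completed")
    except asyncio.CancelledError:
        task = asyncio.current_task()
        # Task.cancelling() and Task.uncancel() exist only on Python 3.11+.
        if task is not None and hasattr(task, "uncancel"):
            while task.cancelling() > 0:
                task.uncancel()
        events.append("failed")


async def self_cancelling_work() -> None:
    asyncio.current_task().cancel()  # simulates a cancellation leaking from cleanup
    await asyncio.sleep(0)           # the CancelledError is delivered at this await


async def bad_close() -> None:
    # Simulated message for illustration only.
    raise RuntimeError("Attempted to exit cancel scope in a different task")


async def other_close() -> None:
    raise ValueError("unrelated failure")


async def demo():
    events: list = []
    await asyncio.create_task(guarded_execute(self_cancelling_work, events))
    suppressed = await close_suppressing_known_errors(bad_close)
    try:
        await close_suppressing_known_errors(other_close)
        reraised = False
    except ValueError:
        reraised = True
    return events, suppressed, reraised


events, suppressed, reraised = asyncio.run(demo())
```

Matching on message substrings is brittle if the upstream wording changes, which is presumably why everything that does not match is re-raised.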

Fixes kagent-dev#1276

Signed-off-by: Jaison Paul <paul.jaison@gmail.com>
@jsonmp-k8 jsonmp-k8 force-pushed the fix/mcp-session-cleanup-cancel-scope-1276 branch from df8b4b9 to e4e1deb on February 17, 2026 04:03
Development

Successfully merging this pull request may close these issues.

[BUG] MCP session cleanup fails with cancel-scope error when agents call other agents with MCP tools

2 participants