fix: prevent MCP cancel scope errors from crashing agent-to-agent calls by jsonmp-k8 · Pull Request #1314 · kagent-dev/kagent

jsonmp-k8 · 2026-02-16T04:50:23Z

Summary

Fixes MCP session cleanup crash when orchestrator agents call sub-agents that use MCP tools
Isolates runner.close() in a separate asyncio task using asyncio.gather(return_exceptions=True) to prevent cancel scope corruption from propagating to the A2A event queue teardown
Overrides KAgentMcpToolset.close() to catch BaseException (including CancelledError) since the upstream McpToolset.close() only catches Exception

Root Cause

When an orchestrator agent delegates to a sub-agent with MCP tools:

MCP sessions are created in Task P (the asyncio.create_task from DefaultRequestHandler), entering anyio.CancelScope contexts
During cleanup, _cleanup_toolsets() in google-adk wraps toolset.close() in asyncio.wait_for(), which can create a new internal task
The AsyncExitStack.aclose() in that new task tries to exit the CancelScope from Task P — a different task
anyio raises: Attempted to exit a cancel scope that isn't the current task's current cancel scope
The resulting CancelledError propagates and crashes queue.close() → queue.join(), causing a 500 Internal Server Error
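The task-affinity problem in the steps above can be reproduced in miniature with the standard library alone. This is only a sketch: `TaskBoundScope` is a hypothetical toy class that mimics the "exit in the task that entered" check, it is not anyio's `CancelScope` implementation, and the explicit `asyncio.create_task()` stands in for the internal task that `asyncio.wait_for()` can create.

```python
import asyncio


class TaskBoundScope:
    """Toy stand-in for a task-affine scope. NOT anyio's implementation."""

    def __init__(self) -> None:
        self._owner = None

    async def __aenter__(self) -> "TaskBoundScope":
        # Remember which asyncio task entered the scope ("Task P").
        self._owner = asyncio.current_task()
        return self

    async def __aexit__(self, *exc_info) -> bool:
        # Refuse to exit from any task other than the one that entered.
        if asyncio.current_task() is not self._owner:
            raise RuntimeError(
                "scope exited in a different task than it was entered in"
            )
        return False


async def main() -> str:
    scope = TaskBoundScope()
    await scope.__aenter__()  # entered in the current task
    try:
        # Cleanup is pushed into a fresh task, so the exit check fails.
        await asyncio.create_task(scope.__aexit__(None, None, None))
    except RuntimeError as exc:
        return str(exc)
    return "ok"
```

Running `asyncio.run(main())` returns the error message rather than `"ok"`, because the exit happens in a task that never entered the scope.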

Fix

Two layers of defense:

_agent_executor.py: _safe_close_runner() runs runner.close() in an isolated asyncio task via asyncio.gather(return_exceptions=True). Any CancelledError or cancel scope corruption is collected as a result rather than raised, keeping the event queue teardown safe.
_mcp_toolset.py: Override close() to catch BaseException (not just Exception). Since Python 3.8, asyncio.CancelledError derives from BaseException rather than Exception, so the upstream McpToolset.close(), which only catches Exception, lets it escape.
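A minimal sketch of the first layer, assuming nothing about the real executor: `FakeRunner` and `safe_close_runner` are hypothetical names used for illustration, and the actual `_safe_close_runner()` in this PR may differ. It shows that `asyncio.gather(return_exceptions=True)` hands back a failure from the child task as a result value instead of raising it in the caller.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class FakeRunner:
    """Hypothetical stand-in for the real runner; close() fails on purpose."""

    def __init__(self, error: BaseException) -> None:
        self._error = error

    async def close(self) -> None:
        raise self._error


async def safe_close_runner(runner) -> list:
    # Run close() in its own task. With return_exceptions=True, an exception
    # raised by the child (or its cancellation) is returned in the result
    # list rather than raised here.
    results = await asyncio.gather(
        asyncio.create_task(runner.close()),
        return_exceptions=True,
    )
    for result in results:
        if isinstance(result, BaseException):
            logger.warning("Non-fatal error during runner cleanup: %r", result)
    return results
```

Both `safe_close_runner(FakeRunner(asyncio.CancelledError()))` and `safe_close_runner(FakeRunner(RuntimeError("cleanup failed")))` complete normally, so code that runs afterwards (the event queue teardown in this PR) is not interrupted by the cleanup failure.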

Test plan

Deploy with a multi-agent setup (orchestrator → sub-agent with MCP tools)
Issue requests that trigger MCP tool calls through the sub-agent
Verify no 500 errors and clean log output (warnings instead of crashes)
Verify single-agent MCP tool calls still work correctly

Fixes #1276

Copilot

Pull request overview

This PR fixes a critical bug where MCP session cleanup crashes during agent-to-agent calls when sub-agents use MCP tools. The issue stems from anyio cancel scope violations when cleanup occurs across different asyncio tasks, causing 500 Internal Server Errors.

Changes:

Added defensive exception handling in KAgentMcpToolset.close() to catch BaseException (including CancelledError) during MCP cleanup
Introduced _safe_close_runner() method that isolates runner cleanup in a separate task using asyncio.gather(return_exceptions=True) to prevent cancel scope corruption from propagating
Added logging for non-fatal cleanup errors to aid debugging without crashing the system
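The first and third changes in the list above can be illustrated with a small sketch. `UpstreamToolset` and `DefensiveToolset` are hypothetical stand-ins, not the real `McpToolset` or `KAgentMcpToolset`; the point is only that `except Exception` does not catch `asyncio.CancelledError`, while `except BaseException` does.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class UpstreamToolset:
    """Hypothetical stand-in for an upstream toolset that catches Exception only."""

    def __init__(self, error: BaseException) -> None:
        self._error = error

    async def _close_sessions(self) -> None:
        raise self._error

    async def close(self) -> None:
        try:
            await self._close_sessions()
        except Exception as exc:  # CancelledError is not an Exception
            logger.warning("Error during cleanup: %r", exc)


class DefensiveToolset(UpstreamToolset):
    """Override that also contains BaseException subclasses during cleanup."""

    async def close(self) -> None:
        try:
            await super().close()
        except (KeyboardInterrupt, SystemExit):
            raise  # never swallow interpreter-level shutdown signals
        except BaseException as exc:
            logger.warning("Non-fatal error during cleanup: %r", exc)


async def escapes(toolset: UpstreamToolset) -> bool:
    """Return True if close() lets a CancelledError reach the caller."""
    try:
        await toolset.close()
    except asyncio.CancelledError:
        return True
    return False
```

With a `CancelledError` injected, `escapes()` reports `True` for `UpstreamToolset` and `False` for `DefensiveToolset`. Swallowing cancellation is normally discouraged, so a handler this broad is only reasonable around best-effort cleanup.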

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `python/packages/kagent-adk/src/kagent/adk/_mcp_toolset.py` | Overrides `close()` method to catch `BaseException` during MCP session cleanup, preventing `CancelledError` from escaping and corrupting cancel state |
| `python/packages/kagent-adk/src/kagent/adk/_agent_executor.py` | Adds `_safe_close_runner()` method that isolates runner cleanup in a separate task to prevent cancel scope errors from propagating to event queue teardown |


python/packages/kagent-adk/src/kagent/adk/_agent_executor.py

jmhbh

LGTM

When an orchestrator agent calls a sub-agent that uses MCP tools, the MCP session cleanup can trigger anyio CancelScope violations. This happens because cancel scopes entered in one asyncio task context get exited in a different task (created by asyncio.wait_for in the upstream google-adk _cleanup_toolsets). The resulting CancelledError propagates upward and crashes the A2A event queue teardown, causing 500 errors.

This fix applies multiple layers of defense:

1. _agent_executor.py: Wrap execute() with a top-level CancelledError guard that clears all pending task cancellations via Task.uncancel() (Python 3.11+) and publishes a failed status event instead of letting the error propagate to _run_event_stream in the A2A SDK.
2. _agent_executor.py: Run runner.close() in an isolated asyncio task via asyncio.gather(return_exceptions=True), so any CancelledError or cancel scope corruption stays contained. Only suppress the specific cross-task cancel scope error ("cancel scope" + "different task"), re-raise everything else.
3. _mcp_toolset.py: Override close() to catch BaseException (not just Exception), since CancelledError has been a BaseException since Python 3.8 and the upstream McpToolset.close() only catches Exception. Only suppress known anyio cross-task cancel scope errors.
4. _agent_executor.py: Widen _publish_failed_status_event catch from Exception to BaseException (re-raising KeyboardInterrupt/SystemExit) so residual CancelledError cannot escape the failure event publisher.

Fixes kagent-dev#1276

Signed-off-by: Jaison Paul <paul.jaison@gmail.com>
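Two ideas from the commit message, the message-based filter and the use of `Task.uncancel()`, can be sketched as follows. All function names here are hypothetical and the sample error strings used with the filter are illustrative; the real code in `_agent_executor.py` may be structured differently. `Task.uncancel()` and `Task.cancelling()` exist only on Python 3.11+, so the helper degrades to a no-op on older versions.

```python
import asyncio


def is_cross_task_cancel_scope_error(exc: BaseException) -> bool:
    """Match only errors whose text mentions both marker phrases."""
    message = str(exc).lower()
    return "cancel scope" in message and "different task" in message


def suppress_or_raise(result: object) -> None:
    """Swallow the known cross-task error; re-raise any other exception."""
    if isinstance(result, BaseException):
        if not is_cross_task_cancel_scope_error(result):
            raise result


def clear_pending_cancellations() -> int:
    """Undo pending cancel requests on the current task (Python 3.11+)."""
    task = asyncio.current_task()
    cleared = 0
    if task is not None and hasattr(task, "uncancel"):
        while task.cancelling() > 0:
            task.uncancel()
            cleared += 1
    return cleared


async def demo() -> int:
    async def worker() -> int:
        try:
            await asyncio.sleep(10)
        except asyncio.CancelledError:
            # Guard: absorb the cancellation and report how many were cleared.
            return clear_pending_cancellations()
        return -1

    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # let the worker reach its sleep
    task.cancel()
    return await task
```

Matching on message text is brittle if the library rewords its errors, which is presumably why everything that does not match is re-raised rather than suppressed.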

Copilot AI review requested due to automatic review settings February 16, 2026 04:50

jsonmp-k8 requested review from EItanya, peterj and yuval-k as code owners February 16, 2026 04:50

Copilot started reviewing on behalf of jsonmp-k8 February 16, 2026 04:50

Copilot AI reviewed Feb 16, 2026

python/packages/kagent-adk/src/kagent/adk/_agent_executor.py (outdated)

jmhbh previously approved these changes Feb 16, 2026

jsonmp-k8 force-pushed the fix/mcp-session-cleanup-cancel-scope-1276 branch from df8b4b9 to e4e1deb Compare February 17, 2026 04:03

jsonmp-k8 dismissed jmhbh’s stale review via e4e1deb February 17, 2026 22:49

jsonmp-k8 wants to merge 1 commit into kagent-dev:main from jsonmp-k8:fix/mcp-session-cleanup-cancel-scope-1276
