-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Fix MCP server initial connection failures with retry logic #6857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Added retry mechanism (3 attempts) for stdio server connections - Improved error handling and connection sequencing in McpHub - Added retry logic in useMcpToolTool for connection failures - Fixed race condition where servers were not fully initialized on first tool call - Updated tests to handle new async connection flow Fixes issue where MCP servers fail with "Connection closed" error on initial call
- Fixed stderr property access on transport types - Moved command/args variables to proper scope for retry logic - Added proper type casting for StdioClientTransport methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.
src/core/tools/useMcpToolTool.ts
Outdated
| // Try to execute the tool with retry logic for connection failures | ||
| let toolResult: any | ||
| let retryCount = 0 | ||
| const maxRetries = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retry count here is 2, but in McpHub.ts it's 3. Should we standardize this across both files for consistency? Maybe extract these to configuration constants?
src/core/tools/useMcpToolTool.ts
Outdated
| const errorMessage = error?.message || String(error) | ||
|
|
||
| // Check if this is a connection closed error that might benefit from a retry | ||
| if (errorMessage.includes("Connection closed") || errorMessage.includes("No connection found")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider also checking for specific error codes if the MCP SDK provides them, not just string patterns. This would make error detection more robust.
src/services/mcp/McpHub.ts
Outdated
| // Wait before retrying | ||
| await new Promise((resolve) => setTimeout(resolve, retryDelay)) | ||
|
|
||
| // Create a new transport for the retry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When creating a new transport during retry, should we explicitly clean up the old transport's event handlers first to prevent potential memory leaks?
src/services/mcp/McpHub.ts
Outdated
| let connected = false | ||
| let retryCount = 0 | ||
| const maxRetries = 3 | ||
| const retryDelay = 1000 // 1 second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider implementing exponential backoff (e.g., 1s, 2s, 4s) instead of a fixed delay. This would be more resilient to temporary server overload situations.
| // Create McpHub and let it initialize | ||
| const mcpHub = new McpHub(mockProvider as ClineProvider) | ||
| await new Promise((resolve) => setTimeout(resolve, 100)) | ||
| await new Promise((resolve) => setTimeout(resolve, 200)) // Increased timeout to allow for connection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to add specific test cases for the retry logic - simulating connection failures and verifying that retries happen with the correct delays and counts.
|
@roomote-agent Why did we add retry logic instead of just fixing the race condition? Also.. i think the issue described about getting the error -32000 does not only happen with mcp servers installed from the marketplace. |
|
Hi @hannesrudolph, thanks for the review! I see your concerns about the retry logic approach versus fixing the root race condition, and you're right that the -32000 error might not be limited to marketplace-installed MCP servers. Let me investigate the code more thoroughly to understand the actual race condition and see if we can address it directly rather than just adding retries. I'll also check if this issue affects all MCP servers, not just marketplace ones. |
- Set up stderr handler BEFORE starting transport to avoid race condition - Remove unnecessary retry logic from both McpHub and useMcpToolTool - Fix the root cause: handlers were being set up after transport.start() - This ensures no events are missed between start() and handler setup - Applies to all stdio MCP servers, not just marketplace ones
|
Hi @hannesrudolph, you were absolutely right on both points! What I found:
The fix I implemented:Instead of adding retry logic (which was just a band-aid), I fixed the root cause by:
The changes are much cleaner and address the actual problem. All tests pass successfully. Thanks for the insightful review - this is a much better solution than the retry approach! |
|
This was already fixed on PR #6878 |
Summary
This PR fixes the issue where MCP (Model Context Protocol) servers fail on the initial call with a "Connection closed" error (-32000), but work fine on subsequent attempts. This issue specifically occurs when installing MCP servers from the marketplace.
Problem
The root cause was a race condition where:
useMcpToolTool.tsdidn't account for temporary connection issuesSolution
Changes in
McpHub.ts:Changes in
useMcpToolTool.ts:Changes in test files:
McpHub.spec.tstests to handle the new async connection flowTesting
Impact
This fix ensures that MCP servers installed from the marketplace work reliably on the first attempt, improving the user experience and preventing frustrating connection errors.
Fixes: Issue with MCP servers failing on initial connection
Important
Adds retry logic and improves error handling for MCP server connections in
McpHub.tsanduseMcpToolTool.ts, with updated tests inMcpHub.spec.ts.McpHub.tsfor stdio server connections (3 attempts, 1-second delay).McpHub.ts.useMcpToolTool.tsfor tool execution failures (2 retries, 1-second delay).useMcpToolTool.tsto better detect connection-related errors.useMcpToolTool.ts.McpHub.spec.tsto handle new async connection flow and test both "connecting" and "connected" states.This description was created by
for 625c0cb. You can customize this summary. It will automatically update as commits are pushed.