Critical Stability Issues: App Freezing & Hanging Analysis #731

@3dyuval

Problem Statement

Several critical stability issues cause opencode to freeze or hang indefinitely, and recovery requires external intervention such as killing the process. Analysis of the nine reported issues listed at the end of this description reveals systemic problems in process management, file handling, and external service integration.

Root Cause Categories

1. File System Bottlenecks

  • Large files (multi-GB .tar.gz) cause indefinite hangs
  • Synchronous file operations without size limits
  • Impact: Complete app freeze, "session is busy" errors
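
A size guard of the kind implied here could look like the following minimal sketch. The names `assertFileSizeWithin`, `FileTooLargeError`, and the 10 MiB default are hypothetical and not taken from the opencode codebase; the only real API used is `stat` from `node:fs/promises`.

```typescript
import { stat } from "node:fs/promises";

// Hypothetical default limit; the issue does not specify a threshold.
const MAX_FILE_BYTES = 10 * 1024 * 1024;

class FileTooLargeError extends Error {
  constructor(path: string, size: number, limit: number) {
    super(`${path} is ${size} bytes, which exceeds the ${limit}-byte limit`);
    this.name = "FileTooLargeError";
  }
}

// Reads only file metadata, so the contents are never loaded
// when the file is over the limit.
async function assertFileSizeWithin(
  path: string,
  limit: number = MAX_FILE_BYTES,
): Promise<number> {
  const info = await stat(path);
  if (info.size > limit) {
    throw new FileTooLargeError(path, info.size, limit);
  }
  return info.size;
}
```

Calling this before any read turns a multi-GB archive into an immediate, explainable error rather than a hang.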

2. Process Management Gaps

  • Non-terminating processes make TUI unresponsive
  • Missing stdin handling causes sudo command deadlocks
  • Impact: The ESC key has no effect, and users must wait out a 1-minute timeout

3. MCP Server Failures ⚠️ Critical Missing Case

  • Failed MCP servers cause indefinite hangs during startup/operation
  • No timeout or health checks for MCP server connections
  • Impact: Complete app freeze when MCP servers fail to start or respond

4. Network/Firewall Conflicts

  • Hangs with network monitors (LuLu, Little Snitch)
  • No retry logic after firewall approval
  • Impact: Requires manual restart after network permission
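
One way to recover after a firewall prompt is approved is to retry the failed request with exponential backoff. The sketch below is illustrative only: `retryWithBackoff`, its option names, and the default values are assumptions, not existing opencode APIs.

```typescript
// Resolves after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

interface RetryOptions {
  attempts?: number;    // total tries, including the first one
  baseDelayMs?: number; // delay before the second try; doubles each time
}

// Runs `op` until it succeeds or the attempts are used up,
// then rethrows the last error.
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  options: RetryOptions = {},
): Promise<T> {
  const attempts = options.attempts ?? 5;
  const baseDelayMs = options.baseDelayMs ?? 500;
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // No delay after the final failed attempt.
      if (attempt < attempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

With the defaults this waits 0.5 s, 1 s, 2 s, and 4 s between tries, which gives a user several seconds to approve the connection in LuLu or Little Snitch.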

5. Terminal Integration Issues

  • Scroll-related freezes, terminal corruption
  • Complex scroll debouncing logic causes instability
  • Impact: Complete terminal freeze, affects new sessions

Technical Evidence

Bash Tool Issues (packages/opencode/src/tool/bash.ts:30-38)

const process = Bun.spawn({
  cmd: ["bash", "-c", params.command],
  // Missing: stdin handling, process type detection, size limits
})
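
For comparison, here is a sketch of a spawn wrapper that closes stdin, caps buffered output, and enforces a deadline. It uses Node's `node:child_process` API rather than `Bun.spawn` so that every call shown is a documented one; `runCommand`, `RunResult`, and the limits are hypothetical, and the process-group kill assumes a POSIX system.

```typescript
import { spawn } from "node:child_process";

interface RunResult {
  exitCode: number | null; // null when the process was killed by a signal
  output: string;          // combined stdout and stderr, roughly capped
  timedOut: boolean;
}

// Illustrative cap; output may exceed it by at most one chunk.
const MAX_OUTPUT_CHARS = 1024 * 1024;

function runCommand(command: string, timeoutMs = 60_000): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    const child = spawn("bash", ["-c", command], {
      // stdin is attached to the null device, so commands that read
      // from stdin see end-of-file immediately instead of blocking.
      stdio: ["ignore", "pipe", "pipe"],
      // Own process group, so the timeout can kill grandchildren too.
      detached: true,
    });

    let output = "";
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true;
      try {
        // A negative pid targets the whole process group (POSIX only).
        if (child.pid !== undefined) process.kill(-child.pid, "SIGKILL");
      } catch {
        // The group may already have exited.
      }
    }, timeoutMs);

    const collect = (chunk: Buffer) => {
      if (output.length < MAX_OUTPUT_CHARS) output += chunk.toString();
    };
    child.stdout?.on("data", collect);
    child.stderr?.on("data", collect);

    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      resolve({ exitCode: code, output, timedOut });
    });
  });
}
```

Killing the process group rather than only `bash` matters because a surviving grandchild can keep the output pipes open, which would delay the `close` event indefinitely.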

MCP Server Initialization (packages/opencode/src/mcp/index.ts:35-42)

const client = await experimental_createMCPClient({
  // ...
}).catch(() => {}) // Silent failure, no timeout
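
A connection timeout can be layered on top of any promise-returning factory with `Promise.race`. In this sketch, `withTimeout`, `TimeoutError`, and `connectWithTimeout` are hypothetical helpers, and the `connect` callback stands in for the real MCP client call, whose options are elided above.

```typescript
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

// Rejects with TimeoutError if `promise` has not settled within `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new TimeoutError(`${label} timed out after ${ms} ms`)),
      ms,
    );
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// `connect` is a placeholder for the actual client factory.
// Failures are logged rather than swallowed, and the caller gets
// `undefined` so startup can continue without this server.
async function connectWithTimeout<T>(
  name: string,
  connect: () => Promise<T>,
  ms = 5_000,
): Promise<T | undefined> {
  try {
    return await withTimeout(connect(), ms, `MCP server "${name}"`);
  } catch (err) {
    console.error(`Skipping MCP server "${name}":`, err);
    return undefined;
  }
}
```

Note that the race only stops waiting; it does not cancel the underlying connection attempt, so a real fix would also need to tear down the half-started server process.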

TUI Interrupt Handling (packages/tui/internal/tui/tui.go:256-272)

  • Complex debounce logic for interrupts
  • No direct process termination capability

Affected Issues

Proposed Solutions

Phase 1: Critical Fixes (Immediate)

  1. File Size Guards - Add size validation before processing
  2. MCP Timeouts - Connection timeouts and health checks for MCP servers
  3. Command Detection - Identify interactive/long-running commands
  4. Process Interruption - Basic process termination capabilities
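
Command detection (item 3) could start as a simple heuristic over the command string. The sketch below is an assumption about how such a check might look: `classifyCommand` and both lookup lists are hypothetical and deliberately incomplete, and the splitting ignores quoting.

```typescript
type CommandKind = "interactive" | "long-running" | "normal";

// Illustrative lists only; a real implementation would need more entries.
const INTERACTIVE_PROGRAMS = new Set([
  "sudo", "vim", "vi", "nano", "less", "more", "top", "htop", "ssh",
]);
const LONG_RUNNING_PATTERNS: RegExp[] = [
  /^watch\b/,             // watch re-runs its command until interrupted
  /^tail\b.*\s-\w*f\b/,   // tail -f follows the file forever
];

function classifyCommand(command: string): CommandKind {
  // Split on &&, ||, ; and | so each piece of a compound command is checked.
  const segments = command.split(/&&|\|\||[;|]/);
  let kind: CommandKind = "normal";
  for (const raw of segments) {
    // Drop leading VAR=value assignments before reading the program name.
    const segment = raw.trim().replace(/^(?:\w+=\S*\s+)*/, "");
    const program = segment.split(/\s+/)[0] ?? "";
    if (INTERACTIVE_PROGRAMS.has(program)) return "interactive";
    if (LONG_RUNNING_PATTERNS.some((pattern) => pattern.test(segment))) {
      kind = "long-running";
    }
  }
  return kind;
}
```

A caller could refuse or warn on `"interactive"` commands and apply a shorter timeout or a background mode to `"long-running"` ones.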

Phase 2: Architectural Improvements

  1. Streaming File Operations - Progressive loading with size limits
  2. Network Circuit Breakers - Retry logic and graceful degradation
  3. Resource Management - Memory/CPU limits and monitoring
  4. Error Boundaries - Component isolation and recovery
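
The circuit breaker in item 2 can be sketched as a small state machine: after repeated failures it rejects calls immediately for a cooldown period, then allows a single trial call. `CircuitBreaker`, its thresholds, and the injectable clock are hypothetical names chosen for illustration.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // `now` is injectable so the cooldown can be tested without waiting.
  constructor(
    failureThreshold = 3,
    cooldownMs = 10_000,
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  get currentState(): BreakerState {
    return this.state;
  }

  async call<T>(op: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) {
        // Fail fast instead of waiting on a service known to be down.
        throw new Error("circuit open: call rejected without contacting the service");
      }
      this.state = "half-open"; // cooldown elapsed, allow one trial call
    }
    try {
      const result = await op();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Wrapping each external service in its own breaker keeps one unreachable endpoint from stalling unrelated operations.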

Success Criteria

  • ✅ Zero indefinite hangs on large files
  • ✅ Reliable process interruption with ESC key
  • ✅ Clean recovery from MCP server failures
  • ✅ Stable terminal behavior across platforms
  • ✅ Clear error messages when operations are blocked

Documentation

  • Branch: stability-analysis-issue
  • Full analysis: STABILITY_ANALYSIS.md
  • Architectural decisions: ARCHITECTURAL_DECISIONS.md
  • Implementation roadmap: 8 ADRs with phased approach

Priority: Critical - These issues make opencode unusable in common scenarios and require external intervention to recover.


Analysis based on issues #519, #721, #706, #683, #682, #652, #471, #421, and #504, plus a comprehensive codebase review.
