
Features stuck in 'in_progress' after server restart - cannot resume #696

@JasonBroderick

Bug Report: Features Stuck in "in_progress" After Server Restart

Summary

Features remain stuck in "in_progress" status after a server restart, which makes them impossible to resume. The UI shows "Resume" buttons, but clicking them fails with an "already running" error even though no agent process is actually running.

Environment

  • OS: Ubuntu 24.04 LTS (WSL2)
  • Docker: Docker Desktop with WSL2 backend
  • Automaker Version: v0.14.0rc (branch)
  • Deployment: Production Docker (docker-compose.yml)

Steps to Reproduce

  1. Start Automaker via Docker: docker compose up -d
  2. Create a project and add features to the backlog
  3. Start one or more features (they begin processing with status "in_progress")
  4. While features are running, restart the server:
    • docker restart automaker-server
    • Or let the computer sleep and wake (causes a WebSocket disconnect)
    • Or docker compose down && docker compose up -d
  5. Refresh the UI
  6. Observe features show "Resume" button
  7. Click "Resume": it fails with an error, or nothing happens

Expected Behavior

After server restart, the server should:

  1. Detect that features marked "in_progress" have no active agent process
  2. Mark these features as "interrupted" or "resumable"
  3. Allow the user to resume them successfully

Actual Behavior

  • Features remain with "status": "in_progress" in their state files
  • Server reads this stale state and believes features are still running
  • Clicking "Resume" triggers the error: Error: already running
  • Features are effectively stuck and cannot be resumed or restarted without manual intervention

Error Logs

[ERROR] [AutoMode] Resume feature ai-chat-interface error: Error: already running
    at AutoModeService.resumeFeature (file:///app/apps/server/dist/services/auto-mode-service.js:1143:14)
    at file:///app/apps/server/dist/routes/auto-mode/routes/resume-feature.js:21:18

Root Cause Analysis

Feature state is persisted to disk in .automaker/features/<feature-id>/feature.json:

{
  "id": "ai-chat-interface",
  "title": "AI Chat Interface for Idea Exploration",
  "status": "in_progress",
  "updatedAt": "2026-01-25T12:37:11.720Z"
}

When the server restarts:

  1. The actual agent process is terminated (container restart kills all child processes)
  2. The state file is NOT updated (no graceful shutdown handler)
  3. On startup, server reads the stale state file
  4. Server assumes the feature is "running" based solely on the state file
  5. No process liveness check is performed
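To illustrate the missing liveness check, here is a minimal sketch assuming a hypothetical in-memory registry of agents started by the current server process. The names (RunningRegistry, checkResume) and the status values are illustrative, not Automaker's actual code. The point is that such a registry starts empty on every boot, so a stale "in_progress" on disk can no longer produce "already running".

```typescript
// Hypothetical status values; "interrupted" is proposed in this report, not an existing one.
type FeatureStatus = "pending" | "in_progress" | "interrupted" | "completed";

interface FeatureState {
  id: string;
  status: FeatureStatus;
}

// Tracks features whose agent was started by THIS server process.
// It is empty after every restart, so unlike the status persisted on disk
// it cannot claim that a killed agent is still running.
class RunningRegistry {
  private readonly running = new Set<string>();
  start(id: string): void {
    this.running.add(id);
  }
  stop(id: string): void {
    this.running.delete(id);
  }
  has(id: string): boolean {
    return this.running.has(id);
  }
}

type ResumeDecision = "ok" | "already running" | "not resumable";

// Liveness comes from the registry; the persisted status only says whether
// there is unfinished work worth resuming.
function checkResume(feature: FeatureState, registry: RunningRegistry): ResumeDecision {
  if (registry.has(feature.id)) return "already running";
  if (feature.status === "in_progress" || feature.status === "interrupted") return "ok";
  return "not resumable";
}
```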

Workaround

Manually reset stuck features by changing their status in the feature.json files:

docker exec automaker-server bash -c '
for f in /projects/<project>/.automaker/features/*/feature.json; do
  if grep -q "in_progress" "$f"; then
    sed -i "s/\"status\": \"in_progress\"/\"status\": \"pending\"/" "$f"
  fi
done
'
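The sed loop works for the pretty-printed files shown above, but it matches text rather than JSON structure. A JSON-aware alternative is sketched below in TypeScript, assuming Node.js is available inside the container. resetIfStuck and resetStuckFeatures are hypothetical helpers, and "pending" is simply the target status the shell workaround uses.

```typescript
import { existsSync, readFileSync, readdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Returns the rewritten JSON if the feature is stuck, otherwise null.
// Parsing (rather than sed) only inspects the top-level "status" field,
// so an "in_progress" substring in a title or description is ignored.
export function resetIfStuck(jsonText: string, target = "pending"): string | null {
  const data = JSON.parse(jsonText) as { status?: unknown };
  if (data.status !== "in_progress") return null;
  data.status = target;
  return JSON.stringify(data, null, 2);
}

// Walks <featuresDir>/<feature-id>/feature.json, resets stuck features,
// and returns the ids that were changed. Entries without a feature.json are skipped.
export function resetStuckFeatures(featuresDir: string): string[] {
  const changed: string[] = [];
  for (const id of readdirSync(featuresDir)) {
    const file = join(featuresDir, id, "feature.json");
    if (!existsSync(file)) continue;
    const updated = resetIfStuck(readFileSync(file, "utf8"));
    if (updated !== null) {
      writeFileSync(file, updated);
      changed.push(id);
    }
  }
  return changed;
}
```

Compiled to JavaScript (or run through a TypeScript runner), it could be called as resetStuckFeatures("/projects/<project>/.automaker/features"). JSON.stringify rewrites each changed file with two-space indentation, which may differ cosmetically from the original formatting.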

Suggested Fix

Option A: Orphan Detection on Startup (Recommended)

On server startup, scan all projects for features with "in_progress" status and:

  1. Check if an actual agent process exists for that feature
  2. If not, mark the feature as "interrupted" (preserving any partial progress)
  3. Allow the user to resume from the interrupted state
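A minimal sketch of Option A as a pure function, assuming the feature records have already been loaded from each feature.json. The "interrupted" status is the one proposed here, not an existing Automaker value, and reconcileOnStartup and isAgentAlive are hypothetical names. The liveness check is injected so the same routine works with an in-memory registry or with the PID check from Option C.

```typescript
// Hypothetical status values; "interrupted" is the proposed new state.
type FeatureStatus = "pending" | "in_progress" | "interrupted" | "completed";

interface FeatureRecord {
  id: string;
  status: FeatureStatus;
  updatedAt: string;
  // Any other persisted fields (partial progress, title, ...) are carried through untouched.
  [extra: string]: unknown;
}

// Orphan detection: every feature persisted as "in_progress" whose agent is
// not alive is downgraded to "interrupted"; all other records are returned as-is.
function reconcileOnStartup(
  features: FeatureRecord[],
  isAgentAlive: (featureId: string) => boolean,
  now: Date = new Date(),
): { features: FeatureRecord[]; interrupted: string[] } {
  const interrupted: string[] = [];
  const updated = features.map((f): FeatureRecord => {
    if (f.status !== "in_progress" || isAgentAlive(f.id)) return f;
    interrupted.push(f.id);
    // The spread keeps every other field, so partial progress is preserved.
    return { ...f, status: "interrupted", updatedAt: now.toISOString() };
  });
  return { features: updated, interrupted };
}
```

In a single-process server the registry is empty at boot, so isAgentAlive could simply be () => false; the caller would then write each record listed in interrupted back to its feature.json.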

Option B: Graceful Shutdown Handler

Register a shutdown handler (SIGTERM, SIGINT) that:

  1. Marks all "in_progress" features as "interrupted"
  2. Saves state before exit

Note: Option B alone is insufficient because it doesn't handle crashes, OOM kills, or kill -9.
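A sketch of Option B, assuming a hypothetical synchronous persist hook that writes a feature's status to its feature.json. The handler is built separately from its registration so the flush logic can be tested without sending real signals. If the server runs as PID 1 in the container, an explicit handler matters even more: Linux ignores signals with a default action for PID 1, so without one, docker stop waits out its grace period (10 seconds by default) and then sends SIGKILL.

```typescript
// Hypothetical persistence hook: writes one feature's status to disk synchronously.
type PersistStatus = (featureId: string, status: "interrupted") => void;

// Builds the shutdown logic without touching `process`, so it can be unit-tested.
function makeShutdownHandler(
  runningIds: () => Iterable<string>,
  persist: PersistStatus,
  exit: (code: number) => void,
): (signal: string) => void {
  let shuttingDown = false;
  return (signal) => {
    if (shuttingDown) return; // ignore a second SIGINT/SIGTERM while flushing
    shuttingDown = true;
    for (const id of runningIds()) {
      try {
        persist(id, "interrupted");
      } catch (err) {
        // Keep going: one failed write must not block the remaining features.
        console.error(`[shutdown] failed to persist ${id} on ${signal}:`, err);
      }
    }
    exit(0);
  };
}

// Wires the handler to the signals Docker and Ctrl+C deliver.
function installShutdownHandler(handler: (signal: string) => void): void {
  for (const sig of ["SIGTERM", "SIGINT"] as const) {
    process.on(sig, () => handler(sig));
  }
}
```

Using synchronous writes means the flush completes before exit(0) is called. As noted above, this cannot help with crashes, OOM kills, or kill -9, so it complements Option A rather than replacing it.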

Option C: Process Tracking with PIDs

Store the agent process PID in the feature state, then on startup:

  1. Check if that PID is still running
  2. If not, mark as interrupted

Frequency

This bug occurs every time the server restarts while features are in progress. It's 100% reproducible.

Impact

  • Severity: Medium-High
  • User Impact: Features become permanently stuck, requiring manual file editing to fix
  • Workaround Available: Yes, but requires CLI access and technical knowledge

Related Files

  • apps/server/dist/services/auto-mode-service.js - Resume logic with "already running" check
  • .automaker/features/<id>/feature.json - Persisted feature state

Reported by: @JasonBroderick
Discovered with assistance from: Claude Code

Metadata

  • Assignees: No one assigned
  • Labels: Bug (Something isn't working)
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests