A demo showing how the Claude Agent SDK (reasoning) combines with Stagehand (AI browser automation framework) to create powerful agentic browser automation. Because Stagehand accepts natural language instructions, it's significantly more context-efficient than native Playwright.
This demo illustrates a clean separation of concerns:
- Claude Agent SDK: Handles all reasoning, planning, and decision-making
- Stagehand: Executes browser actions via natural language commands
- Result: Context-efficient automation where Claude decides what to do and Stagehand handles how to do it
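Stagehand handles DOM traversal and selector logic internally, so the full DOM never has to enter Claude's context. This can cut token usage per interaction by roughly an order of magnitude; the figures below are illustrative: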
// Traditional: ~500 tokens of context + implementation
- Full DOM structure passed to Claude
- Claude generates: await page.click('button[data-testid="auth-submit"][aria-label="Submit"]');
- Breaks if UI changes
// Stagehand: ~50 tokens
- Claude calls: act({ action: "click the submit button" })
- Stagehand figures out the selector
- Resilient to UI changes
npm install
Requirements: Chrome must be installed on your system.
Set your Anthropic API key:
export ANTHROPIC_API_KEY="your-api-key"
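A missing key is easier to diagnose if the script checks for it up front. The following is a minimal sketch of such a check; `requireApiKey` is a hypothetical helper and not necessarily part of the demo.

```typescript
// Hypothetical helper: returns the API key from the given environment,
// or throws a descriptive error if it is missing or blank.
export function requireApiKey(
  env: Record<string, string | undefined> = process.env
): string {
  const key = env.ANTHROPIC_API_KEY;
  if (!key || key.trim() === "") {
    throw new Error(
      "ANTHROPIC_API_KEY is not set; export it before running the demo."
    );
  }
  return key;
}
```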
Note: On first run, the demo will automatically copy your Chrome user data directory to .chrome-profile
for browser automation. This preserves your cookies and logged-in sessions.
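The first-run copy described above could be sketched roughly as follows. This is not the demo's actual implementation: `defaultChromeUserDataDir` and `ensureProfileCopy` are hypothetical names, and the source paths are Chrome's usual default locations, which may differ on your machine.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumption: Chrome's usual default user data directory per platform.
export function defaultChromeUserDataDir(): string {
  const home = os.homedir();
  switch (process.platform) {
    case "darwin":
      return path.join(home, "Library", "Application Support", "Google", "Chrome");
    case "win32":
      return path.join(
        process.env.LOCALAPPDATA ?? path.join(home, "AppData", "Local"),
        "Google", "Chrome", "User Data"
      );
    default:
      return path.join(home, ".config", "google-chrome");
  }
}

// Hypothetical helper: copies sourceDir to targetDir only if targetDir does
// not exist yet. Returns true when a copy was made, false when it was skipped.
export function ensureProfileCopy(sourceDir: string, targetDir: string): boolean {
  if (fs.existsSync(targetDir)) {
    return false; // Reuse the existing copy, including its cookies.
  }
  fs.cpSync(sourceDir, targetDir, { recursive: true });
  return true;
}
```

Under these assumptions, calling `ensureProfileCopy(defaultChromeUserDataDir(), ".chrome-profile")` at startup would copy the profile once and leave it untouched on later runs.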
npx tsx agent-browse.ts
npx tsx agent-browse.ts "Go to Hacker News and get the title of the top post"
After Claude responds, you can:
- Ask follow-up questions
- Give new instructions
- Type exit or quit to end
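The exit handling above can be sketched as a small predicate plus a loop over user inputs. This is an illustration only: `isExitCommand` and `collectInstructions` are hypothetical names, and trimming and case-insensitive matching are assumptions rather than documented behavior.

```typescript
// Hypothetical predicate: true when the user asked to end the session.
// Assumption: surrounding whitespace and letter case are ignored.
export function isExitCommand(input: string): boolean {
  const normalized = input.trim().toLowerCase();
  return normalized === "exit" || normalized === "quit";
}

// Collects follow-up instructions until an exit command is seen.
// In the real demo each instruction would be forwarded to Claude instead.
export function collectInstructions(inputs: string[]): string[] {
  const instructions: string[] = [];
  for (const input of inputs) {
    if (isExitCommand(input)) break;
    if (input.trim() !== "") instructions.push(input.trim());
  }
  return instructions;
}
```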
# Complex multi-step workflow
npx tsx agent-browse.ts "Go to Hacker News, find the top post, click it, and summarize what it's about"
# Data extraction with reasoning
npx tsx agent-browse.ts "Navigate to example.com and extract any contact information you can find"
# Adaptive navigation
npx tsx agent-browse.ts "Go to github.com/browserbase/stagehand, take a screenshot, then find and click the documentation link"
Claude will:
- Plan the steps needed (reasoning via Agent SDK)
- Execute each step using Stagehand tools (natural language browser actions)
- Adapt based on what it sees (screenshots, extracted data)
- Report back with results
The demo exposes 6 Stagehand browser automation tools via MCP:
Tool | Description | Example |
---|---|---|
navigate | Go to a URL | navigate({ url: "https://example.com" }) |
act | Perform actions via natural language | act({ action: "click the login button" }) |
extract | Get structured data from the page | extract({ instruction: "extract the title", schema: { title: "string" } }) |
observe | Discover what's on the page | observe({ query: "find all buttons" }) |
screenshot | Capture the current page | screenshot({}) |
close_browser | Clean up when done | close_browser({}) |
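The input shapes in the table can be written down as TypeScript types. The parameter names below come from the examples above; the exact types (for instance, treating `schema` as a map of field names to type names) are assumptions, and `ToolCall` and `summarize` are hypothetical names used only for illustration.

```typescript
// Discriminated union of the six tool calls, inferred from the table's examples.
export type ToolCall =
  | { tool: "navigate"; input: { url: string } }
  | { tool: "act"; input: { action: string } }
  | { tool: "extract"; input: { instruction: string; schema: Record<string, string> } }
  | { tool: "observe"; input: { query: string } }
  | { tool: "screenshot"; input: Record<string, never> }
  | { tool: "close_browser"; input: Record<string, never> };

// Produces a short human-readable summary of a call, e.g. for logging.
export function summarize(call: ToolCall): string {
  switch (call.tool) {
    case "navigate":
      return `open ${call.input.url}`;
    case "act":
      return `do: ${call.input.action}`;
    case "extract":
      return `extract: ${call.input.instruction}`;
    case "observe":
      return `observe: ${call.input.query}`;
    case "screenshot":
      return "capture page";
    case "close_browser":
      return "close browser";
  }
}
```

Typing the calls this way lets the compiler reject a call such as `{ tool: "act", input: { url: "..." } }` before it reaches the browser.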
const q = query({
  prompt: generateMessages(),
  options: {
    mcpServers: {
      "stagehand": stagehandServer // Register Stagehand tools
    },
    allowedTools: [
      "mcp__stagehand__navigate",
      "mcp__stagehand__act",
      "mcp__stagehand__extract",
      "mcp__stagehand__observe",
      "mcp__stagehand__screenshot",
      "mcp__stagehand__close_browser"
    ]
  }
});
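The entries in `allowedTools` follow the pattern `mcp__<server name>__<tool name>`, where the server name is the key used under `mcpServers`. A small sketch of generating the list instead of writing it by hand (`mcpToolNames` and `STAGEHAND_TOOLS` are hypothetical names, not part of the SDK):

```typescript
// The six tool names registered by the demo's Stagehand MCP server.
export const STAGEHAND_TOOLS = [
  "navigate",
  "act",
  "extract",
  "observe",
  "screenshot",
  "close_browser",
] as const;

// Hypothetical helper: builds fully qualified MCP tool names for allowedTools.
export function mcpToolNames(serverName: string, tools: readonly string[]): string[] {
  return tools.map((tool) => `mcp__${serverName}__${tool}`);
}

// Same list as the hand-written allowedTools array above.
export const allowedTools = mcpToolNames("stagehand", STAGEHAND_TOOLS);
```

Generating the names keeps the list in sync if the server key under `mcpServers` is ever renamed.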
The flow:
- Claude (via Agent SDK) decides what browser action to take
- Claude calls a Stagehand MCP tool with natural language parameters
- Stagehand translates the natural language into precise browser actions
- Results flow back to Claude for the next decision
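The flow above can be modeled as a simple decide/execute loop. In the real demo the Agent SDK drives this loop internally, so the code below is only a conceptual sketch with stubbed, synchronous functions; `runLoop`, `Decision`, and the step limit are all hypothetical.

```typescript
// Either a tool call to make next, or a final answer that ends the loop.
export type Decision =
  | { tool: string; input: Record<string, unknown> }
  | { done: true; answer: string };

type Decide = (history: string[]) => Decision; // stands in for Claude
type Execute = (tool: string, input: Record<string, unknown>) => string; // stands in for Stagehand

export function runLoop(
  decide: Decide,
  execute: Execute,
  maxSteps = 10
): { answer: string | null; history: string[] } {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const decision = decide(history); // 1. decide the next browser action
    if ("done" in decision) {
      return { answer: decision.answer, history };
    }
    // 2-3. call the tool; 4. feed the result back into the next decision
    history.push(execute(decision.tool, decision.input));
  }
  return { answer: null, history }; // step limit reached without an answer
}
```

A scripted `decide` function that returns a `navigate` call, then an `act` call, then a final answer would produce a two-entry history followed by that answer.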
Install Chrome for your platform:
- macOS: https://www.google.com/chrome/
- Windows: https://www.google.com/chrome/
- Linux (Debian/Ubuntu, with Google's apt repository configured): sudo apt install google-chrome-stable
To refresh cookies from your main Chrome profile:
rm -rf .chrome-profile