A Model Context Protocol (MCP) server that provides AI-powered web automation capabilities using Stagehand. This server enables LLMs to interact with web pages, perform actions, extract data, and observe possible actions in a real browser environment.
- Run npm install to install the necessary dependencies, then run npm run build to generate dist/index.js.
- Set up your Claude Desktop configuration to use the server.
{
  "mcpServers": {
    "stagehand": {
      "command": "node",
      "args": ["path/to/mcp-server-browserbase/stagehand/dist/index.js"],
      "env": {
        "BROWSERBASE_API_KEY": "<YOUR_BROWSERBASE_API_KEY>",
        "BROWSERBASE_PROJECT_ID": "<YOUR_BROWSERBASE_PROJECT_ID>",
        "OPENAI_API_KEY": "<YOUR_OPENAI_API_KEY>"
      }
    }
  }
}
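If you generate this configuration from a script instead of editing it by hand, serializing an object with JSON.stringify rules out syntax errors such as trailing commas, which JSON does not allow. A minimal sketch; the buildClaudeConfig helper is hypothetical and the path and keys are placeholders:

```typescript
// Shape of one entry under "mcpServers" in the Claude Desktop configuration.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface ClaudeDesktopConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: builds the "stagehand" entry shown above.
function buildClaudeConfig(indexJsPath: string, env: Record<string, string>): ClaudeDesktopConfig {
  return {
    mcpServers: {
      stagehand: { command: "node", args: [indexJsPath], env },
    },
  };
}

// Placeholder values: substitute your own build path and credentials.
const config = buildClaudeConfig("path/to/mcp-server-browserbase/stagehand/dist/index.js", {
  BROWSERBASE_API_KEY: "<YOUR_BROWSERBASE_API_KEY>",
  BROWSERBASE_PROJECT_ID: "<YOUR_BROWSERBASE_PROJECT_ID>",
  OPENAI_API_KEY: "<YOUR_OPENAI_API_KEY>",
});

// JSON.stringify never emits trailing commas, so this output is always valid JSON.
const configJson = JSON.stringify(config, null, 2);
console.log(configJson);
```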
- Restart the Claude Desktop app; the tools should then appear when you click the 🔨 icon.
- Start using the tools! A demo video shows Claude performing a Google search for OpenAI using the Stagehand MCP server, with Browserbase providing a remote headless browser.
- stagehand_navigate
  - Navigate to any URL in the browser
  - Input:
    - url (string): The URL to navigate to
- stagehand_act
  - Perform an action on the web page
  - Inputs:
    - action (string): The action to perform (e.g., "click the login button")
    - variables (object, optional): Variables used in the action template
- stagehand_extract
  - Extract data from the web page
- stagehand_observe
  - Observe actions that can be performed on the web page
  - Input:
    - instruction (string, optional): Instruction for observation
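The documented tool inputs map naturally onto TypeScript types. The sketch below is illustrative only: the type names and the validateToolArgs helper are hypothetical, not the server's actual code, and stagehand_extract is modeled with no inputs because none are listed for it.

```typescript
// Hypothetical argument shapes mirroring the documented tool inputs.
interface NavigateArgs { url: string }
interface ActArgs { action: string; variables?: Record<string, unknown> }
type ExtractArgs = Record<string, never>; // no inputs are documented for stagehand_extract
interface ObserveArgs { instruction?: string }

interface ToolArgs {
  stagehand_navigate: NavigateArgs;
  stagehand_act: ActArgs;
  stagehand_extract: ExtractArgs;
  stagehand_observe: ObserveArgs;
}
type ToolName = keyof ToolArgs;

// Returns a list of problems; an empty list means the arguments look valid.
function validateToolArgs(name: ToolName, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  switch (name) {
    case "stagehand_navigate":
      if (typeof args.url !== "string") errors.push("url must be a string");
      break;
    case "stagehand_act":
      if (typeof args.action !== "string") errors.push("action must be a string");
      if (
        args.variables !== undefined &&
        (typeof args.variables !== "object" || args.variables === null || Array.isArray(args.variables))
      ) {
        errors.push("variables must be an object if provided");
      }
      break;
    case "stagehand_observe":
      if (args.instruction !== undefined && typeof args.instruction !== "string") {
        errors.push("instruction must be a string if provided");
      }
      break;
    case "stagehand_extract":
      // Nothing to check: no inputs are documented.
      break;
  }
  return errors;
}

console.log(validateToolArgs("stagehand_act", { action: "click the login button" })); // []
```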
The server provides access to two resources:
- Console Logs (console://logs)
  - Browser console output in text format
  - Includes all console messages from the browser
- Screenshots (screenshot://<n>)
  - PNG images of captured screenshots
  - Accessible via the screenshot name specified during capture
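Both resources are addressed by URI, so reading one starts with routing the URI. A minimal sketch, assuming hypothetical parseResourceUri and readResource functions, in-memory stores for logs and base64-encoded screenshots, and that everything after screenshot:// is the screenshot name:

```typescript
type ResourceRef = { kind: "console_logs" } | { kind: "screenshot"; name: string };

// Hypothetical router: maps a resource URI to what it identifies, or null if unknown.
function parseResourceUri(uri: string): ResourceRef | null {
  if (uri === "console://logs") return { kind: "console_logs" };
  const prefix = "screenshot://";
  if (uri.startsWith(prefix) && uri.length > prefix.length) {
    return { kind: "screenshot", name: uri.slice(prefix.length) };
  }
  return null;
}

// Hypothetical in-memory stores; the real server's storage may differ.
const consoleLogs: string[] = [];
const screenshots = new Map<string, string>(); // screenshot name -> base64-encoded PNG

interface ResourceContent { uri: string; mimeType: string; text?: string; blob?: string }

// Text resources are returned as `text`, binary ones as base64 in `blob`.
function readResource(uri: string): ResourceContent | null {
  const ref = parseResourceUri(uri);
  if (ref === null) return null;
  if (ref.kind === "console_logs") {
    return { uri, mimeType: "text/plain", text: consoleLogs.join("\n") };
  }
  const data = screenshots.get(ref.name);
  return data === undefined ? null : { uri, mimeType: "image/png", blob: data };
}

console.log(parseResourceUri("screenshot://login-page")); // { kind: 'screenshot', name: 'login-page' }
```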
The codebase is organized into the following modules:
- index.ts: Entry point that initializes and runs the server.
- server.ts: Core server logic, including server creation, configuration, and request handling.
- tools.ts: Definitions and implementations of tools that can be called by MCP clients.
- prompts.ts: Prompt templates that can be used by MCP clients.
- resources.ts: Resource definitions and handlers for resource-related requests.
- logging.ts: Comprehensive logging system with rotation and formatting capabilities.
- utils.ts: Utility functions including JSON Schema to Zod schema conversion and message sanitization.
index.ts is the main entry point for the application. It:
- Initializes the logging system
- Creates the server instance
- Connects to the stdio transport to receive and respond to requests
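Over stdio, MCP exchanges JSON-RPC messages one per line. The MCP SDK normally handles this transport; the dependency-free sketch below only illustrates the framing, using a hypothetical LineFramer class that buffers partial chunks and emits one parsed message per complete line, plus a hypothetical frame helper for outgoing messages.

```typescript
// Hypothetical framer: splits a stream of newline-delimited JSON into messages.
class LineFramer {
  private buffer = "";

  // A chunk may contain zero, one, or several complete lines; incomplete tails stay buffered.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const messages: unknown[] = [];
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline).replace(/\r$/, "");
      this.buffer = this.buffer.slice(newline + 1);
      if (line.trim() !== "") messages.push(JSON.parse(line));
      newline = this.buffer.indexOf("\n");
    }
    return messages;
  }
}

// JSON.stringify without indentation escapes newlines inside strings,
// so each outgoing message occupies exactly one line.
function frame(message: unknown): string {
  return JSON.stringify(message) + "\n";
}
```

In a Node process, chunks would come from process.stdin data events and framed replies would be written to process.stdout.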
server.ts contains the core server functionality:
- Creates and configures the MCP server
- Defines Stagehand configuration
- Sets up request handlers for all MCP operations
- Manages the Stagehand browser instance
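Setting up request handlers amounts to mapping JSON-RPC method names to functions. The real server does this through the MCP SDK; the dependency-free sketch below uses MCP method names such as tools/list, but the Handler type, the abbreviated handlers table, and dispatch are hypothetical. The error codes are the standard JSON-RPC 2.0 ones.

```typescript
type Id = number | string;
interface JsonRpcRequest { jsonrpc: "2.0"; id: Id; method: string; params?: Record<string, unknown> }
type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: Id; result: unknown }
  | { jsonrpc: "2.0"; id: Id; error: { code: number; message: string } };

type Handler = (params: Record<string, unknown>) => Promise<unknown>;

// Hypothetical, abbreviated handler table; real handlers would call into Stagehand.
const handlers = new Map<string, Handler>([
  ["tools/list", async () => ({ tools: [{ name: "stagehand_navigate" }, { name: "stagehand_act" }] })],
  ["prompts/list", async () => ({ prompts: [{ name: "click_search_button" }] })],
  ["resources/list", async () => ({ resources: [] })],
]);

async function dispatch(req: JsonRpcRequest): Promise<JsonRpcResponse> {
  const handler = handlers.get(req.method);
  if (handler === undefined) {
    // -32601: JSON-RPC 2.0 "Method not found".
    return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: `Method not found: ${req.method}` } };
  }
  try {
    return { jsonrpc: "2.0", id: req.id, result: await handler(req.params ?? {}) };
  } catch (err) {
    // -32603: JSON-RPC 2.0 "Internal error".
    return { jsonrpc: "2.0", id: req.id, error: { code: -32603, message: String(err) } };
  }
}
```

A Map is used instead of a plain object so that method names like "toString" cannot accidentally resolve to inherited prototype functions.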
tools.ts implements the tools that MCP clients can call:
- stagehand_navigate: Navigate to URLs
- stagehand_act: Perform actions on web elements
- stagehand_extract: Extract structured data from web pages
- stagehand_observe: Observe elements on the page
- screenshot: Take screenshots of the current page
prompts.ts defines prompt templates for MCP clients:
- click_search_button: Template for clicking search buttons
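A prompt template reaches the client as a description plus a list of messages, the shape MCP uses for a prompts/get result. A minimal sketch of how click_search_button might be registered; the getPrompt function, the description, and the message text are hypothetical, not the server's actual wording.

```typescript
interface PromptMessage { role: "user" | "assistant"; content: { type: "text"; text: string } }
interface PromptTemplate { description: string; messages: PromptMessage[] }

// Hypothetical registry keyed by prompt name.
const prompts = new Map<string, PromptTemplate>([
  [
    "click_search_button",
    {
      description: "Click the search button on the current page", // hypothetical wording
      messages: [
        { role: "user", content: { type: "text", text: "Use stagehand_act to click the search button on the page." } },
      ],
    },
  ],
]);

// Mirrors a prompts/get lookup: return the template or fail loudly.
function getPrompt(name: string): PromptTemplate {
  const prompt = prompts.get(name);
  if (prompt === undefined) throw new Error(`Unknown prompt: ${name}`);
  return prompt;
}
```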
resources.ts manages resources in the MCP protocol:
- Currently provides empty resource and resource template lists
logging.ts implements a comprehensive logging system:
- File-based logging with rotation
- In-memory operation logs
- Log formatting and sanitization
- Console logging for debugging
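The in-memory operation log and sanitization can be pictured as a bounded buffer that redacts secrets before storing an entry. This is a hypothetical illustration, not logging.ts: the OperationLog class, its default cap of 100 entries, and the rule of masking known secret strings (such as API keys) are assumptions. File rotation is not shown.

```typescript
interface LogEntry { timestamp: string; level: "info" | "error"; message: string }

// Hypothetical bounded log that keeps only the most recent `capacity` entries.
class OperationLog {
  private entries: LogEntry[] = [];
  private readonly capacity: number;
  private readonly secrets: string[];

  constructor(capacity = 100, secrets: string[] = []) {
    this.capacity = capacity;
    this.secrets = secrets;
  }

  add(level: LogEntry["level"], message: string): void {
    // Sanitize: replace every occurrence of a known secret with a placeholder.
    let clean = message;
    for (const secret of this.secrets) {
      if (secret) clean = clean.split(secret).join("[REDACTED]");
    }
    this.entries.push({ timestamp: new Date().toISOString(), level, message: clean });
    // Drop the oldest entries once the cap is exceeded.
    if (this.entries.length > this.capacity) {
      this.entries.splice(0, this.entries.length - this.capacity);
    }
  }

  // Format entries as "timestamp [LEVEL] message", one per line.
  format(): string {
    return this.entries.map((e) => `${e.timestamp} [${e.level.toUpperCase()}] ${e.message}`).join("\n");
  }

  get size(): number {
    return this.entries.length;
  }
}
```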
utils.ts provides utility functions:
- jsonSchemaToZod: Converts JSON Schema to Zod schema for validation
- sanitizeMessage: Ensures messages are properly formatted JSON
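The behavior of sanitizeMessage is described only as ensuring well-formed JSON. One plausible sketch, hypothetical rather than the actual utils.ts, normalizes any value to a compact, single-line JSON string (jsonSchemaToZod is not sketched here because it depends on the Zod library):

```typescript
// Hypothetical sanitizer: always returns a valid, single-line JSON string.
function sanitizeMessage(input: unknown): string {
  if (typeof input === "string") {
    try {
      // Already JSON: re-serialize compactly so it fits on one line.
      return JSON.stringify(JSON.parse(input));
    } catch {
      // Plain text: encode it as a JSON string literal.
      return JSON.stringify(input);
    }
  }
  try {
    // JSON.stringify yields undefined for undefined, functions, and symbols.
    const out: string | undefined = JSON.stringify(input);
    return out === undefined ? "null" : out;
  } catch {
    // Circular structures and BigInt values cannot be serialized directly.
    return JSON.stringify(String(input));
  }
}
```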
Features:
- AI-powered web automation
- Perform actions on web pages
- Extract structured data from web pages
- Observe possible actions on web pages
- Simple and extensible API
- Model-agnostic support for various LLM providers
Environment variables:
- BROWSERBASE_API_KEY: API key for Browserbase authentication
- BROWSERBASE_PROJECT_ID: Project ID for Browserbase
- OPENAI_API_KEY: API key for OpenAI (used by Stagehand)
- DEBUG: Enable debug logging
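A sketch of how these variables might be read and checked at startup. The loadConfig function, the choice to treat the three API variables as required, and the rule that any non-empty DEBUG value other than "false" enables debug logging are assumptions for illustration. The environment is passed in as a parameter so the function can be exercised without touching process.env.

```typescript
interface ServerConfig {
  browserbaseApiKey: string;
  browserbaseProjectId: string;
  openaiApiKey: string;
  debug: boolean;
}

// Assumed to be required; the server's own checks may differ.
const REQUIRED = ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID", "OPENAI_API_KEY"] as const;

// Hypothetical loader: fails fast, listing every missing variable at once.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const debugValue = env.DEBUG ?? "";
  return {
    browserbaseApiKey: env.BROWSERBASE_API_KEY as string,
    browserbaseProjectId: env.BROWSERBASE_PROJECT_ID as string,
    openaiApiKey: env.OPENAI_API_KEY as string,
    debug: debugValue !== "" && debugValue.toLowerCase() !== "false",
  };
}
```

In the server process itself, the call would be loadConfig(process.env).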
This server implements the following MCP capabilities:
- Tools: Allows clients to call tools that control a browser instance
- Prompts: Provides prompt templates for common operations
- Resources: (Currently empty but structured for future expansion)
- Logging: Provides detailed logging capabilities
For more information about the Model Context Protocol, visit:
Licensed under the MIT License.
Copyright 2024 Browserbase, Inc.