browserbase · Kylejeong2 · May 24, 2025
diff --git a/README.md b/README.md
@@ -66,6 +66,16 @@ Enhance your Vercel applications with web-browsing capabilities. Build Generativ
 #### [**Braintrust Integration**](./examples/integrations/braintrust/)
 Integrate Browserbase with Braintrust for evaluation and testing of AI agent performance in web environments. Monitor, measure, and improve your browser automation workflows.
 
+#### [**MongoDB Integration**](./examples/integrations/mongodb/README.md)
+**Intelligent Web Scraping & Data Storage** - Extract structured data from e-commerce websites using Stagehand and store it in MongoDB for analysis. Perfect for building data pipelines, market research, and competitive analysis workflows.
+
+**Capabilities:**
+- AI-powered web scraping with Stagehand
+- Structured data extraction with schema validation
+- MongoDB storage for persistence and querying
+- Built-in data analysis and reporting
+- Robust error handling for production use
+
 ## 🏗️ Monorepo Structure
 
 ```
@@ -80,6 +90,7 @@ integrations/
 │       ├── langchain/           # LangChain framework integration
 │       ├── browser-use/         # Simplified browser automation
 │       ├── braintrust/          # Evaluation and testing tools
+│       ├── mongodb/             # MongoDB data extraction & storage
 │       └── agentkit/            # AgentKit implementations
 └── README.md                    # This file
 ```

diff --git a/examples/integrations/mongodb/.cursorrules b/examples/integrations/mongodb/.cursorrules
@@ -0,0 +1,140 @@
+# Stagehand Project
+
+This project uses Stagehand, which extends Playwright by adding `act`, `extract`, and `observe` to the Page class.
+
+`Stagehand` is a class that provides config, a `StagehandPage` object via `stagehand.page`, and a `StagehandContext` object via `stagehand.context`.
+
+`Page` is a class that extends the Playwright `Page` class and adds `act`, `extract`, and `observe` methods.
+`Context` is a class that extends the Playwright `BrowserContext` class.
+
+Use the following rules to write code for this project.
+
+- To take an action on the page like "click the sign in button", use Stagehand `act` like this:
+
+```typescript
+await page.act("Click the sign in button");
+```
+
+- To plan an instruction before taking an action, use Stagehand `observe` to get the action to execute.
+
+```typescript
+const [action] = await page.observe("Click the sign in button");
+```
+
+- The result of `observe` is an array of `ObserveResult` objects that can directly be used as params for `act` like this:
+
+  ```typescript
+  const [action] = await page.observe("Click the sign in button");
+  await page.act(action);
+  ```
+
+- When writing code that needs to extract data from the page, use Stagehand `extract`. Explicitly pass the following params by default:
+
+```typescript
+const { someValue } = await page.extract({
+  instruction: "the instruction to execute",
+  schema: z.object({
+    someValue: z.string(),
+  }), // The schema to extract
+});
+```
+
+## Initialize
+
+```typescript
+import { Stagehand } from "@browserbasehq/stagehand";
+import StagehandConfig from "./stagehand.config";
+
+const stagehand = new Stagehand(StagehandConfig);
+await stagehand.init();
+
+const page = stagehand.page; // Playwright Page with act, extract, and observe methods
+const context = stagehand.context; // Playwright BrowserContext
+```
+
+## Act
+
+You can cache the results of `observe` and use them as params for `act` like this:
+
+```typescript
+const instruction = "Click the sign in button";
+const cachedAction = await getCache(instruction);
+
+if (cachedAction) {
+  await page.act(cachedAction);
+} else {
+  try {
+    const results = await page.observe(instruction);
+    await setCache(instruction, results[0]); // Cache the single action that `act` expects
+    await page.act(results[0]);
+  } catch (error) {
+    await page.act(instruction); // If observing or acting on the observed result fails, execute the instruction directly
+  }
+}
+```
+
+Be sure to cache the results of `observe` and use them as params for `act` to avoid unexpected DOM changes. Using `act` without caching will result in more unpredictable behavior.
+
+Act `action` should be as atomic and specific as possible, e.g. "Click the sign in button" or "Type 'hello' into the search input".
+AVOID actions that are more than one step, e.g. "Order me pizza" or "Type in the search bar and hit enter".
+
+## Extract
+
+If you are writing code that needs to extract data from the page, use Stagehand `extract`.
+
+```typescript
+const signInButtonText = await page.extract("extract the sign in button text");
+```
+
+You can also pass in params like an output schema in Zod, and a flag to use text extraction:
+
+```typescript
+const data = await page.extract({
+  instruction: "extract the sign in button text",
+  schema: z.object({
+    text: z.string(),
+  }),
+});
+```
+
+`schema` is a Zod schema that describes the data you want to extract. To extract an array, make sure to pass in a single object that contains the array, as follows:
+
+```typescript
+const data = await page.extract({
+  instruction: "extract the text inside all buttons",
+  schema: z.object({
+    text: z.array(z.string()),
+  }),
+  useTextExtract: true, // Set true for larger-scale extractions (multiple paragraphs), or set false for small extractions (name, birthday, etc)
+});
+```
+
+## Agent
+
+Use the `agent` method to autonomously execute larger tasks like "Get the stock price of NVDA".
+
+```typescript
+// Navigate to a website
+await stagehand.page.goto("https://www.google.com");
+
+const agent = stagehand.agent({
+  // You can use either OpenAI or Anthropic
+  provider: "openai",
+  // The model to use (claude-3-7-sonnet-20250219 or claude-3-5-sonnet-20240620 for Anthropic)
+  model: "computer-use-preview",
+
+  // Customize the system prompt
+  instructions: `You are a helpful assistant that can use a web browser.
+	Do not ask follow up questions, the user will trust your judgement.`,
+
+  // Customize the API key
+  options: {
+    apiKey: process.env.OPENAI_API_KEY,
+  },
+});
+
+// Execute the agent
+await agent.execute(
+  "Apply for a library card at the San Francisco Public Library"
+);
+```
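The Act section of `.cursorrules` above calls `getCache` and `setCache` without defining them. A minimal sketch of what such helpers could look like, assuming a JSON file named `cache.json` (that name appears in this example's `.gitignore`; the file layout, the `clearCache` helper, and the optional `file` parameter are assumptions for illustration, not part of Stagehand):

```typescript
import { existsSync, readFileSync, rmSync, writeFileSync } from "node:fs";

// Hypothetical cache location; cache.json is listed in this example's .gitignore.
const CACHE_FILE = "cache.json";

// Cached values are whatever `observe` returned. They are typed as `unknown`
// because this sketch deliberately does not depend on Stagehand's types.
type Cache = Record<string, unknown>;

// Load the whole cache; a missing or unreadable file is treated as empty.
function readCache(file: string): Cache {
  if (!existsSync(file)) return {};
  try {
    const parsed: unknown = JSON.parse(readFileSync(file, "utf8"));
    return parsed !== null && typeof parsed === "object" ? (parsed as Cache) : {};
  } catch {
    return {};
  }
}

// Return the cached value for an instruction, or undefined on a miss.
export function getCache(key: string, file: string = CACHE_FILE): unknown {
  return readCache(file)[key];
}

// Store a value under an instruction and rewrite the cache file.
export function setCache(key: string, value: unknown, file: string = CACHE_FILE): void {
  const cache = readCache(file);
  cache[key] = value;
  writeFileSync(file, JSON.stringify(cache, null, 2));
}

// Delete the cache file, for example after the target page's layout changes.
export function clearCache(file: string = CACHE_FILE): void {
  rmSync(file, { force: true });
}
```

These helpers are synchronous, so the `await getCache(...)` and `await setCache(...)` calls in the snippet above would still work unchanged. Keying the cache by the instruction string keeps the sketch simple, but a cached action may go stale when the page changes.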
diff --git a/examples/integrations/mongodb/.env.example b/examples/integrations/mongodb/.env.example
@@ -0,0 +1,15 @@
+# MongoDB Connection
+
+# Local MongoDB instance
+# MONGO_URI=mongodb://localhost:27017
+
+# MongoDB Atlas connection string format:
+# MONGO_URI=mongodb+srv://<username>:<password>@<cluster>.mongodb.net/<database>?retryWrites=true&w=majority
+
+# Database name
+DB_NAME=scraper_db
+
+BROWSERBASE_PROJECT_ID="YOUR_BROWSERBASE_PROJECT_ID"
+BROWSERBASE_API_KEY="YOUR_BROWSERBASE_API_KEY"
+OPENAI_API_KEY="THIS_IS_OPTIONAL_WITH_ANTHROPIC_KEY"
+ANTHROPIC_API_KEY="THIS_IS_OPTIONAL_WITH_OPENAI_KEY"
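The variables in `.env.example` above could be checked once at startup so that a missing key fails early rather than midway through a scrape. A minimal sketch, assuming the four connection variables are all required and that one model provider key is enough (the variable names come from `.env.example`; the `checkEnv` function and the required/optional split are assumptions, not something the example is confirmed to enforce):

```typescript
// Hypothetical startup check. Variable names are taken from .env.example;
// which ones index.ts actually requires is an assumption.
type Env = Record<string, string | undefined>;

const REQUIRED = [
  "MONGO_URI",
  "DB_NAME",
  "BROWSERBASE_PROJECT_ID",
  "BROWSERBASE_API_KEY",
];

// The placeholder values in .env.example suggest only one of these is needed.
const ONE_OF = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"];

// Returns human-readable problems; an empty array means nothing is missing.
export function checkEnv(env: Env): string[] {
  const isSet = (name: string): boolean => (env[name] ?? "").trim() !== "";
  const problems: string[] = [];

  for (const name of REQUIRED) {
    if (!isSet(name)) problems.push(`${name} is not set`);
  }
  if (!ONE_OF.some(isSet)) {
    problems.push(`set at least one of ${ONE_OF.join(", ")}`);
  }
  return problems;
}
```

A caller could run `checkEnv(process.env)` before initializing Stagehand and exit if the returned array is non-empty. The check only detects missing values; it does not notice placeholder strings left over from `.env.example`.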
diff --git a/examples/integrations/mongodb/.gitignore b/examples/integrations/mongodb/.gitignore
@@ -0,0 +1,7 @@
+.env
+node_modules
+tmp
+downloads
+.DS_Store
+dist
+cache.json
diff --git a/examples/integrations/mongodb/LICENSE b/examples/integrations/mongodb/LICENSE
@@ -0,0 +1,7 @@
+Copyright 2025 Browserbase, Inc 
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
diff --git a/examples/integrations/mongodb/README.md b/examples/integrations/mongodb/README.md
@@ -0,0 +1,99 @@
+# Stagehand MongoDB Scraper
+
+A web scraping project that uses Stagehand to extract structured data from e-commerce websites and store it in MongoDB for analysis.
+
+## Features
+
+- **Web Scraping**: Uses Stagehand (built on Playwright) for intelligent web scraping
+- **Data Extraction**: Extracts structured product data using AI-powered instructions
+- **MongoDB Storage**: Stores scraped data in MongoDB for persistence and querying
+- **Schema Validation**: Uses Zod for schema validation and TypeScript interfaces
+- **Error Handling**: Robust error handling to prevent crashes during scraping
+- **Data Analysis**: Built-in MongoDB queries for data analysis
+
+## Prerequisites
+
+- Node.js 16 or higher
+- MongoDB installed locally or MongoDB Atlas account
+- Browserbase API key and project ID, plus an OpenAI or Anthropic API key (see `.env.example`)
+
+## Installation
+
+1. Clone the repository:
+   ```
+   git clone <repository-url>
+   cd stagehand-mongodb-scraper
+   ```
+
+2. Install dependencies:
+   ```
+   npm install
+   ```
+
+3. Set up environment variables:
+   ```
+   # Copy .env.example to .env, then set these along with the Browserbase and model API keys
+   MONGO_URI=mongodb://localhost:27017
+   DB_NAME=scraper_db
+   ```
+
+## Usage
+
+1. Start MongoDB locally:
+   ```
+   mongod
+   ```
+
+2. Run the scraper:
+   ```
+   npm start
+   ```
+
+3. The script will:
+   - Scrape product listings from Amazon
+   - Extract detailed information for the first 3 products
+   - Extract reviews for each product
+   - Store all data in MongoDB
+   - Run analysis queries on the collected data showing:
+     - Collection counts
+     - Products by category
+     - Top-rated products
+
+## Project Structure
+
+All functionality lives in a single source file, alongside the configuration files:
+
+- `index.ts`: Contains the complete implementation including:
+  - MongoDB connection and data operations
+  - Schema definitions
+  - Scraping functions
+  - Data analysis
+  - Main execution logic
+- `stagehand.config.js`: Stagehand configuration
+- `.env.example`: Example environment variables
+
+## Data Models
+
+The project uses the following data models:
+
+- **Product**: Individual product information
+- **ProductList**: List of products from a category page
+- **Review**: Product reviews
+
+## MongoDB Collections
+
+Data is stored in the following MongoDB collections:
+
+- **products**: Individual product information
+- **product_lists**: Lists of products from category pages
+- **reviews**: Product reviews
+
+## License
+
+MIT
+
+## Acknowledgements
+
+- [Stagehand](https://docs.stagehand.dev/) for the powerful web scraping capabilities
+- [MongoDB](https://www.mongodb.com/) for the flexible document database
+- [Zod](https://zod.dev/) for runtime schema validation
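The README above says the script reports products by category and top-rated products, but it does not show the queries. A rough sketch of what those two analyses compute, assuming each product document has `category` and `rating` fields (these field names, the `Product` interface, and the function names are illustrative assumptions; the README does not list the actual schema):

```typescript
// Illustrative shape only; the real Product model may use different fields.
interface Product {
  name: string;
  category: string;
  rating: number;
}

// An aggregation pipeline that could be passed to `aggregate()` on the
// `products` collection to count products per category.
export const productsByCategoryPipeline = [
  { $group: { _id: "$category", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

// In-memory equivalent of the pipeline above, handy for checking expectations
// without a running MongoDB instance.
export function productsByCategory(
  products: Product[],
): { _id: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const p of products) {
    counts.set(p.category, (counts.get(p.category) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count);
}

// In-memory equivalent of sorting by rating descending and keeping the first n.
export function topRated(products: Product[], n: number): Product[] {
  return [...products].sort((a, b) => b.rating - a.rating).slice(0, n);
}
```

The pipeline uses only the standard `$group`, `$sum`, and `$sort` stages. The in-memory functions mirror it so the expected output shape can be checked against a few hand-written documents before pointing the queries at real scraped data.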