🌐 Browser-Use

Open-Source Web Automation with LLMs

Let LLMs interact with websites through a simple interface.

Short Example

pip install browser-use

from langchain_openai import ChatOpenAI
from browser_use import Agent

agent = Agent(
    task="Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour.",
    llm=ChatOpenAI(model="gpt-4o"),
)

# ... inside an async function
await agent.run()
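The `await agent.run()` call above only works inside a coroutine. The following is a minimal sketch of that wrapping pattern. It uses a stand-in `StubAgent` class (hypothetical, not part of browser-use) so it runs without a browser or an API key; in practice you would construct the real `Agent` from the example above instead.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in for browser_use.Agent; it only mimics an async run()."""

    def __init__(self, task: str) -> None:
        self.task = task

    async def run(self) -> str:
        # A real agent would drive a browser here; the stub just yields control once.
        await asyncio.sleep(0)
        return f"finished: {self.task}"


async def main() -> str:
    agent = StubAgent(task="Give me the top 10 Show HN post titles.")
    # `await` is only valid inside a coroutine such as this one.
    return await agent.run()


if __name__ == "__main__":
    # asyncio.run() creates an event loop, runs main() to completion, and closes the loop.
    print(asyncio.run(main()))
```

The return value of the stub is made up for illustration; what the real `agent.run()` returns depends on the installed version of browser-use.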

Demo

Prompt: Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour. (1x speed)

Prompt: Search the top 3 AI companies 2024 and find out what concrete hardware each is using for their model. (1x speed)

Prompt: Go to kayak.com and find a one-way flight from Zürich to San Francisco on 12 January 2025. (2.5x speed)

Prompt: Opening new tabs and searching for images for these people: Albert Einstein, Oprah Winfrey, Steve Jobs. (2.5x speed)

Local Setup

Create a virtual environment and install dependencies:

# I recommend using uv
pip install .

Add your API keys to the .env file:

cp .env.example .env

E.g. for OpenAI:

OPENAI_API_KEY=

You can use any LLM supported by LangChain by adding the appropriate environment variables. See the LangChain models documentation for available options.
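A missing or blank key usually only surfaces once the first LLM call fails, so it can help to check up front. Below is a small sketch of such a check using only the standard library. The helper name `missing_keys` is an assumption made for illustration and is not part of browser-use. It also assumes the variables are already in the process environment, for example after calling `load_dotenv()` as in the Chain of Agents example.

```python
import os
from typing import Iterable, Mapping


def missing_keys(required: Iterable[str], env: Mapping[str, str]) -> list[str]:
    """Return the names in `required` that are absent or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # OPENAI_API_KEY is the variable named in this section; list whichever keys your provider needs.
    missing = missing_keys(["OPENAI_API_KEY"], os.environ)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```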

Features

Universal LLM Support - Works with any Language Model
Interactive Element Detection - Automatically finds interactive elements
Multi-Tab Management - Seamless handling of browser tabs
XPath Extraction for scraping functions - No more manual DevTools inspection
Vision Model Support - Process visual page information
Customizable Actions - Add your own browser interactions (e.g. add data to database which the LLM can use)
Handles Dynamic Content - Don't worry about cookies or changing content
Chain-of-thought prompting with memory - Solve long-term tasks
Self-correcting - If the LLM makes a mistake, the agent will self-correct its actions

Advanced Examples

Chain of Agents

You can persist the browser across multiple agents and chain them together.

from asyncio import run
from browser_use import Agent, Controller
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
load_dotenv()

# Persist browser state across agents
controller = Controller()

# Initialize browser agent
agent1 = Agent(
    task="Open 3 VC websites in the New York area.",
    llm=ChatAnthropic(model="claude-3-5-sonnet-20240620", timeout=25, stop=None),
    controller=controller)
agent2 = Agent(
    task="Give me the names of the founders of the companies in all tabs.",
    llm=ChatAnthropic(model="claude-3-5-sonnet-20240620", timeout=25, stop=None),
    controller=controller)

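async def main():
    # Run both agents on a single event loop; the shared controller keeps the browser open between them
    await agent1.run()
    founders, history = await agent2.run()
    print(founders)

run(main())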

You can use the history to run the agents again deterministically.

Command Line Usage

Run examples directly from the command line (clone the repo first):

python examples/try.py "Your query here" --provider [openai|anthropic]
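For readers curious how a command of this shape can be parsed, here is a sketch using `argparse`. It is not the actual contents of `examples/try.py`. The function name `parse_args`, the help texts, and the `openai` default are assumptions made for illustration; only the positional query and the `--provider` choices come from the command above.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse a positional query plus a --provider flag (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Run a browser task from the command line.")
    parser.add_argument("query", help="The task for the agent, in plain language.")
    parser.add_argument(
        "--provider",
        choices=["openai", "anthropic"],
        default="openai",  # assumed default; the real script may differ
        help="Which LLM provider to use.",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"provider={args.provider} query={args.query!r}")
```

Restricting `--provider` with `choices` makes argparse reject any other value with a usage error before a browser is launched.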

Anthropic

You need to add ANTHROPIC_API_KEY to your environment variables. Example usage:

python examples/try.py "Search the top 3 AI companies 2024 and find out in 3 new tabs what hardware each is using for their models" --provider anthropic

OpenAI

You need to add OPENAI_API_KEY to your environment variables. Example usage:

python examples/try.py "Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour." --provider openai

🤖 Supported Models

All LangChain chat models are supported. Tested with:

GPT-4o
GPT-4o Mini
Claude 3.5 Sonnet
Llama 3.1 405B

Limitations

When extracting page content, the message length increases and the LLM gets slower.
Currently, running one agent costs about $0.01.
Sometimes the agent repeats the same task over and over again.
Some elements you want to interact with might not be extracted.

What should we focus on the most?
- Robustness
- Speed
- Cost reduction

Roadmap

Save agent actions and execute them deterministically
Pydantic forced output
Third party SERP API for faster Google Search results
Multi-step action execution to increase speed
Test on mind2web dataset
Add more browser actions

Contributing

Contributions are welcome! Feel free to open issues for bugs or feature requests.

Feel free to join the Discord for discussions and support.

Star ⭐ this repo if you find it useful!
Made with ❤️ by the Browser-Use team
