WebArena is a standalone, self-hostable web environment for building autonomous agents
This is the branch of WebArena used in AutoManual. We made the following corrections and improvements to the original WebArena so that LLM agents receive correct information from the environment:
- After each action, an additional "none_action" is executed to ensure that the action has fully completed.
- Scroll bar information for the web page is added to the observation to indicate the position of the currently visible area.
- When a focused element has the "hasPopup" property, its current value and the options in its popup menu are displayed. This fixes an issue where the original environment could not display the contents of popup menus.
- Added a "select" action to the id-based actions, allowing the agent to select an option in a popup menu.
- Added a "fill" action to the id-based actions, allowing the agent to clear existing content before typing.
- Fixed an issue with fuzzy matching against a list.
To make comparison with the original branch easier, every modified piece of code is marked with a comment starting with "# [cmh]:".
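To see every change at once, you can search the source tree for that marker. This is a minimal sketch, assuming a bash shell with `grep` available and a local checkout of this branch; the function name `list_cmh_changes` is hypothetical and not part of the repository:

```shell
#!/usr/bin/env bash
# List every line carrying the "# [cmh]:" marker under a directory.
# list_cmh_changes is an illustrative helper, not part of the repository.
list_cmh_changes() {
  local root="${1:-.}"   # directory to search; defaults to the current one
  # -r: recurse into subdirectories
  # -n: print the line number of each match
  # -F: match the pattern literally, so "[cmh]" is not read as a character class
  grep -rnF -- '# [cmh]:' "$root"
}
```

For example, running `list_cmh_changes .` from the repository root prints each match as `path:line:comment`.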
We also fixed several annotation errors, listed below (more fixes will follow):
| Reddit task_id | Task intent | Correction |
|---|---|---|
| 28 | "Tell me the count of comments that have received more downvotes than upvotes for the user who made the latest post on the Worcester forum." | The answer should be 1 instead of 0. |
| 407, 408 | "Upvote the newest post in deeplearning subreddit" | The original answer checked the hottest post instead of the newest post. |
| 583 | "Create a new forum named PlantsForCatParents, with a description of Cat parents & plan lovers, and include ['Cat friendly', 'Local vendors', 'Promotion', 'Toxic plants!'] in the sidebar?" | The description should say "plant lovers" instead of "plan lovers". |
| 584 | "Create a new forum named Karaoke, with a description of Place for Karaoke lovers, and include ['devices', 'setup'] in the sidebar?" | The URL in the answer check should be `__REDDIT__/f/Karaoke/edit` instead of `__REDDIT__/f/Karaoke`. |
| 603, 608 | "Post my question, "safe and budge apartment to live in nyc", in a subreddit where I'm likely to get an answer" | "budge apartment" should be "budget apartment". |
| 644 | "Post a notice on a virtual meetup for Tears of Kingdom enthusiasts on Dec 15th in the games subreddit" | The environment only has a "gaming" subreddit, not a "games" subreddit. |
```bash
# Python 3.10+
conda create -n webarena python=3.10; conda activate webarena
pip install -r requirements.txt
playwright install
pip install -e .
```
Then set up the standalone environment; check out this page for details.
**Important:** After evaluating all 812 examples, reset the environment to its initial state by following the instructions here.
To work around the Reddit (Postmill) posting rate limit, mark the benchmark account as trusted:

```bash
# Find the container ID of Postmill
docker container ls
# Enter the container (replace a7b6610b623c with your own container ID)
docker exec -it a7b6610b623c bash
# Raise the user's trust level so it can send 15 messages every 5 minutes
psql -U postmill -d postmill
```

Then, at the `psql` prompt:

```sql
UPDATE users SET trusted = true WHERE username = 'MarvelsGrantMan136';
\q
```
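The same change can be applied without an interactive session by passing the SQL statement to `psql -c` through `docker exec`. This is a minimal sketch, assuming `docker` is on your PATH, `psql` is available inside the Postmill container (as in the steps above), and the container ID is passed as the first argument (`a7b6610b623c` is only an example). The function name `trust_reddit_user` and the `DRY_RUN` switch are hypothetical:

```shell
#!/usr/bin/env bash
# Mark a Postmill user as trusted so it gets the higher posting rate limit.
# Hypothetical helper for illustration, not part of the repository.
#   $1: Postmill container ID (from `docker container ls`)
#   $2: username (defaults to the account named in the steps above)
# Set DRY_RUN=1 to print the docker command instead of running it.
trust_reddit_user() {
  local container="$1"
  local user="${2:-MarvelsGrantMan136}"
  if [ -z "$container" ]; then
    echo "usage: trust_reddit_user <container-id> [username]" >&2
    return 2
  fi
  local sql="UPDATE users SET trusted = true WHERE username = '${user}';"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Show what would be executed without touching the container.
    printf 'docker exec %s psql -U postmill -d postmill -c "%s"\n' "$container" "$sql"
  else
    # No -it needed: psql runs the single statement and exits.
    docker exec "$container" psql -U postmill -d postmill -c "$sql"
  fi
}
```

For example, `DRY_RUN=1 trust_reddit_user a7b6610b623c` prints the command so you can check it before running it for real.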
Configure the URL of each website by setting your AWS hostname, then set your OpenAI credentials:

```bash
export AWS_HOSTNAME="<your-server-hostname>"
export OPENAI_API_KEY="<your-api-key>"  # a valid OpenAI API key starts with sk-
export OPENAI_BASE_URL="<your-base-url>"  # e.g., https://api.openai.com/v1
```
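Before launching a run, it can help to fail fast if one of these variables is missing. This is a minimal sketch that checks the three variables exported above are non-empty and, mirroring the comment above, that the API key starts with `sk-`; the helper name `check_webarena_env` is hypothetical, and it does not verify that the hostname or key actually work:

```shell
#!/usr/bin/env bash
# Check that the variables exported above are set before starting a run.
# Hypothetical helper; prints each problem to stderr and returns 1 if any is found.
check_webarena_env() {
  local ok=0 var
  for var in AWS_HOSTNAME OPENAI_API_KEY OPENAI_BASE_URL; do
    # ${!var:-} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      ok=1
    fi
  done
  # Mirror the note above: a valid OpenAI API key starts with "sk-".
  case "${OPENAI_API_KEY:-}" in
    ""|sk-*) ;;  # empty is already reported above
    *) echo "OPENAI_API_KEY does not start with sk-" >&2; ok=1 ;;
  esac
  return "$ok"
}
```

For example, `check_webarena_env && python your_run_script.py` (the script name is a placeholder) only starts the run when the check passes.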
If you use the environment or data, please cite the paper:
```bibtex
@article{zhou2023webarena,
  title={WebArena: A Realistic Web Environment for Building Autonomous Agents},
  author={Zhou, Shuyan and Xu, Frank F and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and others},
  journal={arXiv preprint arXiv:2307.13854},
  year={2023}
}
```