WebArena

This evaluation code is adapted from WebArena.

Install

```
# Python 3.10+
conda create -n webarena python=3.10; conda activate webarena
pip install -r requirements.txt
playwright install
pip install -e .
```

Evaluation

Set up the standalone environment.

Please check out this page for details.

Configure the URLs for each website.

```
export SHOPPING="<your_shopping_site_domain>:7770"
export SHOPPING_ADMIN="<your_e_commerce_cms_domain>:7780/admin"
export REDDIT="<your_reddit_domain>:9999"
export GITLAB="<your_gitlab_domain>:8023"
export MAP="<your_map_domain>:3000"
export WIKIPEDIA="<your_wikipedia_domain>:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
export HOMEPAGE="<your_homepage_domain>:4399" # this is a placeholder
```
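
A forgotten or half-edited export is easy to miss until the evaluation fails partway through. The following is a minimal sketch, not part of this repository, of a check you could run before continuing. The function name `unset_site_vars` is hypothetical. It assumes that a value still containing `<` or `>` is an unedited placeholder.

```python
import os

# The seven variables exported in the step above.
SITE_VARS = [
    "SHOPPING",
    "SHOPPING_ADMIN",
    "REDDIT",
    "GITLAB",
    "MAP",
    "WIKIPEDIA",
    "HOMEPAGE",
]


def unset_site_vars(env=None):
    """Return the names of site variables that are missing, empty,
    or still hold an unedited <placeholder> value (an assumption of this sketch)."""
    env = os.environ if env is None else env
    bad = []
    for name in SITE_VARS:
        value = env.get(name, "")
        if not value or "<" in value or ">" in value:
            bad.append(name)
    return bad


if __name__ == "__main__":
    bad = unset_site_vars()
    if bad:
        print("Set these variables first:", ", ".join(bad))
    else:
        print("All site URLs are set.")
```

Run it in the same shell session in which you exported the variables, since a new shell will not inherit them.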

Obtain the auto-login cookies for all websites:
```
bash prepare.sh
```
Launch the evaluation:
```
bash eval-tgi.sh
```
Alternatively, run bash eval-gpt-3.5-turbo.sh to evaluate with GPT-3.5, or bash eval-gpt-4.sh to evaluate with GPT-4.

Before evaluating, edit the scripts to provide your OpenAI API key, which is needed for every run because the fuzzy match action requires access to OpenAI. When evaluating TGI models, also edit the scripts to provide your TGI controller address.
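
As a rough illustration of these two requirements, the sketch below reports which credentials are missing for a given backend. It is not part of this repository. It assumes the key is supplied through the `OPENAI_API_KEY` environment variable, and `TGI_CONTROLLER_ADDR` is a hypothetical variable name used only for illustration, so check the eval scripts for the names they actually expect.

```python
import os


def preflight(backend, env=None, tgi_var="TGI_CONTROLLER_ADDR"):
    """Return a list of human-readable problems for the chosen backend.

    backend: "tgi", "gpt-3.5-turbo", or "gpt-4" (labels used by this sketch only).
    tgi_var: hypothetical name of the variable holding the TGI controller address.
    """
    env = os.environ if env is None else env
    problems = []
    # The OpenAI key is needed for every run because of the fuzzy match step.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # TGI runs additionally need the controller address.
    if backend == "tgi" and not env.get(tgi_var):
        problems.append(f"{tgi_var} is not set")
    return problems
```

For example, `preflight("tgi")` returns an empty list only when both values are present in the current environment.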
