LLMs Playing Avalon: Benchmark and Agents

This is the official code of AvalonBench and the Avalon agent Strategist. The corresponding papers are AvalonBench: Evaluating LLMs Playing the Game of Avalon and Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search.

AvalonBench: Evaluating LLMs Playing the Game of Avalon

Based on AgentBench, we support multi-agent play of The Resistance: Avalon, a popular board game that requires deductive reasoning, coordination and collaboration, and skill at deception.

Read the instructions below for how to run AvalonBench!

Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search (ICLR 2025)

In this work, we propose Strategist, which utilizes LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers quality feedback through self-play simulations with Monte Carlo tree search and LLM-based reflection, which can then be used to learn high-level strategic skills such as how to evaluate states that guide the low-level execution.

You can learn how to play with Strategist on AvalonBench here, and the code and usage instructions for Strategist's bi-level tree search can be found in the strategist folder.

News

[2025/01] 🎯Strategist has been accepted to ICLR 2025!
[2024/08] 🔥Try out our new agent, Strategist, by using the avalon-dev-single-discuss config, and find more details at Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search!
[2024/07] Our new agent SearchlightLLMAgentWithDiscussion is available at src/server/tasks/avalon/agents/search_agent.py. The academic paper will be coming soon.
[2023/11] 🎶Multi-LLM setting with AgentBench v0.2 is ready to roll! Details of the multi-agent submodule can be found here.
[2023/11] ♠️We've added a new game called GOPS (Game of Pure Strategy [Wiki]). For more details of the code, see here.
[2023/10] 🤖We've updated our code based on AgentBench v0.2. For the older version, please visit here.

Video Demos

GPT-3.5-turbo🤖 playing against rule-based bots in AvalonBench

single_llm_gpt3.5.mp4

GPT-4-turbo🤖 playing against rule-based bots in AvalonBench

single_llm_gpt_4.mp4

GPT-3.5-turbos🤖 playing against each other

multiplayer.mp4

Initial Results

LLMs Play Against Baseline Bots

Here are the results of LLMs playing against baseline bots.

Multi-LLMs Self-Play

We also let LLMs play against each other. Evil has an 8:2 advantage over Good, which is similar to the stats of rookie human players! Here are also some examples of discussion under this setting.

Getting Started

Prerequisites

Install the dependencies.

conda create -n avalonbench python=3.9
conda activate avalonbench
pip install -r requirements.txt

OpenAI API Key

You need to fill in your OpenAI API key in configs/agents/openai-chat first. Replace <OPENAI_API_KEY> in Bearer <OPENAI_API_KEY> with your key.
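
The substitution above produces a standard HTTP Bearer value. As a minimal sketch of that format (not code from this repo), the helper below builds the header and rejects the unreplaced placeholder. The function name build_auth_header and the dummy key are assumptions made for illustration.

```python
def build_auth_header(api_key: str) -> dict:
    """Return an HTTP Authorization header of the form 'Bearer <key>'."""
    key = api_key.strip()
    # Reject an empty key or the placeholder left over from the config template.
    if not key or key == "<OPENAI_API_KEY>":
        raise ValueError("replace <OPENAI_API_KEY> with a real key")
    return {"Authorization": f"Bearer {key}"}


# Hypothetical dummy key, used only to show the resulting format.
header = build_auth_header("sk-example")
print(header["Authorization"])  # Bearer sk-example
```

A check like this can catch the common mistake of launching a run with the placeholder still in the config.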

Start the task server and the assigner

Start the task server (the trailing 3 is the number of workers):

python -m src.start_task -a --start avalon-dev-single 3

Open a new terminal and start the assigner

python -m src.assigner --config ./configs/assignments/test_avalon.yaml
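
The two steps above can also be driven from one script instead of two terminals. The following is a rough sketch under stated assumptions: it is run from the repository root, the helper names (server_command, assigner_command, launch) are hypothetical, and the startup_delay value is an arbitrary guess because this README does not say how long the task server needs before it accepts the assigner.

```python
import subprocess
import sys
import time


def server_command(task: str, workers: int) -> list:
    """Command line that starts the task server for `task` with `workers` workers."""
    return [sys.executable, "-m", "src.start_task", "-a", "--start", task, str(workers)]


def assigner_command(config_path: str) -> list:
    """Command line that starts the assigner with the given assignment config."""
    return [sys.executable, "-m", "src.assigner", "--config", config_path]


def launch(task: str = "avalon-dev-single", workers: int = 3,
           config_path: str = "./configs/assignments/test_avalon.yaml",
           startup_delay: float = 10.0) -> int:
    """Start the server in the background, run the assigner, then stop the server."""
    server = subprocess.Popen(server_command(task, workers))
    try:
        # Assumption: a fixed pause is enough for the server to come up.
        time.sleep(startup_delay)
        return subprocess.run(assigner_command(config_path)).returncode
    finally:
        server.terminate()
        server.wait()
```

Calling launch() mirrors the two commands shown above; if the assigner fails to connect, running the commands manually in separate terminals remains the documented path.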

Customize configurations and data

You can modify the file configs/tasks/avalon.yaml to configure the agent list. A config file looks like this:

default:
  module: "src.server.tasks.avalon.AvalonBench"
  parameters:
    num_players: 5
    discussion: False

avalon-dev-naive:
  parameters:
    name: "AvalonBench-dev-naive"
    data_file: "data/avalon/dev.json"
    agent_list: ["naive", "naive", "naive", "naive", "naive"]

avalon-dev-single:
  parameters:
    name: "AvalonBench-dev-single"
    data_file: "data/avalon/dev.json"
    agent_list: ["llm", "naive", "naive", "naive", "naive"]

where naive stands for the naive bots. Agents play the roles at the same index in the data file (see below).

Note: There should be only one "llm" in the `agent_list`.
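
The index-based pairing of agents and roles can be sketched as follows. This is an illustration, not the repo's loader: the function names are hypothetical, and reading the note as "at most one llm entry" is an assumption (the all-naive config has none).

```python
def assign_roles(agent_list, role_names):
    """Pair each agent with the role at the same index."""
    if len(agent_list) != len(role_names):
        raise ValueError("agent_list and role_names must have the same length")
    return list(zip(agent_list, role_names))


def has_at_most_one_llm(agent_list):
    """Return True if the list contains zero or one "llm" entry."""
    return agent_list.count("llm") <= 1


agents = ["llm", "naive", "naive", "naive", "naive"]
roles = ["Assassin", "Servant", "Servant", "Merlin", "Minion"]
print(assign_roles(agents, roles)[0])  # ('llm', 'Assassin')
print(has_at_most_one_llm(agents))     # True
```

With this pairing, moving "llm" to a different position in agent_list changes which role the LLM plays for a given data item.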

You can also add data in data/avalon/dev.json (Note: Currently we only support the 5-player game setting, which includes 1 Merlin, 2 Servants, 1 Minion and 1 Assassin). A data item looks like this:

 {
     "num_players": 5,
     "quest_leader": 0,
     "role_names": ["Assassin", "Servant", "Servant", "Merlin", "Minion"]
 }

where quest_leader is the id of the initial quest leader in this game. You can change the game setup by setting quest_leader to a number from 0 to 4 and by permuting role_names.
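
Before adding items to the data file, it can help to check them against the constraints stated above. The sketch below is a hypothetical validator (validate_item is not part of the repo) that encodes only what this section says: 5 players, a quest leader from 0 to 4, and the fixed role composition.

```python
from collections import Counter

# Role composition stated above for the 5-player setting.
EXPECTED_ROLES = Counter({"Merlin": 1, "Servant": 2, "Minion": 1, "Assassin": 1})


def validate_item(item: dict) -> list:
    """Return a list of problems found in one data item (empty if valid)."""
    problems = []
    if item.get("num_players") != 5:
        problems.append("num_players must be 5")
    leader = item.get("quest_leader")
    if not isinstance(leader, int) or not 0 <= leader <= 4:
        problems.append("quest_leader must be an integer from 0 to 4")
    # Order does not matter here, only how many times each role appears.
    if Counter(item.get("role_names", [])) != EXPECTED_ROLES:
        problems.append("role_names must contain 1 Merlin, 2 Servants, 1 Minion, 1 Assassin")
    return problems


item = {"num_players": 5, "quest_leader": 0,
        "role_names": ["Assassin", "Servant", "Servant", "Merlin", "Minion"]}
print(validate_item(item))  # []
```

Returning a list of problems rather than raising on the first one makes it easy to report every issue in a hand-edited file at once.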

Naive experiment

You can also start a naive experiment using:

python -m src.start_task -a --start avalon-dev-naive 3

where all the agents are naive bots. For details of the naive strategies, please refer to the paper.

Play with Multi-LLM

You can also start a Multi-LLM experiment using:

python -m src.start_task -a --start avalon-dev-multi 3

where all the agents will be Large Language Models.

Play with Strategist

Our agent, Strategist, is also available in this repo. You can start the experiment using:

# Strategist playing against naive baselines
python -m src.start_task -a --start avalon-dev-single-search 1

Prompts

All the prompts are maintained in src/server/tasks/avalon/prompt.py. You can find the respective prompts used in src/server/tasks/avalon/agents/llm_with_discussion.py and src/server/tasks/avalon/wrapper.py.

Using game engines

We also provide our engines along with examples of usage for developers in avalonbench_dev.

You can import and use the game engine by running

from engine import AvalonGameEnvironment, AvalonConfig

First input your game configurations into AvalonBasicConfig, then create an AvalonGameEnvironment based on that.

For an example of how to use the game engine, see avalonbench_dev/avalon/test_engine.py

Citation

@inproceedings{
      light2025strategist,
      title={Strategist: Self-improvement of {LLM} Decision Making via Bi-Level Tree Search},
      author={Jonathan Light and Min Cai and Weiqin Chen and Guanzhi Wang and Xiusi Chen and Wei Cheng and Yisong Yue and Ziniu Hu},
      booktitle={The Thirteenth International Conference on Learning Representations},
      year={2025},
      url={https://openreview.net/forum?id=gfI9v7AbFg}
}

@inproceedings{
      light2023from,
      title={AvalonBench: Evaluating {LLM}s Playing the Game of Avalon},
      author={Jonathan Light and Min Cai and Sheng Shen and Ziniu Hu},
      booktitle={NeurIPS 2023 Foundation Models for Decision Making Workshop},
      year={2023},
      url={https://openreview.net/forum?id=ltUrSryS0K}
}
