In this document, we discuss the key concepts and design choices of ChatArena. We expect this to be particularly helpful for developers who want to contribute to ChatArena or build their own environments.
ChatArena generally follows the design principles of OpenAI Gym [1] and PettingZoo [2]. Agents interact with the environment and with each other through the agent-environment cycle. In every cycle,
- the agent observes the environment
- the agent outputs an action
- the environment makes a state transition given the action
Optionally, in each cycle the environment can also compute a scalar reward for every agent, along with a terminal signal indicating whether the episode has ended.
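To make the cycle concrete, here is a minimal, self-contained sketch in Python. The class and method names (`EchoEnvironment`, `get_observation`, `step`, `act`) are illustrative toy stand-ins, not the actual ChatArena API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EchoEnvironment:
    """Toy environment that records actions and ends after four messages."""
    history: List[str] = field(default_factory=list)

    def get_observation(self, agent_name: str) -> List[str]:
        # The observation: every message produced so far.
        return list(self.history)

    def step(self, agent_name: str, action: str) -> bool:
        # State transition: append the action; return a terminal signal.
        self.history.append(f"{agent_name}: {action}")
        return len(self.history) >= 4


class EchoAgent:
    def __init__(self, name: str):
        self.name = name

    def act(self, observation: List[str]) -> str:
        # A real agent would query an LLM here; we just count messages.
        return f"I have seen {len(observation)} message(s)."


env = EchoEnvironment()
agents = [EchoAgent("alice"), EchoAgent("bob")]

terminal = False
while not terminal:
    for agent in agents:
        obs = env.get_observation(agent.name)    # 1. observe the environment
        action = agent.act(obs)                  # 2. output an action
        terminal = env.step(agent.name, action)  # 3. state transition
        if terminal:
            break
```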
[1] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba: OpenAI Gym. CoRR abs/1606.01540 (2016)
[2] Justin K. Terry, Benjamin Black, Nathaniel Grammel, Mario Jayakumar, Ananth Hari, Ryan Sullivan, Luis S. Santos, Clemens Dieffendahl, Caroline Horsch, Rodrigo Perez-Vicente, Niall L. Williams, Yashas Lokesh, Praveen Ravi: PettingZoo: Gym for Multi-Agent Reinforcement Learning. NeurIPS 2021: 15032-15043
In the current version of ChatArena, all actions are represented as plain text. More structured text outputs, like JSON or code, can be generated by prompting the LLM to do so. We provide simple utilities to extract JSON and code (in markdown fenced-block syntax), which should cover common use cases but can break on intentionally crafted edge cases.
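As an illustration, here is a rough sketch of how such extraction can work. ChatArena's own utilities may differ, and the function names below are not the library API; the simple regexes cover common cases but, as noted above, can break on adversarial or nested inputs.

```python
import json
import re
from typing import List


def extract_jsons(text: str) -> List[dict]:
    """Return every {...} span in `text` that parses as valid JSON."""
    results = []
    for match in re.finditer(r"\{.*?\}", text, flags=re.DOTALL):
        try:
            results.append(json.loads(match.group(0)))
        except json.JSONDecodeError:
            continue  # not valid JSON, skip it
    return results


def extract_code_blocks(text: str) -> List[str]:
    """Return the contents of all markdown-style fenced code blocks."""
    return re.findall(r"```(?:\w+)?\n(.*?)```", text, flags=re.DOTALL)


fence = "```"  # markdown code fence
reply = (
    f"My move: {fence}python\nprint('rock')\n{fence}\n"
    'In JSON: {"move": "rock"}'
)
print(extract_jsons(reply))        # [{'move': 'rock'}]
print(extract_code_blocks(reply))  # ["print('rock')\n"]
```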
An observation is a list of messages, each with a sender and content. The sender can be any agent in the environment or the environment's built-in moderator. The content is again plain text.
In ChatArena, agents do not talk to each other directly; instead, they exchange information through a message pool that acts as a proxy. The message pool is a utility abstraction that can serve as part of the game state.
When an agent takes an action, a message can be created and appended to the message pool. Each message in the pool has a receiver, which can be determined by the environment dynamics (the game rules) or by the agent itself. The environment can also create messages under the name of the moderator, for example to provide additional state information or instructions given the current state.
To render an observation, the message pool collects all the messages visible to the agent and returns them as a list.
In particular, some environments require parallel moves, for example rock-paper-scissors, where an agent should not see the moves of other agents in the same turn. This mechanism is also implemented in the message pool: one can specify the current turn, and messages from that turn onward are ignored when rendering the observation.
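The following sketch puts these pieces together: a message with a sender, content, turn number, and visibility list, and a pool that renders an agent's observation by filtering on visibility and on the current turn. The field and method names are illustrative assumptions rather than the exact ChatArena classes.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Message:
    sender: str                                # an agent's name or "Moderator"
    content: str                               # plain text
    turn: int
    visible_to: Union[str, List[str]] = "all"  # receiver(s) of the message


@dataclass
class MessagePool:
    messages: List[Message] = field(default_factory=list)

    def append(self, message: Message) -> None:
        self.messages.append(message)

    def get_visible_messages(self, agent_name: str, turn: int) -> List[Message]:
        """Collect messages visible to `agent_name` from turns before `turn`.

        Hiding the current turn prevents an agent from seeing simultaneous
        moves, e.g. in rock-paper-scissors.
        """
        visible = []
        for m in self.messages:
            if m.turn >= turn:
                continue  # moves from the current turn onward stay hidden
            if m.visible_to == "all" or agent_name in m.visible_to:
                visible.append(m)
        return visible


pool = MessagePool()
pool.append(Message("Moderator", "Round 1: make your move.", turn=0))
pool.append(Message("alice", "rock", turn=1, visible_to=["Moderator", "alice"]))
pool.append(Message("bob", "paper", turn=1, visible_to=["Moderator", "bob"]))

# Bob cannot see Alice's move from the same turn:
print([m.content for m in pool.get_visible_messages("bob", turn=1)])
# -> ['Round 1: make your move.']
```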
In ChatArena, each agent is usually powered by a language backend. A backend can be an LLM API (e.g., from OpenAI, Anthropic, or Cohere), a local LLM, or a human behind a user interface. The backend renders the observation (a list of messages) into the format required by the downstream model, and the returned text becomes the agent's action by default.
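For concreteness, here is a rough sketch of an OpenAI-backed backend. The client usage reflects the openai>=1.0 Python SDK, but the class name, the model choice, and the rendering convention are our own assumptions for illustration, not ChatArena's actual backend implementation.

```python
from typing import List, Tuple

from openai import OpenAI


class OpenAIChatBackend:
    """Sketch of a backend that turns an observation into an LLM query."""

    def __init__(self, model: str = "gpt-4o-mini"):
        # The client reads OPENAI_API_KEY from the environment.
        self.client = OpenAI()
        self.model = model

    def query(
        self,
        agent_name: str,
        system_prompt: str,
        observation: List[Tuple[str, str]],  # (sender, content) pairs
    ) -> str:
        # Render the observation into the chat-completion message format:
        # the agent's own past messages become "assistant" turns; messages
        # from other agents or the moderator become "user" turns.
        rendered = [{"role": "system", "content": system_prompt}]
        for sender, content in observation:
            role = "assistant" if sender == agent_name else "user"
            rendered.append({"role": role, "content": f"[{sender}]: {content}"})

        response = self.client.chat.completions.create(
            model=self.model, messages=rendered
        )
        # The returned text is used as the agent's action by default.
        return response.choices[0].message.content


backend = OpenAIChatBackend()
action = backend.query(
    agent_name="alice",
    system_prompt="You are playing rock-paper-scissors as alice.",
    observation=[("Moderator", "Round 1: make your move.")],
)
```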