GitHub - LLM-Evaluation-s-Always-Fatiguing/leaf-playground: A framework for building scenario simulation projects in which humans and LLM-based agents can participate, with a user-friendly web UI for visualizing simulations and support for automatic evaluation at the agent action level.

Introduction

leaf-playground is a "definition driven development" framework for building scenario simulation projects in which humans and LLM-based agents can participate together, either competing or cooperating with each other. It is primarily designed to efficiently evaluate the performance of LLM-based agents at the action level in specific scenarios or tasks, but it also has strong potential for LLM-native applications, such as developing a language-based game.

Apart from the framework itself, a set of CLI commands is provided to help developers speed up the process of building a scenario simulation project and easily deploy a server with a web UI. In the web UI, users can create simulation tasks, evaluate agents' performance manually and/or automatically, and visualize the simulation process and evaluation results.

Below are sister projects of leaf-playground:

leaf-playground-webui: the implementation of leaf-playground's web UI.
leaf-playground-hub: hosts our officially implemented scenario simulation projects.

Features

"Definition Driven Development": advanced syntax for structured scenario definitions and programming conventions.
Human + Multiple Agents: facilitates interaction between humans and AI agents in designated scenarios.
Auto Evaluation: automated action-level evaluation and report visualization for AI agents.
Local server support: one-click local service deployment for managing and executing scenario simulation tasks.
Containerization: containerization support for running scenario simulation tasks.
Auto generate projects: auto-generates and auto-completes code for scenario simulation projects.
Debug Friendly: supports remote debugging across processes in the PyCharm Professional IDE.

Installation

Environment Setup

Make sure you have Python and Node.js installed on your computer. If not, you can set up the environment by following these instructions:

install Python: we recommend using miniconda to configure a Python virtual environment.
install Node.js: you can download and install Node.js from the official Node.js site.
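As a quick sanity check before installing, the short script below reports the running Python version and whether a `node` executable is on the PATH. It is only a sketch: the helper names `missing_tools` and `tool_version` are hypothetical and not part of leaf-playground, and no minimum versions are checked because none are stated here.

```python
import shutil
import subprocess
import sys


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if which(name) is None]


def tool_version(tool):
    """Return the text printed by `<tool> --version`."""
    result = subprocess.run(
        [tool, "--version"], capture_output=True, text=True, check=False
    )
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    # Python is necessarily present, since it is running this script.
    print("Python:", sys.version.split()[0])
    missing = missing_tools(["node"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Node.js:", tool_version("node"))
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.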

Quick Install

leaf-playground has already been uploaded to PyPI, so you can use pip to install it quickly:

pip install leaf-playground

If you want to save data in PostgreSQL instead of SQLite, you need to include the postgresql extra dependency:

pip install leaf-playground[postgresql]

If you are a framework or scenario simulation project developer who wants to debug the code, you need to include the debug extra dependency:

pip install leaf-playground[debug]
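To confirm that one of the pip commands above actually installed the package into the active environment, a small check such as the following can be used. It relies only on the standard library's `importlib.metadata` and on the distribution name `leaf-playground` used in the pip commands. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("leaf-playground")
    if found is None:
        print("leaf-playground is not installed in this environment")
    else:
        print("leaf-playground", found)
```

Run it with the same interpreter that pip installed into; a `None` result usually means a different virtual environment is active.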

Install from source

To install leaf-playground from source, first clone the project with git clone, then run the following in your local leaf-playground directory:

pip install .

Usage

Start Server and Create a Task

To start a server that contains the projects hosted in leaf-playground-hub, first clone that project, then, in your local leaf-playground-hub directory, use the following CLI command to start the server together with the web UI:

leaf-play start-server [--port PORT] [--ui_port UI_PORT]

By default, the backend service runs on port 8000 and the UI service runs on port 3000. Use the --port and --ui_port options to choose different ports for the backend and the UI, respectively.

The video below demonstrates how to create and run a task that uses the MMLU dataset to evaluate LLM-based agents.
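If the default ports may already be taken, it can help to check them before launching. The sketch below tests whether a local port can be bound and then assembles the start-server argument list with the two documented options. The helpers `port_is_free` and `build_start_command` are hypothetical, and the CLI executable name is passed in by the caller rather than assumed; use the name shown in the command above.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if `port` can currently be bound on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def build_start_command(cli, port=8000, ui_port=3000):
    """Build the argument list for the start-server command.

    `cli` is the executable name; the defaults mirror the documented
    backend (8000) and UI (3000) ports.
    """
    return [cli, "start-server", "--port", str(port), "--ui_port", str(ui_port)]


if __name__ == "__main__":
    for name, port in (("backend", 8000), ("UI", 3000)):
        state = "free" if port_is_free(port) else "in use"
        print(f"{name} port {port}: {state}")
```

The resulting list can be passed to `subprocess.run` from inside the local leaf-playground-hub directory. Note that a port reported as free can still be claimed by another process before the server starts, so this is a convenience check rather than a guarantee.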

video.mp4

Maintainers

@PanQiWei; @Pandazki.

Roadmap

The Framework

The Hub

optimize scene flow of who_is_the_spy project and add metrics and evaluators
create a new project to support using OpenAI evals
create a new project to support using Microsoft promptbench
