A framework for building scenario simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent action level.

Introduction

leaf-playground is a "definition driven development" framework for building scenario simulation projects in which human and LLM-based agents can participate together, competing or cooperating with each other. It is primarily designed to efficiently evaluate the performance of LLM-based agents at the action level in specific scenarios or tasks, but it also has enormous potential for LLM-native applications, such as developing a language-based game.

Apart from the framework itself, a set of CLI commands is provided to help developers speed up the process of building a scenario simulation project and easily deploy a server with a web UI, where users can create simulation tasks, manually and/or automatically evaluate agents' performance, and visualize the simulation process and evaluation results.

leaf-playground is accompanied by sister projects such as leaf-playground-hub, which hosts scenario simulation projects built with the framework (see Usage below).

Features

  • "Definition Driven Development": advanced syntax for structured scenario definitions and programming conventions.
  • Human + Multiple Agents: facilitates human and AI Agents interaction in designated scenarios.
  • Auto Evaluation: automated action-level evaluation and report visualization for AI Agents.
  • Local server support: one-click local service deployment for scenario simulation tasks management and execution.
  • Containerization: containerization support for running scenario simulation tasks.
  • Auto generate projects: auto-generate and auto-complete code for scenario simulation projects.
  • Debug Friendly: support remote debugger across processes in Pycharm professional IDE.

Installation

Environment Setup

Make sure you have Python and Node.js installed on your computer. If not, you can set up the environment by following these instructions:

  • install Python: we recommend using miniconda to configure a Python virtual environment (see the example below).
  • install Node.js: you can download and install Node.js from the Node.js official site.
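
For example, a minimal Python environment setup with miniconda might look like the following; the environment name and Python version here are illustrative assumptions, not project requirements:

# create and activate an isolated environment for leaf-playground
conda create -n leaf-playground python=3.10
conda activate leaf-playground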

Quick Install

leaf-playground has already been uploaded to PyPI, so you can use pip to install it quickly:

pip install leaf-playground

If you want to save data in PostgreSQL instead of SQLite, you need to include the postgresql extra dependency:

pip install leaf-playground[postgresql]

If you are a framework or scenario simulation project developer who wants to debug the code, you need to include the debug extra dependency:

pip install leaf-playground[debug]

Install from source

To install leaf-playground from source, clone the project using git clone, then run the following in your local leaf-playground directory:

pip install .
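
For example, a complete from-source install might look like this; the repository URL below is inferred from the repository name and may differ if you use a fork:

# clone the repository and install it into the current environment
git clone https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground.git
cd leaf-playground
pip install .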

Usage

Start Server and Create a Task

To start a server that serves the projects hosted in leaf-playground-hub, you first need to clone that repository; then, in your local leaf-playground-hub directory, use the following CLI command to start the server with the web UI:

leaf-out start-server [--port PORT] [--ui_port UI_PORT]

By default, the backend service runs on port 8000 and the UI service runs on port 3000; you can use the --port and --ui_port options to choose different ports.
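
For example, to run the backend on port 8080 and the web UI on port 3001 (illustrative values):

leaf-out start-server --port 8080 --ui_port 3001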

Below is a video demonstrating how to create and run a task that uses the MMLU dataset to evaluate LLM-based agents.

video.mp4

Maintainers

@PanQiWei; @Pandazki.

Roadmap

The Framework

  • support human participation in scenario simulations as dynamic agents
  • run each scenario simulation task in a docker container
  • support managing task status (pause, restart, interrupt, etc.)
  • support full task data persistence
    • save task info, logs, and messages in a database
    • save task results in a database or remote file system
    • support resuming runtime state and information from a checkpoint and continuing execution
  • support completing projects automatically
    • complete scene definitions automatically
    • complete agents automatically
      • complete agent base classes automatically
      • complete specific agent classes automatically
    • complete evaluators automatically
    • complete scenes automatically
  • refactor ai_backend to llm_backend_tools to remove some heavy dependencies
  • support streaming agents' responses

The Hub

  • optimize the scene flow of the who_is_the_spy project and add metrics and evaluators
  • create a new project to support using OpenAI evals
  • create a new project to support using Microsoft promptbench
