OpenArena is a Python project designed to create high-quality datasets by pitting Large Language Models (LLMs) against each other in a competitive environment. The tool uses an ELO-based rating system to rank LLMs by their performance on a set of prompts, with another LLM acting as an impartial judge.
- Asynchronous battles between multiple LLMs (see the sketch after this list)
- ELO rating system for model performance tracking
- Detailed evaluation and scoring by a judge model
- Generation of training data based on model performances
- Support for Hugging Face datasets
- YAML configuration for easy setup and customization
- Support for custom endpoints and API keys
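Concretely, "asynchronous battles" means that for each prompt, all competing models generate their responses concurrently rather than one after another. A minimal sketch of that pattern against an OpenAI-compatible endpoint, assuming `aiohttp` is installed and an Ollama server is running locally (illustrative only, not OpenArena's actual code):

```python
import asyncio
import aiohttp

URL = "http://localhost:11434/v1/chat/completions"

async def generate(session, model_id, prompt):
    # One chat-completion call against an OpenAI-compatible endpoint.
    payload = {"model": model_id, "messages": [{"role": "user", "content": prompt}]}
    async with session.post(URL, json=payload) as resp:
        data = await resp.json()
        return model_id, data["choices"][0]["message"]["content"]

async def battle(prompt, model_ids):
    # Every competitor answers the same prompt concurrently.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(generate(session, m, prompt) for m in model_ids))

responses = asyncio.run(battle("Explain ELO ratings in one sentence.", ["openhermes", "mistral"]))
```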
- Python 3.7+
- Ollama (for local model execution)
- Clone this repository:

  ```bash
  git clone https://github.com/syv-ai/OpenArena.git
  cd OpenArena
  ```
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Ensure you have access to the Ollama API endpoint (default: `http://localhost:11434/v1/chat/completions`) or other custom endpoints as specified in your configuration.
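To confirm the endpoint is reachable before starting a run, a quick request works against any OpenAI-compatible chat completions endpoint. A minimal sketch, where the model name `llama3` is just an example of a model you have already pulled and `requests` is an assumed dependency:

```python
import requests

# Smoke test for an OpenAI-compatible chat completions endpoint.
url = "http://localhost:11434/v1/chat/completions"
payload = {
    "model": "llama3",  # any model available on the server, e.g. via `ollama pull llama3`
    "messages": [{"role": "user", "content": "Reply with the single word: pong"}],
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```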
- Create or modify the `arena_config.yaml` file to set up your desired models, datasets, and endpoints:

  ```yaml
  default_endpoint:
    url: "http://localhost:11434/v1/chat/completions"
    # api_key: "your_default_api_key"  # Uncomment and set if needed

  judge_model:
    name: "JudgeModel"
    model_id: "llama3"
    # endpoint:
    #   url: "http://custom-judge-endpoint.com/api/chat"
    #   api_key: "judge_model_api_key"

  models:
    - name: "Open hermes"
      model_id: "openhermes"
      # endpoint:
      #   url: "http://custom-phi3-endpoint.com/api/chat"
      #   api_key: "phi3_api_key"
    - name: "Mistral v0.3"
      model_id: "mistral"
    - name: "Phi 3 medium"
      model_id: "phi3:14b"

  datasets:
    - name: "skunkworksAI/reasoning-0.01"
      description: "Reasoning dataset"
      split: "train"
      field: "instruction"
      limit: 10
  ```

  Each `datasets` entry corresponds to one Hugging Face dataset load; see the sketch after this list for how the fields map.
- Run the script:

  ```bash
  python llm_arena.py
  ```
- The script will output:
  - Intermediate ELO rankings after each prompt
  - ELO rating progression for each model
  - Generated training data
  - Final ELO ratings
  - Total execution time
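For reference, here is how the fields of a `datasets` config entry might map onto the Hugging Face `datasets` library. The `load_prompts` helper is illustrative, not OpenArena's actual code:

```python
from datasets import load_dataset

def load_prompts(entry):
    """Turn one `datasets` config entry into a list of prompt strings."""
    ds = load_dataset(entry["name"], split=entry["split"])
    if entry.get("limit") is not None:
        ds = ds.select(range(min(entry["limit"], len(ds))))
    return [row[entry["field"]] for row in ds]

prompts = load_prompts({
    "name": "skunkworksAI/reasoning-0.01",
    "split": "train",
    "field": "instruction",
    "limit": 10,
})
print(f"Loaded {len(prompts)} prompts")
```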
- Configuration Loading: The script loads the configuration from the `arena_config.yaml` file.
- Dataset Loading: Prompts are loaded from the specified Hugging Face dataset(s).
- Response Generation: Each model generates responses for all given prompts.
- Evaluation: The judge model evaluates pairs of responses for each prompt, providing scores and explanations.
- ELO Updates: ELO ratings are updated based on the scores from each battle (see the sketch after this list).
- Training Data Generation: The results are compiled into a structured format for potential use in training or fine-tuning other models.
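For intuition about the ELO update step: after each battle, a model's rating moves toward or away from its opponent's by an amount proportional to how surprising the result was, scaled by the K-factor. A minimal sketch of the standard formula, assuming the judge's scores are reduced to a result in [0, 1]; this illustrates the math only and is not OpenArena's actual `update_elo_ratings()` implementation:

```python
def update_elo(rating_a, rating_b, score_a, k=32):
    """Standard ELO update for a single battle.

    score_a is the result from model A's perspective: 1.0 for a win,
    0.0 for a loss, 0.5 for a draw (or anything in between derived
    from the judge's scores). k controls rating volatility.
    """
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

# A 1200-rated model beating a 1400-rated model gains ~24 points.
print(update_elo(1200, 1400, 1.0))
```

A larger `k` makes ratings swing more per battle; this is the K-factor the customization notes below refer to.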
- Modify the `arena_config.yaml` file to add or remove models, change the judge model, adjust dataset parameters, or specify custom endpoints and API keys.
- Adjust the ELO K-factor in the `update_elo_ratings()` method to change the volatility of the ratings.
- Automatically download Ollama models
- Added YAML configuration
- OpenAI support: works with any OpenAI-compatible endpoint (including vLLM)
- Hugging Face datasets integration
- Auto upload to Hugging Face
- Database to keep track of completed prompts so an interrupted run can resume where it left off
Contributions are welcome! Please feel free to submit a Pull Request.