Large Language Model (LLM) AGI Evaluation Platform

This repository contains the codebase for the LLM-AGI Evaluation Platform. The platform generates, summarizes, and evaluates results from large language models and agents such as GPT-3.5, GPT-4, and BabyAGI. Human experts perform the evaluations through a web interface.

Getting Started

Prerequisites

Clone the repo:

git clone https://github.com/KatherLab/llm-agent.git && cd llm-agent

Follow the instructions in the setup_env.md file to set up the environment.

Installation

If you are a project maintainer:

Create and check out your own dev branch:

git checkout -b <dev_branchname>

If you are a human expert:

Check out the working branch:

git checkout <exp_branchname>

Running the Code

If you are a project maintainer, go through all the steps below.

If you are a human expert, you only need to follow steps 4 and 5.

  1. Put your OpenAI API key in the generator/API_KEY file. Do not push this file to the remote repository; it is already listed in .gitignore.

  2. Run generate.py to generate results from the language models:

python generator/generate.py

The generated text files can be found in the generator/results directory.

  3. Generate summarized markdown files:

python generator/summarize.py

  4. Load the summaries into the local database:

python webui/gpt4_summaries_db.py

  5. Run the web application:

python webui/app.py

Then, access the application in your favourite browser by visiting http://127.0.0.1:5000.

  6. Commit and push updates to the database:

git add webui/scores.db
git commit -m "Update database"
git push
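
The first step above stores the OpenAI API key in generator/API_KEY. As an illustration of how a script could read that file, here is a minimal sketch. It assumes the file is plain text with the key on a single line; the helper name load_api_key is hypothetical and is not taken from this repository.

```python
from pathlib import Path


def load_api_key(path="generator/API_KEY"):
    """Return the key stored in `path` (hypothetical helper, not part of the repo).

    Assumes the file is plain text containing only the key.
    """
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"API key file not found: {key_file}")
    # Strip the trailing newline that most editors add.
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file is empty: {key_file}")
    return key
```

Failing early with a clear error when the file is missing or empty is a design choice of this sketch; it avoids sending requests with a blank key.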

Visualizations

Thanks everyone for your contributions!

Results from the human experts will be stored in the visualization directory.

See analysis_summary.md for the visualizations.

About the Platform

The web-based evaluation application lets expert users rate the generated results. The platform collates the results generated by llm-agi and the evaluations from ChatGPT-4 into a single summary database. Scores entered through the web interface are stored in a separate score database, in which each record is linked by summary ID to the main summary database, so the scores are updated as soon as a human expert submits them.
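
The link between scores and summaries described above could be modelled with two SQLite tables joined on the summary ID. The following is only a sketch under assumptions: the table names, column names, and helper functions are hypothetical, and the actual schema in webui/scores.db may differ.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical summaries and scores tables linked by summary ID."""
    # Foreign-key enforcement is off by default in SQLite.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS summaries (
            id INTEGER PRIMARY KEY,
            model TEXT NOT NULL,
            content TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS scores (
            id INTEGER PRIMARY KEY,
            summary_id INTEGER NOT NULL REFERENCES summaries(id),
            expert TEXT NOT NULL,
            score INTEGER NOT NULL
        );
        """
    )


def add_score(conn, summary_id, expert, score):
    """Store one expert rating; fails if the summary ID does not exist."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO scores (summary_id, expert, score) VALUES (?, ?, ?)",
            (summary_id, expert, score),
        )


def mean_scores(conn):
    """Return (summary_id, model, mean score) for every rated summary."""
    return conn.execute(
        """
        SELECT s.id, s.model, AVG(sc.score)
        FROM summaries AS s
        JOIN scores AS sc ON sc.summary_id = s.id
        GROUP BY s.id
        ORDER BY s.id
        """
    ).fetchall()
```

With this layout, a newly submitted score is one inserted row, and aggregate views such as the mean score per summary are computed at query time rather than stored.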