Language Learning Model (LLM) AGI Evaluation Platform

This repository contains the codebase for the LLM-AGI Evaluation Platform. The platform generates, summarizes, and evaluates results from various large language models such as GPT-3.5, GPT-4, and BabyAGI. The evaluations are performed by human experts via a web interface.

Getting Started

Prerequisites

Clone the repo:

git clone https://github.com/KatherLab/llm-agent.git && cd llm-agent

Follow the instructions in the setup_env.md file to set up the environment.

Installation

If you are a project maintainer:

Create and check out your own dev branch:

git checkout -b <dev_branchname>

If you are a human expert:

Check out the working branch:

git checkout <exp_branchname>

Running the Code

If you are a project maintainer, go through all the steps below.

If you are a human expert, you only need to follow steps 4 and 5.

Put the OpenAI API key in the generator/API_KEY file. Do not push this file to the remote repository; it is already listed in .gitignore.
Run generate.py to generate results from the large language models:

python generator/generate.py

The generated text files can be found in the generator/results directory.
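The generation step depends on the key stored in generator/API_KEY. As a minimal sketch of how a script might read that file safely, the helper below strips surrounding whitespace and fails early when the file is missing or empty. The function name `load_api_key` is hypothetical and is not taken from generate.py.

```python
from pathlib import Path


def load_api_key(path="generator/API_KEY"):
    """Return the API key stored in `path`, without surrounding whitespace.

    Hypothetical helper: the real generate.py may read the key differently.
    """
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"API key file not found: {key_file}")
    # A trailing newline is common when the file is created with an editor.
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file is empty: {key_file}")
    return key
```

Calling `load_api_key()` from the repository root reads the default location. Raising an error at startup avoids sending requests with an empty key.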

Generate summarized markdown files:

python generator/summarize.py

Load the summaries into the local database:

python webui/gpt4_summaries_db.py
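To illustrate what loading summaries into a local database can look like, here is a minimal sketch using SQLite from the Python standard library. It is not the code in webui/gpt4_summaries_db.py. The table name `summaries`, its columns, and the function `load_summaries` are assumptions made for this example.

```python
import sqlite3
from pathlib import Path


def load_summaries(summary_dir, db_path):
    """Store every *.md file in `summary_dir` in a `summaries` table.

    Returns the number of rows in the table afterwards.
    Hypothetical schema, chosen only for illustration.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS summaries ("
                "id INTEGER PRIMARY KEY, "
                "filename TEXT UNIQUE NOT NULL, "
                "content TEXT NOT NULL)"
            )
            for md_file in sorted(Path(summary_dir).glob("*.md")):
                # Upsert keeps the existing id when a file is reloaded,
                # so rows that refer to a summary id stay valid.
                conn.execute(
                    "INSERT INTO summaries (filename, content) VALUES (?, ?) "
                    "ON CONFLICT(filename) DO UPDATE SET content = excluded.content",
                    (md_file.name, md_file.read_text(encoding="utf-8")),
                )
        return conn.execute("SELECT COUNT(*) FROM summaries").fetchone()[0]
    finally:
        conn.close()
```

Because the insert is an upsert keyed on the file name, running the loader twice on the same directory leaves the row count unchanged. The `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer.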

Run the web application:

python webui/app.py

Then, access the application in your favourite browser by visiting http://127.0.0.1:5000.

Commit and push updates to the database:

git add webui/scores.db
git commit -m "Update database"
git push

Visualizations

Thanks everyone for your contributions!

Results from the human experts will be stored in the visualization directory.

See analysis_summary.md for the visualizations.

About the Platform

The web-based evaluation application allows expert users to provide ratings. The platform speeds up collating the results generated by llm-agi and the evaluations from ChatGPT-4 into a single summary database. The scoring system in the web interface builds a score database in which each entry is linked by summary ID to the main summary database. The score database is updated dynamically whenever human experts submit their scores via the web interface.
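The link between scores and summaries described above can be sketched with two SQLite tables joined on the summary ID. This is an illustration under stated assumptions, not the platform's actual schema. The table and column names, the single-database layout, and the helpers `init_db` and `submit_score` are all hypothetical.

```python
import sqlite3


def init_db(conn):
    """Create hypothetical `summaries` and `scores` tables."""
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS summaries (
            id INTEGER PRIMARY KEY,
            filename TEXT UNIQUE NOT NULL,
            content TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS scores (
            summary_id INTEGER NOT NULL REFERENCES summaries(id),
            expert TEXT NOT NULL,
            score INTEGER NOT NULL,
            PRIMARY KEY (summary_id, expert)
        );
        """
    )


def submit_score(conn, summary_id, expert, score):
    """Insert an expert's score, or overwrite that expert's earlier score."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO scores (summary_id, expert, score) VALUES (?, ?, ?) "
            "ON CONFLICT(summary_id, expert) DO UPDATE SET score = excluded.score",
            (summary_id, expert, score),
        )
```

With this layout, a resubmitted score replaces the expert's previous one, and a score for a summary ID that does not exist is rejected with `sqlite3.IntegrityError`.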
