Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MoA/README.md at main · togethercomputer/MoA #884

Open
1 task
ShellLM opened this issue Aug 16, 2024 · 1 comment
Open
1 task

MoA/README.md at main · togethercomputer/MoA #884

ShellLM opened this issue Aug 16, 2024 · 1 comment
Labels
AI-Agents Autonomous AI agents using LLMs AI-Chatbots Topics related to advanced chatbot platforms integrating multiple AI models ai-leaderboards leaderdoards for llm's and other ml models ai-platform model hosts and APIs Git-Repo Source code repository like gitlab or gh github gh tools like cli, Actions, Issues, Pages llm Large Language Models llm-applications Topics related to practical applications of Large Language Models in various fields llm-benchmarks testing and benchmarking large language models llm-evaluation Evaluating Large Language Models performance and behavior through human-written evaluation sets llm-experiments experiments with large language models Papers Research papers

Comments

@ShellLM
Copy link
Collaborator

ShellLM commented Aug 16, 2024

Mixture-of-Agents (MoA)

License
arXiv
Discord
Twitter

MoA architecture

Overview · Quickstart · Advanced example · Interactive CLI Demo · Evaluation · Results . Credits

Overview

Mixture of Agents (MoA) is a novel approach that leverages the collective strengths of multiple LLMs to enhance performance, achieving state-of-the-art results. By employing a layered architecture where each layer comprises several LLM agents, MoA significantly outperforms GPT-4 Omni's 57.5% on AlpacaEval 2.0 with a score of 65.1%, using only open-source models!

Quickstart: MoA in 50 LOC

To get to get started with using MoA in your own apps, see moa.py. In this simple example, we'll use 2 layers and 4 LLMs. You'll need to:

  1. Install the Together Python library: pip install together
  2. Get your Together API Key & export it: export TOGETHER_API_KEY=
  3. Run the python file: python moa.py

MoA explained

Multi-layer MoA Example

In the previous example, we went over how to implement MoA with 2 layers (4 LLMs answering and one LLM aggregating). However, one strength of MoA is being able to go through several layers to get an even better response. In this example, we'll go through how to run MoA with 3+ layers in advanced-moa.py.

python advanced-moa.py

MoA – 3 layer example

Interactive CLI Demo

This interactive CLI demo showcases a simple multi-turn chatbot where the final response is aggregated from various reference models.

To run the interactive demo, follow these 3 steps:

  1. Export Your API Key: export TOGETHER_API_KEY={your_key}
  2. Install Requirements: pip install -r requirements.txt
  3. Run the script: python bot.py

The CLI will prompt you to input instructions interactively:

  1. Start by entering your instruction at the ">>>" prompt.
  2. The system will process your input using the predefined reference models.
  3. It will generate a response based on the aggregated outputs from these models.
  4. You can continue the conversation by inputting more instructions, with the system maintaining the context of the multi-turn interaction.

[Optional] Additional Configuration

The demo will ask you to specify certain options but if you want to do additional configuration, you can specify these parameters:

  • --aggregator: The primary model used for final response generation.
  • --reference_models: List of models used as references.
  • --temperature: Controls the randomness of the response generation.
  • --max_tokens: Maximum number of tokens in the response.
  • --rounds: Number of rounds to process the input for refinement. (num rounds == num of MoA layers - 1)
  • --num_proc: Number of processes to run in parallel for faster execution.
  • --multi_turn: Boolean to toggle multi-turn interaction capability.

Evaluation

We provide scripts to quickly reproduce some of the results presented in our paper
For convenience, we have included the code from AlpacaEval,
MT-Bench, and FLASK, with necessary modifications.
We extend our gratitude to these projects for creating the benchmarks.

Preparation

# install requirements
pip install -r requirements.txt
cd alpaca_eval
pip install -e .
cd FastChat
pip install -e ".[model_worker,llm_judge]"
cd ..

# setup api keys
export TOGETHER_API_KEY=<TOGETHER_API_KEY>
export OPENAI_API_KEY=<OPENAI_API_KEY>

Run AlpacaEval 2

To run AlpacaEval 2, execute the following scripts:

bash run_eval_alpaca_eval.sh

Run MT-Bench

For a minimal example of MT-Bench evaluation, run:

bash run_eval_mt_bench.sh

Run FLASK

For a minimal example of FLASK evaluation, run:

bash run_eval_flask.sh

Results

alpaca_mtbench

We achieved top positions on both the AlpacaEval 2.0 leaderboard and MT-Bench. Notably, on AlpacaEval 2.0, using solely open-source models, we achieved a margin of 7.6% absolute improvement from 57.5% (GPT-4 Omni) to 65.1% (MoA).

flask

FLASK offers fine-grained evaluation of models across multiple dimensions. Our MoA method significantly outperforms the original Qwen1.5-110B-Chat on harmlessness, robustness, correctness, efficiency, factuality, commonsense, insightfulness, completeness. Additionally, MoA also outperforms GPT-4 Omni in terms of correctness, factuality, insightfulness, completeness, and metacognition.

Please feel free to contact us if you have difficulties in reproducing the results.

Credits

Notably, this work was made possible by the collaborative spirit and contributions of active organizations in the AI field. We appreciate the efforts of Meta AI, Mistral AI, Microsoft, Alibaba Cloud, and DataBricks for developing the Llama 3, Mixtral, WizardLM 2, Qwen 1.5, and DBRX models. Additionally, we extend our gratitude to Tatsu Labs, LMSYS, and KAIST AI for developing the AlpacaEval, MT-Bench, and FLASK evaluation benchmarks.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Citation

If you find this work helpful, please consider citing:

@article{wang2024mixture,
  title={Mixture-of-Agents Enhances Large Language Model Capabilities},
  author={Wang, Junlin and Wang, Jue and Athiwaratkun, Ben and Zhang, Ce and Zou, James},
  journal={arXiv preprint arXiv:2406.04692},
  year={2024}
}

Suggested labels

None

@ShellLM ShellLM added AI-Agents Autonomous AI agents using LLMs AI-Chatbots Topics related to advanced chatbot platforms integrating multiple AI models ai-leaderboards leaderdoards for llm's and other ml models ai-platform model hosts and APIs Git-Repo Source code repository like gitlab or gh github gh tools like cli, Actions, Issues, Pages llm Large Language Models llm-applications Topics related to practical applications of Large Language Models in various fields llm-benchmarks testing and benchmarking large language models llm-evaluation Evaluating Large Language Models performance and behavior through human-written evaluation sets llm-experiments experiments with large language models Papers Research papers labels Aug 16, 2024
@ShellLM
Copy link
Collaborator Author

ShellLM commented Aug 16, 2024

Related content

#681 similarity score: 0.89
#682 similarity score: 0.88
#722 similarity score: 0.88
#638 similarity score: 0.87
#418 similarity score: 0.87
#396 similarity score: 0.87

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AI-Agents Autonomous AI agents using LLMs AI-Chatbots Topics related to advanced chatbot platforms integrating multiple AI models ai-leaderboards leaderdoards for llm's and other ml models ai-platform model hosts and APIs Git-Repo Source code repository like gitlab or gh github gh tools like cli, Actions, Issues, Pages llm Large Language Models llm-applications Topics related to practical applications of Large Language Models in various fields llm-benchmarks testing and benchmarking large language models llm-evaluation Evaluating Large Language Models performance and behavior through human-written evaluation sets llm-experiments experiments with large language models Papers Research papers
Projects
None yet
Development

No branches or pull requests

1 participant