Commit

Automated leaderboard update
actions-user committed Apr 2, 2024
1 parent 13f6bdc commit de30d6b
Showing 1 changed file with 3 additions and 0 deletions.
@@ -48,6 +48,7 @@ Humpback LLaMa2 70B,16.249164231428974,10.121771502645965,1107,https://arxiv.org
OpenHermes-2.5-Mistral (7B),16.248577696674843,10.340415705751552,1107,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/OpenHermes-2.5-Mistral-7B/model_outputs.json,verified
DEITA 7B v1.0,16.05901353966741,12.646639472385097,1417,https://github.com/hkust-nlp/deita,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/deita-7b-v1.0/model_outputs.json,community
JinaChat,15.866004049505932,7.786130393366459,676,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/jina-chat/model_outputs.json,community
+TempNet-LLaMA2-Chat-70B-v0.1,15.831162778430024,15.051894420220444,1830,https://github.com/zhqiu/TempNet,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/TempNet-LLaMA2-Chat-70B-v0.1/model_outputs.json,minimal
CausalLM-14B,15.72032518895564,11.146160869950313,1391,https://huggingface.co/CausalLM/14B,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/causallm-14b/model_outputs.json,community
PairRM 0.4B+Zephyr 7B Beta (best-of-16),15.529867294986612,12.84127825562733,1487,https://huggingface.co/llm-blender/PairRM,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/pairrm-zephyr-7b-beta/model_outputs.json,community
Mistral-ORPO-Beta,14.716749430705242,12.565408794559003,1636,https://huggingface.co/kaist-ai/mistral-orpo-beta,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/mistral-orpo-beta/model_outputs.json,community
@@ -91,6 +92,7 @@ UltraLM 13B V2.0,9.129018444208118,7.504622955739131,1399,https://github.com/thu
Davinci001,9.025728852143091,2.764005231108344,296,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/text_davinci_001/model_outputs.json,verified
OpenBuddy-Falcon-40B-v9,8.988936477935635,5.955742846322981,1089,https://huggingface.co/OpenBuddy/openbuddy-falcon-40b-v9-bf16,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openbuddy-falcon-40b-v9/model_outputs.json,community
OpenChat-13B,8.806053491170802,8.022386010881988,1632,https://github.com/imoneoi/openchat,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openchat-13b/model_outputs.json,community
+TempNet-LLaMA2-Chat-13B-v0.1,8.57835531105755,7.728405066035775,1540,https://github.com/zhqiu/TempNet,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/TempNet-LLaMA2-Chat-13B-v0.1/model_outputs.json,community
LLaMA2 Chat 13B,8.436014548885215,7.702309957875775,1513,https://ai.meta.com/llama/,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/llama-2-13b-chat-hf/model_outputs.json,verified
Guanaco 65B,8.252916991586922,6.858494513378882,1249,https://huggingface.co/timdettmers/guanaco-65b,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/guanaco-65b/model_outputs.json,verified
OpenCoderPlus-15B,8.152410155715494,7.40622245099379,1628,https://github.com/imoneoi/openchat,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/opencoderplus-15b/model_outputs.json,community
@@ -110,6 +112,7 @@ Alpaca Farm PPO Human 7B,6.418603294911531,4.100426814981367,803,https://hugging
Vicuna 7B,6.277217738516609,4.16261116226087,1044,https://huggingface.co/lmsys/vicuna-7b-delta-v1.1,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/vicuna-7b/model_outputs.json,verified
Alpaca 7B,5.875487163278986,2.591450540223603,396,https://huggingface.co/tatsu-lab/alpaca-7b-wdiff,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/alpaca-7b/model_outputs.json,minimal
Phi-2 SFT,5.853787690603355,3.977567775217392,1068,https://huggingface.co/lxuechen/phi-2-sft,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/phi-2-sft/model_outputs.json,verified
+TempNet-LLaMA2-Chat-7B-v0.1,5.739613836715224,5.430143264670806,1512,https://github.com/zhqiu/TempNet,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/TempNet-LLaMA2-Chat-7B-v0.1/model_outputs.json,minimal
MiniChat 3B,5.729332875896306,3.0071507063602487,868,https://huggingface.co/GeneZC/MiniChat-3B,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/minichat-3b/model_outputs.json,community
Guanaco 33B,5.690019090866207,5.002493724956522,1311,https://huggingface.co/timdettmers/guanaco-33b,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/guanaco-33b/model_outputs.json,verified
Falcon 40B Instruct,5.6075325447394455,3.3429188224720505,662,https://huggingface.co/tiiuae/falcon-40b-instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/falcon-40b-instruct/model_outputs.json,verified