Add TempNet-LLaMA2-Chat to AlpacaEval #264

Merged
merged 2 commits into from
Apr 2, 2024
4,832 changes: 4,832 additions & 0 deletions results/TempNet-LLaMA2-Chat-13B-v0.1/model_outputs.json

Large diffs are not rendered by default.

4,832 changes: 4,832 additions & 0 deletions results/TempNet-LLaMA2-Chat-70B-v0.1/model_outputs.json

Large diffs are not rendered by default.

4,832 changes: 4,832 additions & 0 deletions results/TempNet-LLaMA2-Chat-7B-v0.1/model_outputs.json

Large diffs are not rendered by default.
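Each model_outputs.json above adds 4,832 lines. Assuming the usual AlpacaEval layout (an indented JSON array with one record per AlpacaEval instruction, each record holding four keys such as dataset, instruction, output and generator), that count is consistent with 805 records: each record takes six lines (an opening brace, four key lines, a closing brace) and the outer brackets add two, so 805 × 6 + 2 = 4,832. The sketch below checks a payload along those lines. The key names, the 805-record expectation and the check_model_outputs and pretty_line_count helpers are assumptions for illustration, not part of this PR.

```python
import json
from typing import Any

# Assumed record keys (not confirmed by this PR).
EXPECTED_KEYS = {"dataset", "instruction", "output", "generator"}


def pretty_line_count(n_records: int, n_keys: int = 4) -> int:
    """Lines in an indented JSON array of flat records: '[' and ']' plus '{', keys, '}' per record."""
    return 2 + n_records * (n_keys + 2)


def check_model_outputs(text: str, generator: str, n_expected: int = 805) -> list[str]:
    """Hypothetical helper: return a list of problems found in a model_outputs.json payload."""
    records: Any = json.loads(text)
    if not isinstance(records, list):
        return ["top-level value is not a JSON array"]
    problems = []
    if len(records) != n_expected:
        problems.append(f"expected {n_expected} records, found {len(records)}")
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        missing = EXPECTED_KEYS - set(rec)
        if missing:
            problems.append(f"record {i}: missing keys {sorted(missing)}")
        elif rec["generator"] != generator:
            problems.append(f"record {i}: generator is {rec['generator']!r}")
    return problems


# Usage with placeholder records (not the PR's actual outputs).
dummy = [
    {"dataset": "example", "instruction": "Say hi.", "output": "Hi!",
     "generator": "TempNet-LLaMA2-Chat-7B-v0.1"}
    for _ in range(805)
]
text = json.dumps(dummy, indent=4)
print(len(text.splitlines()), pretty_line_count(805))  # both 4832
print(check_model_outputs(text, "TempNet-LLaMA2-Chat-7B-v0.1"))  # []
```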

63,857 changes: 63,857 additions & 0 deletions results/TempNet-LLaMA2-Chat-7B-v0.1/weighted_alpaca_eval_gpt4_turbo/annotations.json

Large diffs are not rendered by default.

@@ -2,13 +2,10 @@
gpt4_1106_preview_verbose,64.30360147101865,1.3348590089025316,525,268,12,805,65.96273291925466,dev,2402,51.57500797967598
gpt4_1106_preview,50.0,0.0,0,0,805,805,50.0,minimal,2049,50.0
gpt4_1106_preview_concise,22.92019444047205,1.232517714329424,172,622,11,805,22.049689440993788,dev,1136,41.896601591245386
aligner-2b_claude-3-opus-20240229,34.46337362321739,1.314666526302454,225,475,105,805,34.47204968944099,community,1669,41.823071715247664
claude-3-opus-20240229,29.04176413403727,1.3942602231385623,223,580,2,805,27.82608695652174,minimal,1388,40.39177606350116
gpt4,23.576789314782605,1.275704201206918,179,618,8,805,22.732919254658384,minimal,1365,38.12808974440021
aligner-2b_qwen1.5-72b-chat,31.773037737123104,1.2392772646245978,180,473,152,805,31.801242236024844,community,1812,36.725868878524274
Collaborator comment: Seems like you didn't merge the latest main branch? Those models should be in the leaderboard!

Qwen1.5-72B-Chat,26.49828339562733,1.304236164893057,201,600,4,805,25.217391304347824,community,1549,36.571754111987296
gpt4_0314,22.073258928708075,1.2466725494608204,172,627,6,805,21.73913043478261,verified,1371,35.30706121640206
Ein-70B-v0.1,24.84472049689441,1.521406431103307,199,604,2,805,24.84472049689441,community,1467,35.029054008520646
claude-3-sonnet-20240229,25.556325292273296,1.3419811051815638,193,608,4,805,24.22360248447205,verified,1420,34.87247436243302
gpt4_0613_verbose,23.237360043453418,1.283539505582624,171,630,4,805,21.490683229813666,dev,1473,33.82126688658535
mistral-large-2402,21.43877598137888,1.2485232545097724,166,638,1,805,20.6832298136646,minimal,1362,32.65207998531868
@@ -57,6 +54,7 @@ humpback-llama2-70b,10.121771502645965,0.9401806122130112,77,727,1,805,9.6273291
OpenHermes-2.5-Mistral-7B,10.340415705751552,0.935655389929366,75,727,3,805,9.503105590062113,verified,1107,16.248577696674843
deita-7b-v1.0,12.646639472385097,1.0352555320811423,96,708,1,805,11.987577639751551,community,1417,16.05901353966741
jina-chat,7.786130393366459,0.8398450575524877,59,743,3,805,7.515527950310559,community,676,15.866004049505932
TempNet-LLaMA2-Chat-70B-v0.1,15.051894420220444,1.08015075807378,111,691,2,804,13.930348258706468,minimal,1830,15.831162778430024
gpt-3.5-turbo-1106_concise,7.41586497762733,0.8374438113826953,57,744,4,805,7.329192546583851,dev,431,15.769520983894386
causallm-14b,11.146160869950313,0.9544127300795228,81,720,4,805,10.31055900621118,community,1391,15.72032518895564
pairrm-zephyr-7b-beta,12.84127825562733,1.0535874941903722,98,706,1,805,12.236024844720497,community,1487,15.529867294986612
@@ -101,6 +99,7 @@ ultralm-13b-v2.0,7.504622955739131,0.8150376948236479,51,754,0,805,6.33540372670
text_davinci_001,2.764005231108344,0.5177668863975088,23,777,3,803,3.051058530510585,verified,296,9.025728852143091
openbuddy-falcon-40b-v9,5.955742846322981,0.7388621614393269,45,758,2,805,5.714285714285714,community,1089,8.988936477935635
openchat-13b,8.022386010881988,0.8368334957442762,58,746,1,805,7.267080745341616,community,1632,8.806053491170802
TempNet-LLaMA2-Chat-13B-v0.1,7.728405066035775,0.8268032187601844,56,749,0,805,6.956521739130435,community,1540,8.57835531105755
llama-2-13b-chat-hf,7.702309957875775,0.8286143393809762,60,744,1,805,7.515527950310559,verified,1513,8.436014548885215
guanaco-65b,6.858494513378882,0.8048449272409411,54,751,0,805,6.70807453416149,verified,1249,8.252916991586922
opencoderplus-15b,7.40622245099379,0.8024858020878345,52,750,3,805,6.645962732919254,community,1628,8.152410155715494
@@ -121,6 +120,7 @@ alpaca-farm-ppo-human,4.100426814981367,0.6304721406855217,32,770,3,805,4.161490
vicuna-7b,4.16261116226087,0.6135107768217068,28,775,2,805,3.602484472049689,verified,1044,6.277217738516609
alpaca-7b,2.591450540223603,0.4870855382635108,17,785,3,805,2.298136645962733,minimal,396,5.875487163278986
phi-2-sft,3.977567775217392,0.6098271417287373,28,777,0,805,3.4782608695652173,verified,1068,5.853787690603355
TempNet-LLaMA2-Chat-7B-v0.1,5.430143264670806,0.7210775889233014,39,765,1,805,4.906832298136646,minimal,1512,5.739613836715224
minichat-3b,3.0071507063602487,0.504124596172496,22,779,4,805,2.981366459627329,community,868,5.729332875896306
guanaco-33b,5.002493724956522,0.6697115752218856,37,768,0,805,4.596273291925466,verified,1311,5.690019090866207
falcon-40b-instruct,3.3429188224720505,0.5541127159067186,27,777,1,805,3.4161490683229814,verified,662,5.6075325447394455
@@ -136,4 +136,4 @@ falcon-7b-instruct,2.146617553167702,0.454225792894195,16,787,2,805,2.1118012422
oasst-sft-pythia-12b,1.790114083180124,0.3985580883049341,13,790,2,805,1.7391304347826086,verified,726,3.270102114456748
guanaco-13b,3.469596859739131,0.5518606725700214,22,780,3,805,2.919254658385093,verified,1774,3.003787329611614
guanaco-7b,2.880002266173913,0.5202924149314048,21,783,1,805,2.670807453416149,verified,1364,2.871116813131697
baichuan-13b-chat,1.9921455615279502,0.4176985079331233,14,790,1,805,1.8012422360248446,community,1727,2.062170253598568
baichuan-13b-chat,1.9921455615279504,0.4176985079331233,14,790,1,805,1.8012422360248446,community,1727,2.062170253598568
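The leaderboard rows above appear without their CSV header. Assuming the usual AlpacaEval 2 column order (model name, win_rate, standard_error, n_wins, n_wins_base, n_draws, n_total, discrete_win_rate, mode, avg_length, length_controlled_winrate), the discrete win rate counts a draw as half a win: discrete_win_rate = 100 × (n_wins + n_draws / 2) / n_total. For the TempNet-LLaMA2-Chat-70B-v0.1 row, whose n_total is 804 rather than 805, that gives 100 × (111 + 1) / 804 ≈ 13.93, matching the recorded value; the 13B and 7B rows check out the same way. A minimal sketch that parses a row and re-derives the value; the column names and helper functions are assumptions, not read from this diff.

```python
import csv
import io
from dataclasses import dataclass

# Assumed column order of the AlpacaEval 2 leaderboard CSV (header not shown in the diff).
COLUMNS = [
    "name", "win_rate", "standard_error", "n_wins", "n_wins_base", "n_draws",
    "n_total", "discrete_win_rate", "mode", "avg_length", "length_controlled_winrate",
]


@dataclass
class Row:
    name: str
    n_wins: int
    n_wins_base: int
    n_draws: int
    n_total: int
    discrete_win_rate: float


def parse_row(line: str) -> Row:
    """Hypothetical helper: map one headerless CSV line onto the assumed columns."""
    fields = dict(zip(COLUMNS, next(csv.reader(io.StringIO(line)))))
    return Row(
        name=fields["name"],
        n_wins=int(fields["n_wins"]),
        n_wins_base=int(fields["n_wins_base"]),
        n_draws=int(fields["n_draws"]),
        n_total=int(fields["n_total"]),
        discrete_win_rate=float(fields["discrete_win_rate"]),
    )


def recompute_discrete_win_rate(row: Row) -> float:
    # A draw counts as half a win; result is a percentage.
    return 100.0 * (row.n_wins + row.n_draws / 2) / row.n_total


row = parse_row(
    "TempNet-LLaMA2-Chat-70B-v0.1,15.051894420220444,1.08015075807378,"
    "111,691,2,804,13.930348258706468,minimal,1830,15.831162778430024"
)
# Wins, losses and draws should account for every comparison.
assert row.n_wins + row.n_wins_base + row.n_draws == row.n_total
assert abs(recompute_discrete_win_rate(row) - row.discrete_win_rate) < 1e-9
```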
@@ -0,0 +1,13 @@
TempNet-LLaMA2-Chat-13B-v0.1:
Collaborator comment: These files should be called configs.yaml, not config.yaml.

  prompt_template: "TempNet-LLaMA2-Chat-7B-v0.1/prompt.txt"
  fn_completions: null
  completions_kwargs:
    model_name: "LLM-Opt/TempNet-LLaMA2-Chat-13B-v0.1"
    model_kwargs:
      torch_dtype: "float16"
    max_new_tokens: 2048
    temperature: 1.0
    top_p: 1.0
    do_sample: True
  pretty_name: "TempNet-LLaMA2-Chat-13B-v0.1"
  link: "https://github.com/zhqiu/TempNet"
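These configs follow AlpacaEval's model-config layout, where completions_kwargs is handed to the completion function. One plausible reading, sketched below under that assumption, is that model_name and model_kwargs (here torch_dtype) are used to load the model while the remaining keys are generation settings; with temperature and top_p both at 1.0 and do_sample set, sampling neither rescales the distribution nor applies nucleus truncation. The dict mirrors the 13B entry, assuming max_new_tokens and the sampling keys sit under completions_kwargs next to model_kwargs; the split_completions_kwargs helper is hypothetical.

```python
from typing import Any

# Mirror of the TempNet-LLaMA2-Chat-13B-v0.1 entry (nesting assumed, values copied from the config).
CONFIG: dict[str, dict[str, Any]] = {
    "TempNet-LLaMA2-Chat-13B-v0.1": {
        "prompt_template": "TempNet-LLaMA2-Chat-7B-v0.1/prompt.txt",
        "fn_completions": None,
        "completions_kwargs": {
            "model_name": "LLM-Opt/TempNet-LLaMA2-Chat-13B-v0.1",
            "model_kwargs": {"torch_dtype": "float16"},
            "max_new_tokens": 2048,
            "temperature": 1.0,
            "top_p": 1.0,
            "do_sample": True,
        },
        "pretty_name": "TempNet-LLaMA2-Chat-13B-v0.1",
        "link": "https://github.com/zhqiu/TempNet",
    }
}

# Keys assumed to configure model loading; everything else is treated as a generation setting.
LOAD_KEYS = {"model_name", "model_kwargs"}


def split_completions_kwargs(entry: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Hypothetical helper: separate model-loading settings from generation settings."""
    kwargs = entry["completions_kwargs"]
    load = {k: v for k, v in kwargs.items() if k in LOAD_KEYS}
    generate = {k: v for k, v in kwargs.items() if k not in LOAD_KEYS}
    return load, generate


load, generate = split_completions_kwargs(CONFIG["TempNet-LLaMA2-Chat-13B-v0.1"])
```

Keeping the split in one place means a config that differs only in generation length (the 70B entry uses max_new_tokens: 4096) changes the generation dict without touching how the model is loaded.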
@@ -0,0 +1,13 @@
TempNet-LLaMA2-Chat-70B-v0.1:
  prompt_template: "TempNet-LLaMA2-Chat-70B-v0.1/prompt.txt"
  fn_completions: null
  completions_kwargs:
    model_name: "LLM-Opt/TempNet-LLaMA2-Chat-70B-v0.1"
    model_kwargs:
      torch_dtype: "float16"
    max_new_tokens: 4096
    temperature: 1.0
    top_p: 1.0
    do_sample: True
  pretty_name: "TempNet-LLaMA2-Chat-70B-v0.1"
  link: "https://github.com/zhqiu/TempNet"
@@ -0,0 +1 @@
[INST] {instruction} [/INST]
@@ -0,0 +1,13 @@
TempNet-LLaMA2-Chat-7B-v0.1:
  prompt_template: "TempNet-LLaMA2-Chat-7B-v0.1/prompt.txt"
  fn_completions: null
  completions_kwargs:
    model_name: "LLM-Opt/TempNet-LLaMA2-Chat-7B-v0.1"
    model_kwargs:
      torch_dtype: "float16"
    max_new_tokens: 2048
    temperature: 1.0
    top_p: 1.0
    do_sample: True
  pretty_name: "TempNet-LLaMA2-Chat-7B-v0.1"
  link: "https://github.com/zhqiu/TempNet"
@@ -0,0 +1,7 @@
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

{instruction} [/INST]
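Both prompt.txt files in this PR are templates with a single {instruction} placeholder: one is the bare [INST] ... [/INST] wrapper, the other adds the LLaMA 2 default system prompt. Assuming the placeholder is filled with Python's str.format (safe here, since neither template contains other braces), the sketch below shows how one instruction would render in each. The render helper is hypothetical. One thing worth checking in practice: the system-prompt template starts with a literal <s>, so if the tokenizer also prepends its own BOS token, the prompt may end up with two.

```python
# The two prompt templates from this PR, copied verbatim.
PLAIN_TEMPLATE = "[INST] {instruction} [/INST]"

SYSTEM_TEMPLATE = (
    "<s>[INST] <<SYS>>\n"
    "You are a helpful, respectful and honest assistant. Always answer as helpfully "
    "as possible, while being safe. Your answers should not include any harmful, "
    "unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure "
    "that your responses are socially unbiased and positive in nature.\n"
    "\n"
    "If a question does not make any sense, or is not factually coherent, explain why "
    "instead of answering something not correct. If you don't know the answer to a "
    "question, please don't share false information.\n"
    "<</SYS>>\n"
    "\n"
    "{instruction} [/INST]"
)


def render(template: str, instruction: str) -> str:
    """Hypothetical helper: fill the {instruction} placeholder.

    str.format does not re-parse the substituted value, so braces inside the
    instruction are inserted literally.
    """
    return template.format(instruction=instruction)


plain = render(PLAIN_TEMPLATE, "Name a prime number.")
with_system = render(SYSTEM_TEMPLATE, "Name a prime number.")
```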