Commit

run benchmark processing
slobentanzer committed May 15, 2024
1 parent 24ef861 commit 97bfe80
Showing 43 changed files with 565 additions and 167 deletions.
8 changes: 4 additions & 4 deletions benchmark/results/processed/correlations.txt
@@ -1,4 +1,4 @@
-Size vs accuracy Pearson correlation: 0.1619804275417575
-Size vs accuracy Pearson correlation p-value: 0.000134079336616155
-Quantisation vs accuracy Pearson correlation: 0.19280308291393375
-Quantisation vs accuracy Pearson correlation p-value: 5.156919811050803e-06
+Size vs accuracy Pearson correlation: 0.1542083231219649
+Size vs accuracy Pearson correlation p-value: 0.00014935097092919373
+Quantisation vs accuracy Pearson correlation: 0.1926652752040581
+Quantisation vs accuracy Pearson correlation p-value: 1.9934596522710114e-06
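
The commit does not show how these correlation values are produced. As a minimal sketch of the statistic itself, the Pearson coefficient can be computed with the standard library alone. The function name `pearson_r` and the toy data are hypothetical, and the p-value is not computed here (in practice `scipy.stats.pearsonr` returns both the coefficient and a p-value).

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data (not taken from the benchmark):
# model size in billions of parameters vs. accuracy.
sizes = [7, 13, 34, 70]
accuracies = [0.2, 0.4, 0.6, 0.8]
r = pearson_r(sizes, accuracies)
```

A coefficient near 0.15 to 0.19, as reported above, indicates a weak positive linear relationship even when the p-value is small.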
10 changes: 6 additions & 4 deletions benchmark/results/processed/end_to_end_query_generation.csv
@@ -3,19 +3,20 @@ gpt-3.5-turbo-0125,27.8,30.0,0.9266666666666666,5
gpt-4-0613,26.4,30.0,0.88,5
gpt-3.5-turbo-0613,25.0,30.0,0.8333333333333334,5
chatglm3:6:ggmlv3:q4_0,0.0,30.0,0.0,5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M,0.0,30.0,0.0,5
llama-2-chat:70:ggufv2:Q5_K_M,0.0,30.0,0.0,5
llama-2-chat:7:ggufv2:Q3_K_M,0.0,30.0,0.0,5
llama-2-chat:7:ggufv2:Q4_K_M,0.0,30.0,0.0,5
llama-2-chat:7:ggufv2:Q5_K_M,0.0,30.0,0.0,5
llama-2-chat:7:ggufv2:Q6_K,0.0,30.0,0.0,5
llama-2-chat:7:ggufv2:Q8_0,0.0,30.0,0.0,5
llama-3-instruct:8:ggufv2:Q4_K_M,0.0,30.0,0.0,5
llama-3-instruct:8:ggufv2:Q5_K_M,0.0,30.0,0.0,5
llama-3-instruct:8:ggufv2:Q6_K,0.0,30.0,0.0,5
llama-3-instruct:8:ggufv2:Q8_0,0.0,30.0,0.0,5
mistral-instruct-v0.2:7:ggufv2:Q2_K,0.0,30.0,0.0,5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M,0.0,30.0,0.0,5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M,0.0,30.0,0.0,5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M,0.0,30.0,0.0,5
llama-2-chat:70:ggufv2:Q5_K_M,0.0,30.0,0.0,5
mistral-instruct-v0.2:7:ggufv2:Q6_K,0.0,30.0,0.0,5
mistral-instruct-v0.2:7:ggufv2:Q8_0,0.0,30.0,0.0,5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K,0.0,30.0,0.0,5
@@ -32,7 +33,7 @@ openhermes-2.5:7:ggufv2:Q6_K,0.0,30.0,0.0,5
llama-2-chat:7:ggufv2:Q2_K,0.0,30.0,0.0,5
llama-2-chat:70:ggufv2:Q3_K_M,0.0,30.0,0.0,5
llama-2-chat:70:ggufv2:Q4_K_M,0.0,30.0,0.0,5
code-llama-instruct:13:ggufv2:Q2_K,0.0,30.0,0.0,5
code-llama-instruct:7:ggufv2:Q3_K_M,0.0,30.0,0.0,5
code-llama-instruct:13:ggufv2:Q3_K_M,0.0,30.0,0.0,5
code-llama-instruct:13:ggufv2:Q4_K_M,0.0,30.0,0.0,5
code-llama-instruct:13:ggufv2:Q5_K_M,0.0,30.0,0.0,5
@@ -45,12 +46,13 @@ code-llama-instruct:34:ggufv2:Q5_K_M,0.0,30.0,0.0,5
code-llama-instruct:34:ggufv2:Q6_K,0.0,30.0,0.0,5
code-llama-instruct:34:ggufv2:Q8_0,0.0,30.0,0.0,5
code-llama-instruct:7:ggufv2:Q2_K,0.0,30.0,0.0,5
code-llama-instruct:7:ggufv2:Q3_K_M,0.0,30.0,0.0,5
code-llama-instruct:7:ggufv2:Q4_K_M,0.0,30.0,0.0,5
code-llama-instruct:13:ggufv2:Q2_K,0.0,30.0,0.0,5
code-llama-instruct:7:ggufv2:Q5_K_M,0.0,30.0,0.0,5
code-llama-instruct:7:ggufv2:Q6_K,0.0,30.0,0.0,5
code-llama-instruct:7:ggufv2:Q8_0,0.0,30.0,0.0,5
gpt-4-0125-preview,0.0,30.0,0.0,5
gpt-4o-2024-05-13,0.0,30.0,0.0,5
llama-2-chat:13:ggufv2:Q2_K,0.0,30.0,0.0,5
llama-2-chat:13:ggufv2:Q3_K_M,0.0,30.0,0.0,5
llama-2-chat:13:ggufv2:Q4_K_M,0.0,30.0,0.0,5
34 changes: 18 additions & 16 deletions benchmark/results/processed/entity_selection.csv
@@ -2,60 +2,62 @@ Full model name,Score achieved,Score possible,Accuracy,Iterations
gpt-3.5-turbo-0125,8.0,8.0,1.0,5
openhermes-2.5:7:ggufv2:Q6_K,8.0,8.0,1.0,5
openhermes-2.5:7:ggufv2:Q3_K_M,9.0,9.0,1.0,5
gpt-4o-2024-05-13,8.0,8.0,1.0,5
openhermes-2.5:7:ggufv2:Q8_0,8.0,9.0,0.8888888888888888,5
openhermes-2.5:7:ggufv2:Q5_K_M,8.0,9.0,0.8888888888888888,5
openhermes-2.5:7:ggufv2:Q4_K_M,8.0,9.0,0.8888888888888888,5
gpt-4-0613,8.0,9.0,0.8888888888888888,5
gpt-3.5-turbo-0613,8.0,9.0,0.8888888888888888,5
llama-3-instruct:8:ggufv2:Q8_0,7.0,8.0,0.875,5
llama-3-instruct:8:ggufv2:Q6_K,7.0,8.0,0.875,5
llama-3-instruct:8:ggufv2:Q5_K_M,7.0,8.0,0.875,5
llama-3-instruct:8:ggufv2:Q4_K_M,7.0,8.0,0.875,5
gpt-4-0125-preview,7.0,9.0,0.7777777777777778,5
chatglm3:6:ggmlv3:q4_0,6.0,8.0,0.75,5
openhermes-2.5:7:ggufv2:Q2_K,5.0,9.0,0.5555555555555556,5
mistral-instruct-v0.2:7:ggufv2:Q6_K,4.0,8.0,0.5,5
code-llama-instruct:7:ggufv2:Q3_K_M,4.0,8.0,0.5,5
mistral-instruct-v0.2:7:ggufv2:Q6_K,4.0,8.0,0.5,5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K,3.8,8.0,0.475,5
code-llama-instruct:13:ggufv2:Q3_K_M,3.6,8.0,0.45,5
llama-2-chat:70:ggufv2:Q5_K_M,4.0,9.0,0.4444444444444444,5
llama-2-chat:7:ggufv2:Q4_K_M,4.0,9.0,0.4444444444444444,5
llama-2-chat:7:ggufv2:Q8_0,4.0,9.0,0.4444444444444444,5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M,4.0,9.0,0.4444444444444444,5
llama-2-chat:7:ggufv2:Q5_K_M,4.0,9.0,0.4444444444444444,5
llama-2-chat:70:ggufv2:Q4_K_M,4.0,9.0,0.4444444444444444,5
llama-2-chat:7:ggufv2:Q4_K_M,4.0,9.0,0.4444444444444444,5
llama-2-chat:7:ggufv2:Q5_K_M,4.0,9.0,0.4444444444444444,5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M,4.0,9.0,0.4444444444444444,5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M,3.8,9.0,0.4222222222222222,5
llama-2-chat:7:ggufv2:Q6_K,3.0,8.0,0.375,5
mistral-instruct-v0.2:7:ggufv2:Q8_0,3.0,9.0,0.3333333333333333,5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M,3.0,9.0,0.3333333333333333,5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M,3.0,9.0,0.3333333333333333,5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M,3.0,9.0,0.3333333333333333,5
llama-2-chat:7:ggufv2:Q3_K_M,3.0,9.0,0.3333333333333333,5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M,3.0,9.0,0.3333333333333333,5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M,3.0,9.0,0.3333333333333333,5
llama-2-chat:7:ggufv2:Q3_K_M,3.0,9.0,0.3333333333333333,5
llama-2-chat:70:ggufv2:Q3_K_M,3.0,9.0,0.3333333333333333,5
mistral-instruct-v0.2:7:ggufv2:Q8_0,3.0,9.0,0.3333333333333333,5
code-llama-instruct:7:ggufv2:Q4_K_M,3.0,9.0,0.3333333333333333,5
llama-2-chat:70:ggufv2:Q3_K_M,3.0,9.0,0.3333333333333333,5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0,2.8,9.0,0.3111111111111111,5
code-llama-instruct:7:ggufv2:Q2_K,2.0,8.0,0.25,5
code-llama-instruct:34:ggufv2:Q8_0,2.0,8.0,0.25,5
mistral-instruct-v0.2:7:ggufv2:Q2_K,2.0,9.0,0.2222222222222222,5
code-llama-instruct:34:ggufv2:Q5_K_M,1.0,8.0,0.125,5
code-llama-instruct:34:ggufv2:Q6_K,1.0,8.0,0.125,5
code-llama-instruct:34:ggufv2:Q5_K_M,1.0,8.0,0.125,5
code-llama-instruct:7:ggufv2:Q5_K_M,1.0,9.0,0.1111111111111111,5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K,0.0,9.0,0.0,5
code-llama-instruct:7:ggufv2:Q6_K,0.0,8.0,0.0,5
code-llama-instruct:13:ggufv2:Q4_K_M,0.0,8.0,0.0,5
code-llama-instruct:13:ggufv2:Q5_K_M,0.0,8.0,0.0,5
code-llama-instruct:13:ggufv2:Q6_K,0.0,8.0,0.0,5
code-llama-instruct:7:ggufv2:Q6_K,0.0,8.0,0.0,5
code-llama-instruct:7:ggufv2:Q8_0,0.0,9.0,0.0,5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K,0.0,9.0,0.0,5
llama-2-chat:7:ggufv2:Q2_K,0.0,9.0,0.0,5
code-llama-instruct:13:ggufv2:Q2_K,0.0,8.0,0.0,5
llama-2-chat:13:ggufv2:Q2_K,0.0,9.0,0.0,5
llama-2-chat:13:ggufv2:Q3_K_M,0.0,9.0,0.0,5
code-llama-instruct:13:ggufv2:Q2_K,0.0,8.0,0.0,5
llama-2-chat:13:ggufv2:Q4_K_M,0.0,9.0,0.0,5
llama-2-chat:13:ggufv2:Q5_K_M,0.0,9.0,0.0,5
llama-2-chat:13:ggufv2:Q6_K,0.0,8.0,0.0,5
code-llama-instruct:13:ggufv2:Q8_0,0.0,8.0,0.0,5
code-llama-instruct:34:ggufv2:Q2_K,0.0,8.0,0.0,5
code-llama-instruct:34:ggufv2:Q3_K_M,0.0,8.0,0.0,5
llama-2-chat:13:ggufv2:Q5_K_M,0.0,9.0,0.0,5
llama-2-chat:13:ggufv2:Q6_K,0.0,8.0,0.0,5
code-llama-instruct:34:ggufv2:Q4_K_M,0.0,8.0,0.0,5
llama-2-chat:13:ggufv2:Q8_0,0.0,9.0,0.0,5
llama-2-chat:70:ggufv2:Q2_K,0.0,9.0,0.0,5
code-llama-instruct:34:ggufv2:Q4_K_M,0.0,8.0,0.0,5
llama-2-chat:13:ggufv2:Q3_K_M,0.0,9.0,0.0,5
@@ -1,6 +1,6 @@
Full model name,Score achieved,Score possible,Accuracy,Iterations
llama-2-chat:70:ggufv2:Q3_K_M,6.0,6.0,1.0,5
llama-2-chat:13:ggufv2:Q5_K_M,6.0,6.0,1.0,5
llama-3-instruct:8:ggufv2:Q6_K,6.0,6.0,1.0,5
llama-2-chat:13:ggufv2:Q8_0,6.0,6.0,1.0,5
llama-2-chat:70:ggufv2:Q2_K,6.0,6.0,1.0,5
llama-2-chat:70:ggufv2:Q4_K_M,6.0,6.0,1.0,5
@@ -11,8 +11,9 @@ llama-2-chat:7:ggufv2:Q5_K_M,6.0,6.0,1.0,5
llama-2-chat:7:ggufv2:Q6_K,6.0,6.0,1.0,5
llama-2-chat:7:ggufv2:Q8_0,6.0,6.0,1.0,5
llama-3-instruct:8:ggufv2:Q4_K_M,6.0,6.0,1.0,5
llama-3-instruct:8:ggufv2:Q6_K,6.0,6.0,1.0,5
llama-3-instruct:8:ggufv2:Q5_K_M,6.0,6.0,1.0,5
llama-3-instruct:8:ggufv2:Q8_0,6.0,6.0,1.0,5
llama-2-chat:13:ggufv2:Q5_K_M,6.0,6.0,1.0,5
mistral-instruct-v0.2:7:ggufv2:Q2_K,6.0,6.0,1.0,5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M,6.0,6.0,1.0,5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M,6.0,6.0,1.0,5
@@ -29,6 +30,7 @@ openhermes-2.5:7:ggufv2:Q8_0,6.0,6.0,1.0,5
llama-2-chat:13:ggufv2:Q4_K_M,6.0,6.0,1.0,5
llama-2-chat:13:ggufv2:Q3_K_M,6.0,6.0,1.0,5
llama-2-chat:13:ggufv2:Q2_K,6.0,6.0,1.0,5
gpt-4o-2024-05-13,6.0,6.0,1.0,5
gpt-4-0613,6.0,6.0,1.0,5
gpt-4-0125-preview,6.0,6.0,1.0,5
gpt-3.5-turbo-0613,6.0,6.0,1.0,5
31 changes: 31 additions & 0 deletions benchmark/results/processed/extraction_assay.csv
Original file line number Diff line number Diff line change
@@ -1,12 +1,43 @@
Full model name,Subtask,Score achieved,Score possible,Accuracy,Iterations
gpt-4o-2024-05-13,assay,6.673073593073593,9.0,0.7414526214526215,5
gpt-4-0125-preview,assay,6.602641802641802,9.0,0.7336268669602002,5
openhermes-2.5:7:ggufv2:Q6_K,assay,6.4535353535353535,9.0,0.7170594837261504,5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M,assay,6.421562986369853,9.0,0.7135069984855392,5
openhermes-2.5:7:ggufv2:Q8_0,assay,6.2414141414141415,9.0,0.6934904601571268,5
mistral-instruct-v0.2:7:ggufv2:Q8_0,assay,5.866200466200466,9.0,0.6518000518000517,5
mistral-instruct-v0.2:7:ggufv2:Q2_K,assay,5.841649341649342,9.0,0.6490721490721492,5
mistral-instruct-v0.2:7:ggufv2:Q6_K,assay,5.832722832722832,9.0,0.6480803147469814,5
openhermes-2.5:7:ggufv2:Q5_K_M,assay,5.774747474747475,9.0,0.641638608305275,5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M,assay,5.724211735976442,9.0,0.6360235262196047,5
gpt-3.5-turbo-0613,assay,5.717171717171717,9.0,0.6352413019079686,5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M,assay,5.660844581774814,9.0,0.6289827313083127,5
gpt-3.5-turbo-0125,assay,5.483244206773619,9.0,0.6092493563081799,5
gpt-4-0613,assay,5.47237639553429,9.0,0.6080418217260323,5
openhermes-2.5:7:ggufv2:Q4_K_M,assay,5.404732049559636,9.0,0.600525783284404,5
openhermes-2.5:7:ggufv2:Q3_K_M,assay,4.993293054771316,9.0,0.5548103394190351,5
openhermes-2.5:7:ggufv2:Q2_K,assay,4.356890331890332,9.0,0.48409892576559244,5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M,assay,3.1754259304908654,9.0,0.35282510338787393,5
llama-2-chat:70:ggufv2:Q4_K_M,assay,1.850895834676892,9.0,0.20565509274187688,5
llama-2-chat:70:ggufv2:Q5_K_M,assay,1.8184386657888962,9.0,0.20204874064321068,5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K,assay,1.6841937120145838,9.0,0.18713263466828708,5
chatglm3:6:ggmlv3:q4_0,assay,1.6167198512690701,9.0,0.17963553902989668,5
code-llama-instruct:7:ggufv2:Q4_K_M,assay,1.537777777777778,9.0,0.1708641975308642,5
llama-3-instruct:8:ggufv2:Q6_K,assay,1.4810308738880167,9.0,0.16455898598755742,5
llama-3-instruct:8:ggufv2:Q8_0,assay,1.3708822923108637,9.0,0.15232025470120708,5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0,assay,1.1632723199560822,9.0,0.12925247999512024,5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M,assay,1.1592561785121616,9.0,0.12880624205690683,5
llama-2-chat:70:ggufv2:Q2_K,assay,1.1509544548608548,9.0,0.12788382831787276,5
llama-2-chat:70:ggufv2:Q3_K_M,assay,1.0778833336153588,9.0,0.11976481484615098,5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M,assay,1.053469154805127,9.0,0.11705212831168077,5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K,assay,1.029088605731717,9.0,0.11434317841463522,5
llama-2-chat:13:ggufv2:Q2_K,assay,0.9744409046661863,9.0,0.10827121162957626,5
llama-3-instruct:8:ggufv2:Q5_K_M,assay,0.9227064915034839,9.0,0.1025229435003871,5
llama-2-chat:7:ggufv2:Q5_K_M,assay,0.9192592592592593,9.0,0.10213991769547326,5
llama-2-chat:13:ggufv2:Q5_K_M,assay,0.8363486769132731,9.0,0.09292763076814145,5
llama-2-chat:13:ggufv2:Q8_0,assay,0.7563022941970311,9.0,0.08403358824411457,5
llama-2-chat:13:ggufv2:Q3_K_M,assay,0.7505568291135255,9.0,0.08339520323483617,5
llama-2-chat:13:ggufv2:Q4_K_M,assay,0.647223087245539,9.0,0.07191367636061545,5
llama-2-chat:7:ggufv2:Q4_K_M,assay,0.6047993567815865,9.0,0.06719992853128738,5
llama-3-instruct:8:ggufv2:Q4_K_M,assay,0.5222726775358355,9.0,0.058030297503981726,5
llama-2-chat:7:ggufv2:Q3_K_M,assay,0.4556989247311828,9.0,0.05063321385902031,5
llama-2-chat:7:ggufv2:Q2_K,assay,0.23382370530829094,9.0,0.025980411700921215,5
31 changes: 31 additions & 0 deletions benchmark/results/processed/extraction_chemical.csv
Original file line number Diff line number Diff line change
@@ -1,12 +1,43 @@
Full model name,Subtask,Score achieved,Score possible,Accuracy,Iterations
gpt-4-0613,chemical,6.388888888888889,9.0,0.7098765432098766,5
gpt-4-0125-preview,chemical,6.222222222222222,9.0,0.691358024691358,5
openhermes-2.5:7:ggufv2:Q6_K,chemical,6.166666666666667,9.0,0.6851851851851852,5
gpt-4o-2024-05-13,chemical,5.555555555555555,9.0,0.6172839506172839,5
gpt-3.5-turbo-0613,chemical,5.444444444444445,9.0,0.6049382716049383,5
openhermes-2.5:7:ggufv2:Q3_K_M,chemical,5.233091787439614,9.0,0.581454643048846,5
openhermes-2.5:7:ggufv2:Q8_0,chemical,5.166666666666667,9.0,0.5740740740740741,5
openhermes-2.5:7:ggufv2:Q5_K_M,chemical,5.066666666666666,9.0,0.5629629629629629,5
gpt-3.5-turbo-0125,chemical,5.064444444444445,9.0,0.5627160493827161,5
openhermes-2.5:7:ggufv2:Q4_K_M,chemical,4.955555555555556,9.0,0.5506172839506173,5
openhermes-2.5:7:ggufv2:Q2_K,chemical,4.666666666666666,9.0,0.5185185185185185,5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M,chemical,4.023323324509765,9.0,0.44703592494552946,5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M,chemical,3.6982448407132837,9.0,0.41091609341258706,5
mistral-instruct-v0.2:7:ggufv2:Q6_K,chemical,3.558802308802309,9.0,0.3954224787558121,5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M,chemical,3.231746031746032,9.0,0.35908289241622576,5
mistral-instruct-v0.2:7:ggufv2:Q2_K,chemical,2.9648008911166808,9.0,0.3294223212351868,5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M,chemical,2.8592632749513482,9.0,0.3176959194390387,5
mistral-instruct-v0.2:7:ggufv2:Q8_0,chemical,2.8021413110698825,9.0,0.3113490345633203,5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K,chemical,2.288387147059708,9.0,0.2542652385621898,5
llama-3-instruct:8:ggufv2:Q6_K,chemical,1.9925925925925927,9.0,0.22139917695473252,5
llama-3-instruct:8:ggufv2:Q5_K_M,chemical,1.9845117845117846,9.0,0.2205013093901983,5
llama-3-instruct:8:ggufv2:Q8_0,chemical,1.9845117845117846,9.0,0.2205013093901983,5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M,chemical,1.9268708406662942,9.0,0.2140967600740327,5
llama-2-chat:70:ggufv2:Q2_K,chemical,1.9240307745782197,9.0,0.21378119717535773,5
llama-2-chat:70:ggufv2:Q4_K_M,chemical,1.8659366473319963,9.0,0.20732629414799958,5
llama-2-chat:70:ggufv2:Q5_K_M,chemical,1.7972027972027973,9.0,0.1996891996891997,5
llama-2-chat:70:ggufv2:Q3_K_M,chemical,1.654167859715305,9.0,0.1837964288572561,5
llama-2-chat:13:ggufv2:Q4_K_M,chemical,1.6088501452885013,9.0,0.17876112725427792,5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K,chemical,1.37177969361964,9.0,0.15241996595773777,5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0,chemical,1.0247327948928762,9.0,0.1138591994325418,5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M,chemical,0.9938956265833743,9.0,0.11043284739815269,5
llama-3-instruct:8:ggufv2:Q4_K_M,chemical,0.920791108205296,9.0,0.10231012313392178,5
chatglm3:6:ggmlv3:q4_0,chemical,0.8392930574865726,9.0,0.09325478416517473,5
llama-2-chat:7:ggufv2:Q5_K_M,chemical,0.580952380952381,9.0,0.06455026455026455,5
llama-2-chat:13:ggufv2:Q5_K_M,chemical,0.47397754611066556,9.0,0.05266417179007395,5
llama-2-chat:13:ggufv2:Q8_0,chemical,0.47397754611066556,9.0,0.05266417179007395,5
llama-2-chat:13:ggufv2:Q3_K_M,chemical,0.447004222503315,9.0,0.049667135833701664,5
code-llama-instruct:7:ggufv2:Q4_K_M,chemical,0.4418901660280971,9.0,0.049098907336455236,5
llama-2-chat:13:ggufv2:Q2_K,chemical,0.42911786937479424,9.0,0.047679763263866026,5
llama-2-chat:7:ggufv2:Q4_K_M,chemical,0.4167017026246235,9.0,0.04630018918051373,5
llama-2-chat:7:ggufv2:Q3_K_M,chemical,0.2701509872241579,9.0,0.03001677635823977,5
llama-2-chat:7:ggufv2:Q2_K,chemical,0.2649429813608918,9.0,0.029438109040099088,5
31 changes: 31 additions & 0 deletions benchmark/results/processed/extraction_context.csv
Original file line number Diff line number Diff line change
@@ -1,12 +1,43 @@
Full model name,Subtask,Score achieved,Score possible,Accuracy,Iterations
gpt-4-0613,context,7.9066286955899665,9.0,0.8785142995099963,5
gpt-4-0125-preview,context,7.852526840634657,9.0,0.8725029822927397,5
gpt-4o-2024-05-13,context,7.8296500008341265,9.0,0.8699611112037918,5
gpt-3.5-turbo-0125,context,6.892469269171475,9.0,0.7658299187968305,5
openhermes-2.5:7:ggufv2:Q4_K_M,context,6.890545812069671,9.0,0.7656162013410746,5
openhermes-2.5:7:ggufv2:Q6_K,context,6.7998911465892595,9.0,0.75554346073214,5
openhermes-2.5:7:ggufv2:Q3_K_M,context,6.772712292356359,9.0,0.7525235880395954,5
openhermes-2.5:7:ggufv2:Q8_0,context,6.677492956544011,9.0,0.7419436618382235,5
gpt-3.5-turbo-0613,context,6.504724288106235,9.0,0.722747143122915,5
openhermes-2.5:7:ggufv2:Q5_K_M,context,6.447693911327509,9.0,0.7164104345919454,5
mistral-instruct-v0.2:7:ggufv2:Q8_0,context,5.1675402838156606,9.0,0.5741711426461845,5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M,context,5.125993745609519,9.0,0.5695548606232799,5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M,context,5.028443149414303,9.0,0.5587159054904781,5
mistral-instruct-v0.2:7:ggufv2:Q6_K,context,5.015802962871023,9.0,0.5573114403190025,5
mistral-instruct-v0.2:7:ggufv2:Q2_K,context,4.993618461701055,9.0,0.5548464957445617,5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K,context,4.5131432560655735,9.0,0.5014603617850637,5
llama-2-chat:70:ggufv2:Q3_K_M,context,4.223320462233215,9.0,0.46925782913702385,5
llama-2-chat:70:ggufv2:Q4_K_M,context,4.102843431479814,9.0,0.455871492386646,5
llama-2-chat:70:ggufv2:Q2_K,context,4.089792058375472,9.0,0.45442133981949684,5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M,context,4.063184785079324,9.0,0.45146497611992487,5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M,context,4.011171998461201,9.0,0.4456857776068001,5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M,context,3.9098205855378256,9.0,0.43442450950420286,5
openhermes-2.5:7:ggufv2:Q2_K,context,3.868972771294753,9.0,0.42988586347719476,5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M,context,3.794155209972813,9.0,0.4215728011080903,5
llama-2-chat:70:ggufv2:Q5_K_M,context,3.74590965202609,9.0,0.41621218355845446,5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0,context,3.70125778870553,9.0,0.4112508654117255,5
code-llama-instruct:7:ggufv2:Q4_K_M,context,3.3265667121567946,9.0,0.3696185235729772,5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K,context,3.145198436368636,9.0,0.34946649292984844,5
chatglm3:6:ggmlv3:q4_0,context,2.8563592524507615,9.0,0.31737325027230684,5
llama-2-chat:7:ggufv2:Q3_K_M,context,2.1085650159513922,9.0,0.2342850017723769,5
llama-2-chat:7:ggufv2:Q4_K_M,context,1.8960518045585364,9.0,0.21067242272872627,5
llama-2-chat:13:ggufv2:Q3_K_M,context,1.7886791639732817,9.0,0.19874212933036464,5
llama-2-chat:13:ggufv2:Q5_K_M,context,1.786179992924834,9.0,0.1984644436583149,5
llama-2-chat:13:ggufv2:Q4_K_M,context,1.773506826331816,9.0,0.19705631403686846,5
llama-3-instruct:8:ggufv2:Q8_0,context,1.6733362618842498,9.0,0.1859262513204722,5
llama-3-instruct:8:ggufv2:Q5_K_M,context,1.6482060618893395,9.0,0.18313400687659329,5
llama-2-chat:13:ggufv2:Q8_0,context,1.5882119645062303,9.0,0.1764679960562478,5
llama-3-instruct:8:ggufv2:Q4_K_M,context,1.571688476296535,9.0,0.17463205292183723,5
llama-2-chat:13:ggufv2:Q2_K,context,1.3428860743038644,9.0,0.1492095638115405,5
llama-2-chat:7:ggufv2:Q5_K_M,context,1.2388077239089992,9.0,0.13764530265655547,5
llama-2-chat:7:ggufv2:Q2_K,context,1.1233454059056658,9.0,0.12481615621174064,5
llama-3-instruct:8:ggufv2:Q6_K,context,1.102922285992053,9.0,0.12254692066578368,5
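
In the rows shown in these files, the `Accuracy` column equals `Score achieved` divided by `Score possible` (for example, 27.8 / 30.0 = 0.9267). The processing code is not part of this listing, so the following is only a sketch of how that relationship could be checked with the standard library. The function name `check_accuracy`, the tolerance, and the zero-denominator fallback are assumptions; the sample rows are copied from the `extraction_context.csv` listing above.

```python
import csv
import io

# Three rows copied from the extraction_context.csv listing above.
SAMPLE = """Full model name,Subtask,Score achieved,Score possible,Accuracy,Iterations
gpt-4-0613,context,7.9066286955899665,9.0,0.8785142995099963,5
gpt-4-0125-preview,context,7.852526840634657,9.0,0.8725029822927397,5
gpt-4o-2024-05-13,context,7.8296500008341265,9.0,0.8699611112037918,5
"""


def check_accuracy(text, tol=1e-9):
    """Return the model names whose Accuracy differs from achieved / possible by more than tol."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(text)):
        achieved = float(row["Score achieved"])
        possible = float(row["Score possible"])
        # Assumption: treat a zero denominator as accuracy 0.0 rather than raising.
        expected = achieved / possible if possible else 0.0
        if abs(expected - float(row["Accuracy"])) > tol:
            mismatches.append(row["Full model name"])
    return mismatches


mismatches = check_accuracy(SAMPLE)
```

An empty result means every sampled row is internally consistent. The same function could be pointed at the full contents of any of the processed CSV files that share these column names.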