Perplexity: Compute scores correlated to HellaSwag #2312
Merged
This PR adds a `--perplexity-lines` parameter to the perplexity tool. In this mode the perplexity is calculated over each line of the prompt instead of over each context window.

HellaSwag scores are a great way to measure how much of the English language a model understands.
Make two runs on a model: one prompted with a file containing "correct" sentences (one per line), and another with a file containing "wrong" sentences. The perplexities measured from the two files can then be combined into a score that is linearly correlated with the HellaSwag score.
- `ppl_correct` = cumulative perplexity on each line of `hellaswag_val_correct.txt`; lower values are better.
- `ppl_wrong` = cumulative perplexity on each line of `hellaswag_val_wrong.txt`; higher values are better.

The formula `(ppl_wrong - ppl_correct) / ppl_correct` correlates linearly with HellaSwag scores on the Open LLM Leaderboard.

Test files: klosax/ppl_hellaswag.
Open LLaMA 3B