
gpt-2: Add ppl for gpt-2 #521

Closed · wants to merge 2 commits

Conversation

@xingchensong commented Sep 14, 2023

Usage:

./build/bin/gpt-2 \
    -m models/gpt-2-1558M/ggml-model-q4_0.bin \
    --perplexity \
    -p "Meta (formerly known as Facebook) operates both PyTorch and Convolutional Architecture for Fast Feature Embedding (Caffe2), but models defined by the two frameworks were mutually incompatible. The Open Neural Network Exchange (ONNX) project was created by Meta and Microsoft in September 2017 for converting models between frameworks. Caffe2 was merged into PyTorch at the end of March 2018."

@xingchensong (Author) commented Sep 14, 2023

| Model | Measure | FP16 | Q4_0 (QK=16) | Q4_0 (QK=32, default) | Q4_0 (QK=64) | Q4_0 (QK=1600) |
|---|---|---|---|---|---|---|
| gpt2-1.5B | perplexity | 36.00061850 | 725137.2734629 | 37.58230259 | -nan | -nan |
| | model size | 3GB | 937M | 845M | 799M | 754M |
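The model-size trend follows from the Q4_0 block layout: each block stores one scale plus QK packed 4-bit quants, so a smaller QK means proportionally more scale overhead per weight. A sketch of the layout (modeled on ggml's block_q4_0, which stores the scale as fp16; the plain uint16_t here is a simplification):

    #include <stdint.h>

    #define QK4_0 32 // quants per block: the value being varied in the table

    typedef struct {
        uint16_t d;             // per-block scale (ggml_fp16_t in ggml)
        uint8_t  qs[QK4_0 / 2]; // QK4_0 4-bit quants, packed two per byte
    } block_q4_0;

    // bits per weight = 8 * (2 + QK/2) / QK = 4 + 16/QK:
    //   QK=16 -> 5.0, QK=32 -> 4.5, QK=64 -> 4.25, QK=1600 -> ~4.01
    // which is consistent with the 937M / 845M / 799M / 754M sizes above
    // for a ~1.5B-parameter model.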

When I manually changed the macro definition QK4_0 to 16, the perplexity became extremely high, which is strange. Are there other parts of the code that also need to be modified to test QK=16 accurately?


@xingchensong changed the title from "feat(ppl): Add ppl for gpt-2" to "gpt-2: Add ppl for gpt-2" on Sep 15, 2023
@ggerganov (Member) commented
> When I manually changed the macro definition QK4_0 to 16, the perplexity became extremely high, which is strange. Are there other parts of the code that also need to be modified to test QK=16 accurately?

I think we have various assumptions in the SIMD implementations that basically do not support changing QK.
Maybe try building without SIMD (i.e. no AVX, AVX2, NEON, etc.) and see what happens; I think the reference implementation should work with QK=16.
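To illustrate why the reference path tolerates an arbitrary QK: the scalar kernels are plain loops over the block, with no register width baked in, whereas the SIMD kernels consume a fixed number of quants per vector register. A hedged sketch in the spirit of ggml's scalar dequantize_row_q4_0 (not the actual source; the scale is a plain float here for brevity):

    #include <stdint.h>

    #define QK4_0 16 // the value under test; this scalar loop works for any even QK

    typedef struct {
        float   d;             // per-block scale (fp16 in ggml; float here for brevity)
        uint8_t qs[QK4_0 / 2]; // two 4-bit quants per byte
    } block_q4_0;

    // Reference (scalar) dequantization of k elements: a plain loop over each
    // block, with no register-width assumptions, unlike the SIMD kernels.
    static void dequantize_row_q4_0_ref(const block_q4_0 *x, float *y, int k) {
        for (int i = 0; i < k / QK4_0; i++) {
            for (int j = 0; j < QK4_0 / 2; j++) {
                const int x0 = (x[i].qs[j] & 0x0F) - 8; // low nibble
                const int x1 = (x[i].qs[j] >>   4) - 8; // high nibble
                y[i*QK4_0 + j]           = x0 * x[i].d;
                y[i*QK4_0 + j + QK4_0/2] = x1 * x[i].d;
            }
        }
    }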

@xingchensong (Author) commented

I first tried building without SIMD, but the perplexity results still looked odd. After digging in, I found that in the function ggml_vec_dot_q4_0_q8_0 the block size qk is derived from the macro QK8_0, not QK4_0, so to test QK=16 both macros have to be changed together (see the sketch below).
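A hedged sketch of the coupling in question, mirroring the shape of the scalar ggml_vec_dot_q4_0_q8_0 (types simplified: float scales instead of fp16, and this is not the actual source):

    #include <stdint.h>

    #define QK4_0 16
    #define QK8_0 16 // must match QK4_0: one Q4_0 block pairs with one Q8_0 block

    typedef struct { float d; uint8_t qs[QK4_0 / 2]; } block_q4_0; // scale + packed 4-bit quants
    typedef struct { float d; int8_t  qs[QK8_0];     } block_q8_0; // scale + 8-bit quants

    // Scalar Q4_0 x Q8_0 dot product over n elements. A single block width
    // `qk` strides through both operands, which is why QK4_0 and QK8_0 have
    // to be changed together.
    static float vec_dot_q4_0_q8_0_ref(int n, const block_q4_0 *x, const block_q8_0 *y) {
        const int qk = QK8_0; // the line in question: qk comes from QK8_0, not QK4_0
        float sumf = 0.0f;
        for (int i = 0; i < n / qk; i++) {
            int sumi = 0;
            for (int j = 0; j < qk / 2; j++) {
                const int v0 = (x[i].qs[j] & 0x0F) - 8;
                const int v1 = (x[i].qs[j] >>   4) - 8;
                sumi += v0 * y[i].qs[j] + v1 * y[i].qs[j + qk/2];
            }
            sumf += (x[i].d * y[i].d) * sumi;
        }
        return sumf;
    }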


After making the changes, here are the results:

| Model | Measure | FP16 | Q4_0 (QK=16) | Q4_0 (QK=32, default) | Q4_0 (QK=64) | Q4_0 (QK=1600) |
|---|---|---|---|---|---|---|
| gpt2-1.5B | perplexity | 36.00061850 | 36.46691983 | 37.58230259 | 37.48167607 | 36.46209477 |
| | model size | 3GB | 937M | 845M | 799M | 754M |
