Add AVX2 implementation of quantize_row_q4_1 #515
Conversation
Full run output
Please disregard this result, I was using a broken model. I am re-running the perplexity computation now.
Running on latest master, it starts out like this for me:
Your branch on my machine:
Note: I tried to match your settings.
@Green-Sky does your system-info have the same flags as mine? I wonder if there is a different path somewhere that may cause the difference. I get the same results even after rebasing to current master. On master, my result is also different than yours:
Just in case my model is broken somehow, this is the SHA256 hash:
Can you verify if yours is the same?
Oh wow, it's different.
I regenerated to double-check, and got the same hash again. I also checked the src, which matches the
@Green-Sky It looks like the problem was my model; after re-converting and re-quantizing the model I get the same sum and perplexity as yours. I will re-run the perplexity computation in case there is a significant difference. Thanks for checking!
If I understood the results correctly, @Green-Sky shows a major increase in speed with a slight decrease in accuracy? A side point related to this: edit: -snip-, as it doesn't really belong here; I made it a discussion topic:
Updated my previous post with system_info and the make command.
Yes, however the perplexity is very unstable in the beginning, so a full run would be necessary.
Force-pushed from 7dca16b to ae08d8e
Perplexity: 6.3056 (7B q4_1) Full run output
Sorry about the conflicts. Please resolve and merge.
Force-pushed from ae08d8e to e296529
Rebased to master.
Force-pushed from 3125ea0 to 41669f6
The bot almost got it right; the purpose of using the reference implementation in
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Ironically, after the changes to master I am seeing slightly lower perplexity with the AVX path in the first chunks. master: avx2: 🤷
I guess we must be doing something right 🦙
Largely based on the AVX2 implementation of quantize_row_q4_0.
🤖 Generated by Copilot at ae08d8e
Summary
🚀🐛♻️
Improved matrix quantization with AVX2 and bug fixes. Added a new function `quantize_row_q4_1` that uses AVX2 instructions to speed up the quantization of a matrix row using 4-bit factors. Renamed and fixed the original function `quantize_row_q4_1_reference`. Updated `ggml_quantize_q4_1` to use the appropriate function depending on the CPU capabilities.

Walkthrough

- Renamed `quantize_row_q4_1` to `quantize_row_q4_1_reference` to avoid confusion with the new AVX2-optimized function (link)
- Added a new `quantize_row_q4_1` that uses AVX2 instructions to speed up the quantization algorithm for 4-bit factors (link)
- Replaced `quantize_row_q4_1` with `quantize_row_q4_1_reference` in `ggml_quantize_q4_1` to fix a bug and avoid unnecessary computation (link)
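For context on what these functions operate on: in the q4_1 format, each block of 32 floats stores a scale and a minimum, and every value is reduced to a 4-bit offset from that minimum. Below is a scalar sketch of what the reference path does; the struct name, field names, and rounding here are simplified assumptions for illustration, not the exact ggml code.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

#define QK 32  // assumed block size for the 4-bit block layout

// Hypothetical struct mirroring a q4_1 block (names are illustrative):
typedef struct {
    float   d;            // scale (delta between quantization levels)
    float   m;            // minimum value of the block
    uint8_t qs[QK / 2];   // 32 quants, two 4-bit values packed per byte
} block_q4_1;

// Scalar sketch of the reference path: per block, find min and max,
// derive the scale from the 16 available levels, then store each float
// as a rounded 4-bit offset from the minimum.
static void quantize_row_q4_1_ref_sketch(const float *x, block_q4_1 *y, int k) {
    const int nb = k / QK;
    for (int i = 0; i < nb; i++) {
        float min =  FLT_MAX;
        float max = -FLT_MAX;
        for (int l = 0; l < QK; l++) {
            const float v = x[i*QK + l];
            if (v < min) min = v;
            if (v > max) max = v;
        }
        const float d  = (max - min) / 15.0f;         // 15 = 2^4 - 1 steps
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        y[i].m = min;
        for (int l = 0; l < QK; l += 2) {
            const uint8_t v0 = (uint8_t)((x[i*QK + l + 0] - min) * id + 0.5f);
            const uint8_t v1 = (uint8_t)((x[i*QK + l + 1] - min) * id + 0.5f);
            y[i].qs[l/2] = v0 | (v1 << 4);            // low nibble first
        }
    }
}
```

The AVX2 path in the PR performs the same min/max search and nibble packing with vector intrinsics instead of per-element loops, which is where the speedup comes from.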