https://github.com/ggerganov/llama.cpp/blob/8b679987cdce292ff36bd741f6715e4927e26f9b/llama.cpp#L1558
The quantization loop linked above is currently single-threaded, so quantization is quite slow (vicuna 7B: 65156.31 ms, vicuna 13B: 129902.48 ms).
@ikawrakow already did that (multi-threaded quantization) in #896, see kQuantizeQ4 in ggml_extra.cpp, but that's for a new quantization scheme:
https://github.com/ggerganov/llama.cpp/blob/6bfb00a53b1a06e209f1b814356dd79ee96b89af/ggml_extra.cpp#L287-L291
It did indeed speed things up. This could probably be integrated into llama_model_quantize_internal so that a new cpp module isn't necessary.
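For illustration, a minimal sketch of how that integration could split each tensor's rows across worker threads. The names `quantize_rows`, `row_size_f32` and `row_size_quant` are hypothetical placeholders, not the actual llama.cpp / ggml API:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch only: quantize_rows stands in for a per-row
// quantization routine. Each tensor row quantizes independently, so
// disjoint row ranges can be handed to worker threads without locking.
static void quantize_tensor_mt(
        const float * src, void * dst,
        int64_t n_rows, int64_t row_size_f32, size_t row_size_quant,
        int n_threads,
        const std::function<void(const float *, void *, int64_t)> & quantize_rows) {
    std::vector<std::thread> workers;
    workers.reserve(n_threads);

    const int64_t rows_per_thread = (n_rows + n_threads - 1) / n_threads;

    for (int t = 0; t < n_threads; ++t) {
        const int64_t first = t * rows_per_thread;
        const int64_t last  = std::min(first + rows_per_thread, n_rows);
        if (first >= last) {
            break;
        }
        workers.emplace_back([=, &quantize_rows]() {
            // quantize rows [first, last) of this tensor
            quantize_rows(
                src + first * row_size_f32,
                (char *) dst + first * row_size_quant,
                last - first);
        });
    }

    for (auto & w : workers) {
        w.join();
    }
}
```

Joining all workers before moving on to the next tensor keeps memory behaviour close to the single-threaded path; whatever actually landed in #1075 may partition the work differently.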
Is the new quantization scheme the one that minimizes MSE against the original weights?
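(For context, an MSE-minimizing choice of a per-block scale $d$ would solve something of the form below; this is just the generic objective, not necessarily the exact scheme used in #896.)

$$\min_{d} \; \sum_i \left( w_i - d \cdot \mathrm{round}\!\left(\frac{w_i}{d}\right) \right)^2$$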
Resolved by #1075