Simplify the quantization process #463
Comments
Agree; I was recently confused by the various type ids. Though I think for performance reasons you can't really put too much abstraction in the vector dot-product in ggml.c. quantize.py could probably be removed if we manage to make quantize.cpp just a bit more user-friendly. Or make llama.cpp an executable and get rid of quantize.cpp too.
That is true, quantize.py is a wholly unnecessary step.
That would be going backwards. The reason for llama.cpp to exist is to have a common C API which can be interfaced by 'apps' like main, quantize, perplexity, etc. ggml is shared with whisper.cpp, so it needs to exist. When you look at quantize.cpp, there is really no logic there; it's just a wrapper for calling the API. The way I see it, it's the complete opposite: having these two APIs shared by everything makes changing things easier. In the future there will inevitably be a lot more apps than the current main, quantize, and perplexity. Imagine having to change every single one of them instead of changing just the API. I just can't see how that would be a better option.
The current quantization call stack is long and difficult to debug, which makes extending or adding new quantization methods in the future a major issue. This is because changes would need to be made in various places.
Additionally, we should aim to add drivers that help with benchmarking various quantization methods.
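To make the idea of a benchmarking driver concrete, here is a rough sketch of what one could look like. It is not based on the existing ggml or llama.cpp code: `QuantMethod`, `quant_methods`, `roundtrip_symmetric` and `benchmark_all` are hypothetical names, and the two "methods" are simple symmetric block quantizers used only as stand-ins. The point is that every method is registered in one table, so adding a method means adding one row, and the driver compares all of them on the same (non-empty) data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface: a method quantizes and immediately dequantizes,
// so the driver can measure the error it introduces.
struct QuantMethod {
    std::string name;
    int bits;
    std::vector<float> (*roundtrip)(const std::vector<float> &);
};

constexpr std::size_t kBlock = 32; // block size assumed for this sketch

// Stand-in quantizer: per-block scale, integer levels in [-MaxQ, MaxQ].
template <int MaxQ>
std::vector<float> roundtrip_symmetric(const std::vector<float> & x) {
    assert(x.size() % kBlock == 0);
    std::vector<float> out(x.size());
    for (std::size_t b = 0; b < x.size(); b += kBlock) {
        float amax = 0.0f;
        for (std::size_t i = 0; i < kBlock; ++i) {
            amax = std::max(amax, std::fabs(x[b + i]));
        }
        const float d  = amax / MaxQ;                 // block scale
        const float id = d != 0.0f ? 1.0f / d : 0.0f; // inverse scale
        for (std::size_t i = 0; i < kBlock; ++i) {
            const long q = std::lround(x[b + i] * id); // quantize
            out[b + i] = q * d;                        // dequantize
        }
    }
    return out;
}

// Single place where methods are registered.
const std::vector<QuantMethod> & quant_methods() {
    static const std::vector<QuantMethod> methods = {
        {"sym4", 4, &roundtrip_symmetric<7>},
        {"sym8", 8, &roundtrip_symmetric<127>},
    };
    return methods;
}

struct BenchResult {
    std::string name;
    int bits;
    double rmse;
};

// Run every registered method on the same data and report the RMS error.
std::vector<BenchResult> benchmark_all(const std::vector<float> & data) {
    std::vector<BenchResult> results;
    for (const auto & m : quant_methods()) {
        const std::vector<float> y = m.roundtrip(data);
        double se = 0.0;
        for (std::size_t i = 0; i < data.size(); ++i) {
            const double e = data[i] - y[i];
            se += e * e;
        }
        results.push_back({m.name, m.bits, std::sqrt(se / data.size())});
    }
    return results;
}
```

A real driver would presumably run over actual model tensors and report perplexity or timing as well; RMS error on a float buffer is just the simplest metric to show the shape of the thing.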
The current stack:
Open to suggestions here, and I would like to hear whether this is worth investing our time and effort in.