Hi all,
We've recently open-sourced a new quantization method. VPTQ (Vector Post-Training Quantization) is a novel post-training quantization method that leverages vector quantization to achieve high accuracy on Large Language Models (LLMs) at extremely low bit-widths (<2 bits). VPTQ can compress models of 70 billion, and even up to 405 billion, parameters to 1-2 bits without retraining while maintaining high accuracy.
For more details, you can check the code and documentation here: https://github.com/microsoft/VPTQ
And the model can be accessed here: https://huggingface.co/VPTQ-community
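To make the core mechanism concrete, here is a minimal, self-contained sketch of vector quantization (illustrative only, not VPTQ's actual implementation; the function names, the codebook size k = 4096, and the vector length v = 8 are assumptions, chosen so that index storage works out to 1.5 bits per weight): each weight matrix is split into short sub-vectors, every sub-vector is replaced by the index of its nearest codebook centroid, and dequantization is a table lookup.

```python
import torch

def vq_quantize(weight: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map each length-v sub-vector of `weight` to its nearest centroid index.

    weight:   (out_features, in_features), in_features divisible by v
    codebook: (k, v) centroids (e.g. fitted offline with k-means)
    returns:  (out_features, in_features // v) integer indices
    """
    out_f, in_f = weight.shape
    k, v = codebook.shape
    vectors = weight.reshape(-1, v)             # split rows into sub-vectors
    dists = torch.cdist(vectors, codebook)      # (N, k) Euclidean distances
    return dists.argmin(dim=-1).reshape(out_f, in_f // v)

def vq_dequantize(indices: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Reconstruct an approximate weight matrix by codebook lookup."""
    out_f, n_vec = indices.shape
    v = codebook.shape[1]
    return codebook[indices].reshape(out_f, n_vec * v)

# k = 4096 centroids over vectors of length v = 8 means each group of 8
# weights is stored as one 12-bit index: log2(4096) / 8 = 1.5 bits/weight.
w = torch.randn(16, 64)
cb = torch.randn(4096, 8)   # stand-in for a trained codebook
w_hat = vq_dequantize(vq_quantize(w, cb), cb)
assert w_hat.shape == w.shape
```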
I am currently attempting to integrate VPTQ into AO. Does anyone have suggestions or best practices for this kind of integration? What should I be particularly aware of?
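For context, a naive shim along these lines is what I have in mind (a hypothetical sketch, not torchao's actual extension API; the class name and buffer layout are my own): swap each nn.Linear for a module that stores the indices and codebook and dequantizes on the fly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQLinear(nn.Module):
    """Linear layer whose weight is stored as codebook indices (illustrative)."""

    def __init__(self, indices: torch.Tensor, codebook: torch.Tensor, bias=None):
        super().__init__()
        # Indices and codebook are fixed after post-training quantization.
        self.register_buffer("indices", indices)    # (out, in // v), integer
        self.register_buffer("codebook", codebook)  # (k, v), float
        self.register_buffer("bias", bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out_f, n_vec = self.indices.shape
        v = self.codebook.shape[1]
        # Reconstruct the full-precision weight by table lookup.
        w = self.codebook[self.indices].reshape(out_f, n_vec * v)
        return F.linear(x, w, self.bias)
```

A real integration would presumably replace this on-the-fly reconstruction with a fused dequantize-GEMM kernel, since materializing the full weight in forward() gives up most of the memory savings.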
Thanks!
Yang