Hi all,
We've recently open-sourced a new quantization method. VPTQ (Vector Post-Training Quantization) is a novel post-training quantization method that leverages vector quantization to achieve high accuracy on Large Language Models (LLMs) at extremely low bit-widths (<2 bits). VPTQ can compress models of 70 billion, and even up to 405 billion, parameters to 1-2 bits without retraining while maintaining high accuracy.
For more details, you can check the code and documentation here: https://github.com/microsoft/VPTQ
And the model can be accessed here: https://huggingface.co/VPTQ-community
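To make the core mechanism concrete, here is a minimal, self-contained sketch of vector quantization (illustrative only, not VPTQ's actual implementation; the function names, the codebook size k = 4096, and the vector length v = 8 are assumptions, chosen so that index storage works out to 1.5 bits per weight): each weight matrix is split into short sub-vectors, every sub-vector is replaced by the index of its nearest codebook centroid, and dequantization is a table lookup.

```python
import torch

def vq_quantize(weight: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map each length-v sub-vector of `weight` to its nearest centroid index.

    weight:   (out_features, in_features), in_features divisible by v
    codebook: (k, v) centroids (e.g. fitted offline with k-means)
    returns:  (out_features, in_features // v) integer indices
    """
    out_f, in_f = weight.shape
    k, v = codebook.shape
    vectors = weight.reshape(-1, v)             # split rows into sub-vectors
    dists = torch.cdist(vectors, codebook)      # (N, k) Euclidean distances
    return dists.argmin(dim=-1).reshape(out_f, in_f // v)

def vq_dequantize(indices: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Reconstruct an approximate weight matrix by codebook lookup."""
    out_f, n_vec = indices.shape
    v = codebook.shape[1]
    return codebook[indices].reshape(out_f, n_vec * v)

# k = 4096 centroids over vectors of length v = 8 means each group of 8
# weights is stored as one 12-bit index: log2(4096) / 8 = 1.5 bits/weight.
w = torch.randn(16, 64)
cb = torch.randn(4096, 8)   # stand-in for a trained codebook
w_hat = vq_dequantize(vq_quantize(w, cb), cb)
assert w_hat.shape == w.shape
```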
I am currently attempting to integrate VPTQ into AO. Does anyone have suggestions or best practices for this kind of integration? What should I be particularly aware of?
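For context, a naive shim along these lines is what I have in mind (a hypothetical sketch, not torchao's actual extension API; the class name and buffer layout are my own): swap each nn.Linear for a module that stores the indices and codebook and dequantizes on the fly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQLinear(nn.Module):
    """Linear layer whose weight is stored as codebook indices (illustrative)."""

    def __init__(self, indices: torch.Tensor, codebook: torch.Tensor, bias=None):
        super().__init__()
        # Indices and codebook are fixed after post-training quantization.
        self.register_buffer("indices", indices)    # (out, in // v), integer
        self.register_buffer("codebook", codebook)  # (k, v), float
        self.register_buffer("bias", bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out_f, n_vec = self.indices.shape
        v = self.codebook.shape[1]
        # Reconstruct the full-precision weight by table lookup.
        w = self.codebook[self.indices].reshape(out_f, n_vec * v)
        return F.linear(x, w, self.bias)
```

A real integration would presumably replace this on-the-fly reconstruction with a fused dequantize-GEMM kernel, since materializing the full weight in forward() gives up most of the memory savings.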
Thanks!
Yang