At the moment of writing, there are two general-purpose post-training quantization methods with maintained libraries: GPTQ (AutoGPTQ) and AWQ (AutoAWQ).
It's a bit tricky to choose which approach to use, since different models might show different results.
For instance, in this comparison with Llama-2-7B (on a 3090), GPTQ shows much lower VRAM usage and faster token generation, albeit with slightly higher perplexity.
But in this test by Hugging Face (on an A100), AWQ with Zephyr-7B shows slightly lower VRAM usage (at small batch sizes) and higher throughput (at larger batch sizes), but higher latency.
I did a quick sanity check on an A10G with facebook/opt-125m, and AWQ was around 30% slower at generating 1k tokens.
So, overall, I'd say it's better to stick with AutoGPTQ for now.
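For reference, here is a minimal sketch of that kind of timing check, assuming GPTQ- and AWQ-quantized checkpoints of the same model are already on the Hub (the repo ids below are placeholders, not real checkpoints) and that the backends `transformers` needs to load them (optimum/auto-gptq, autoawq) are installed:

```python
# Time ~1k greedy tokens for two quantized checkpoints and compare wall-clock time.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def time_generation(repo_id: str, max_new_tokens: int = 1024) -> float:
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.perf_counter()
    # min_new_tokens keeps both runs generating the same number of tokens.
    model.generate(**inputs, max_new_tokens=max_new_tokens, min_new_tokens=max_new_tokens, do_sample=False)
    torch.cuda.synchronize()
    return time.perf_counter() - start


for repo in ("my-org/opt-125m-gptq", "my-org/opt-125m-awq"):  # placeholder repo ids
    print(repo, f"{time_generation(repo):.1f}s")
```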
The original implementation that we use is slow and hardcodes both the list of calibration datasets (https://github.com/Lightning-AI/lit-gpt/blob/e095ed300dd9ffbca89c6416eeb056b08869721f/quantize/gptq.py#L474-L481) and the layers to quantize (https://github.com/Lightning-AI/lit-gpt/blob/e095ed300dd9ffbca89c6416eeb056b08869721f/quantize/gptq.py#L498-L516).
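As a hypothetical illustration of the direction this could take (not existing lit-gpt code), the calibration data and the layers to quantize could simply become caller-supplied arguments instead of hardcoded lists:

```python
# Hypothetical sketch only: the function name and signature are placeholders,
# and the GPTQ update itself is elided.
from typing import Iterable, Sequence

import torch


def gptq_quantize(
    model: torch.nn.Module,
    calibration_batches: Sequence[torch.Tensor],
    layer_names: Iterable[str],
) -> torch.nn.Module:
    """Quantize only the requested submodules, calibrating on the given batches."""
    modules = dict(model.named_modules())
    for name in layer_names:
        layer = modules[name]
        assert isinstance(layer, torch.nn.Linear), f"{name} is not a Linear layer"
        # ... run the GPTQ weight update for `layer` using `calibration_batches` ...
    return model
```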
One example could be https://github.com/PanQiWei/AutoGPTQ. The library is specific to huggingface/transformers modules, but it could be forked to support regular torch.nn.Modules. Any other suggestions are appreciated.
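For context, quantizing a transformers model with AutoGPTQ looks roughly like the sketch below (based on the library's quickstart at the time; exact arguments may differ across versions, and the calibration text and output directory are arbitrary). It also shows why the entry point is tied to transformers model classes rather than plain torch.nn.Modules:

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

pretrained_model_dir = "facebook/opt-125m"
quantized_model_dir = "opt-125m-4bit"  # arbitrary output directory

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
# A real run would use a proper calibration set (e.g. a few hundred samples).
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

# The wrapper builds on transformers' causal LM classes, which is why arbitrary
# torch.nn.Module models would need a fork of the library.
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```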