Technical details about quantization #1694

Tunaaaaa · 2023-06-05T02:48:17Z

Hi ggerganov,

Good morning!

I'm confused about the details of "q4_0", "q4_1", and "GPTQ".
I've read the code in ./examples/quantize/quantize.cpp and found that neither q4_0 nor q4_1 uses any Hessian information. They seem to just use min/max to do the quantization.
However, at the end of the YouTube video [link missing], the presenter describes a method that uses H^-1 for optimization.
So I'm wondering: what is the difference between "q4_0", "q4_1", "GPTQ", and the method in that YouTube video?

Thanks a lot.
Cheers,

TTTuna
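
For readers following along, the "just use min/max" idea from the question can be sketched roughly as below. This is a minimal illustration of plain min/max (affine) 4-bit quantization of one block of values, not the code from quantize.cpp: the function names, the example block, and the choice of 4 bits are assumptions made for the sketch. It uses no Hessian information at all, which is the contrast the question draws with the H^-1 method.

```python
def quantize_block_minmax(values, bits=4):
    """Map one block of floats to integer codes using only the block's min and max.

    Hypothetical helper for illustration; not an actual llama.cpp function.
    """
    levels = (1 << bits) - 1                        # 15 steps for 4 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid dividing by zero on a constant block
    # Round each value to the nearest step and clamp into [0, levels].
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_block(codes, scale, lo):
    """Reconstruct approximate floats from the codes, the scale, and the block minimum."""
    return [c * scale + lo for c in codes]


block = [-1.0, -0.5, 0.0, 0.25, 0.5, 2.0]           # made-up example values
codes, scale, lo = quantize_block_minmax(block)
restored = dequantize_block(codes, scale, lo)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(codes, scale, lo, max_err)
```

Each value is rounded independently to the nearest step, so the per-value error is bounded by about half a step (`scale / 2`). A Hessian-based method, as described in the question, would instead choose the rounding with the layer's output error in mind rather than treating every weight in isolation.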

github-actions · 2024-04-10T01:07:50Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot added the stale label Mar 25, 2024

github-actions bot closed this as completed Apr 10, 2024

Bearsaerker mentioned this issue Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open
