Hello,
I know that quantize.py converts a 16-bit GGML model into a 4-bit GGML model using RTN (round-to-nearest) quantization.
Do you think it's possible to create a script that converts a 16-bit GGML model into a 4-bit GGML model using GPTQ instead?
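For context, RTN rounds each weight to the nearest representable 4-bit value independently, with only a per-block scale and no calibration data. A minimal numpy sketch of the idea (conceptually similar to GGML's Q4_0 block format, though the real on-disk layout and scale convention differ):

```python
import numpy as np

def quantize_rtn_q4(weights: np.ndarray, block_size: int = 32):
    """Round-to-nearest 4-bit quantization with one fp32 scale per block.
    Assumes weights.size is divisible by block_size."""
    flat = weights.astype(np.float32).reshape(-1, block_size)
    max_abs = np.max(np.abs(flat), axis=1, keepdims=True)
    # Map each block's largest magnitude onto the int4 grid; guard all-zero blocks.
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_rtn_q4(q: np.ndarray, scale: np.ndarray, shape):
    """Reconstruct an fp32 approximation of the original tensor."""
    return (q.astype(np.float32) * scale).reshape(shape)
```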
Referring to this repository, it appears that the current GPTQ implementation relies on the GPU, which demands a significant amount of VRAM and might not be practical for the average user.
A new script, which we could call "quantize-ggml_16bit-to-gptq.py", could be designed to use only CPU and RAM, making it more accessible to the general public.
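To illustrate why GPTQ is heavier than RTN: it needs calibration activations to build a Hessian for each layer, then quantizes column by column while propagating the rounding error into the not-yet-quantized columns. A rough CPU-only numpy sketch of that core update (a simplified reading of the GPTQ paper, not this repository's implementation; gptq_quantize_layer and its arguments are hypothetical names):

```python
import numpy as np

def gptq_quantize_layer(W: np.ndarray, X: np.ndarray, bits: int = 4, damp: float = 0.01):
    """Simplified GPTQ for one linear layer on CPU.

    W: (out_features, in_features) fp32 weight matrix.
    X: (n_samples, in_features) calibration activations for this layer.
    Returns the quantized-then-dequantized weights.
    """
    W = W.astype(np.float32).copy()
    n_cols = W.shape[1]
    qmax = 2 ** (bits - 1) - 1                     # symmetric int range, e.g. [-7, 7]
    scale = np.max(np.abs(W), axis=1, keepdims=True) / qmax
    scale = np.maximum(scale, 1e-8)                # guard all-zero rows

    # Hessian of the layer-wise reconstruction loss, with diagonal damping
    # for numerical stability (the paper uses a Cholesky-based variant).
    H = 2.0 * X.T @ X
    H += damp * np.mean(np.diag(H)) * np.eye(n_cols)
    Hinv = np.linalg.inv(H)

    for j in range(n_cols):
        w = W[:, j].copy()
        q = np.clip(np.round(w / scale[:, 0]), -qmax, qmax)
        W[:, j] = q * scale[:, 0]
        # Spread this column's quantization error over the remaining columns.
        err = (w - W[:, j]) / Hinv[j, j]
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return W
```

In practice such a script would run this layer by layer over the fp16 GGML tensors, feeding activations recorded from a small calibration set, so it needs only enough RAM for one layer's weights and Hessian at a time.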