Hello,
I know that quantize.py converts a 16-bit GGML model into a 4-bit GGML model using RTN (round-to-nearest) quantization.
Do you think it's possible to create a script that converts a 16-bit GGML model into a 4-bit GGML model using GPTQ instead?
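For context, RTN rounds each weight to the nearest representable 4-bit value independently, with only a per-block scale and no calibration data. A minimal numpy sketch of the idea (conceptually similar to GGML's Q4_0 block format, though the real on-disk layout and scale convention differ):

```python
import numpy as np

def quantize_rtn_q4(weights: np.ndarray, block_size: int = 32):
    """Round-to-nearest 4-bit quantization with one fp32 scale per block.
    Assumes weights.size is divisible by block_size."""
    flat = weights.astype(np.float32).reshape(-1, block_size)
    max_abs = np.max(np.abs(flat), axis=1, keepdims=True)
    # Map each block's largest magnitude onto the int4 grid; guard all-zero blocks.
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_rtn_q4(q: np.ndarray, scale: np.ndarray, shape):
    """Reconstruct an fp32 approximation of the original tensor."""
    return (q.astype(np.float32) * scale).reshape(shape)
```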
Referring to this repository, it appears that the current GPTQ implementation relies on the GPU, which demands a significant amount of VRAM and might not be practical for the average user.
A new script, which we could call "quantize-ggml_16bit-to-gptq.py", could be designed to use only CPU and RAM, making it more accessible to the general public.
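To illustrate why GPTQ is heavier than RTN: it needs calibration activations to build a Hessian for each layer, then quantizes column by column while propagating the rounding error into the not-yet-quantized columns. A rough CPU-only numpy sketch of that core update (a simplified reading of the GPTQ paper, not this repository's implementation; gptq_quantize_layer and its arguments are hypothetical names):

```python
import numpy as np

def gptq_quantize_layer(W: np.ndarray, X: np.ndarray, bits: int = 4, damp: float = 0.01):
    """Simplified GPTQ for one linear layer on CPU.

    W: (out_features, in_features) fp32 weight matrix.
    X: (n_samples, in_features) calibration activations for this layer.
    Returns the quantized-then-dequantized weights.
    """
    W = W.astype(np.float32).copy()
    n_cols = W.shape[1]
    qmax = 2 ** (bits - 1) - 1                     # symmetric int range, e.g. [-7, 7]
    scale = np.max(np.abs(W), axis=1, keepdims=True) / qmax
    scale = np.maximum(scale, 1e-8)                # guard all-zero rows

    # Hessian of the layer-wise reconstruction loss, with diagonal damping
    # for numerical stability (the paper uses a Cholesky-based variant).
    H = 2.0 * X.T @ X
    H += damp * np.mean(np.diag(H)) * np.eye(n_cols)
    Hinv = np.linalg.inv(H)

    for j in range(n_cols):
        w = W[:, j].copy()
        q = np.clip(np.round(w / scale[:, 0]), -qmax, qmax)
        W[:, j] = q * scale[:, 0]
        # Spread this column's quantization error over the remaining columns.
        err = (w - W[:, j]) / Hinv[j, j]
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return W
```

In practice such a script would run this layer by layer over the fp16 GGML tensors, feeding activations recorded from a small calibration set, so it needs only enough RAM for one layer's weights and Hessian at a time.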