Hi, I want to quantize a model that is already quantized to 4-bit (q4_1), but I want to make it compute faster. What is the command to quantize the already-quantized model? I tried the command in the README file once, but that didn't work. Can anyone help me?
At the moment, I believe only 4-bit quantization has been implemented and is natively supported. You can find discussions about possibly supporting 2-bit quantization here (and 3-bit as a side note):
The comment above explains the situation. One note: if 2-bit or 3-bit quantization becomes available, it should always be performed from the original f16 or f32 file, since re-quantizing an already-quantized model only compounds the rounding error.
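For reference, a minimal sketch of the intended workflow, assuming the `quantize` tool and example paths from the llama.cpp README of that era (your model paths will differ):

```sh
# Always quantize from the original f16 model, never from an existing q4_0/q4_1 file.
# The last argument selects the quantization type (2 = q4_0, 3 = q4_1 in the README's convention).
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
```

If you only have the q4_1 file, you would need to re-download or re-convert the original f16/f32 weights first.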