How to execute uniform quantization instead of NF4 quantization? #12
Comments
Hi, thank you for your interest in our work. LoftQ supports any existing quantization function in theory, but the GPTQ implementation AutoGPTQ doesn't support dequantization, which is required in LoftQ (see Section 2.2 in the LoftQ paper). If you can find a GPTQ implementation that has a dequantization method, please let me know. I'm glad to add it to LoftQ :)
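For context, a rough sketch of why dequantization matters: the alternating initialization described in Section 2.2 repeatedly quantizes the residual and then takes a low-rank approximation of the reconstruction error, which has to be computed in floating point. In the sketch below, `quant_fn` and `dequant_fn` are hypothetical placeholders for any quantizer that exposes both directions; the missing `dequant_fn` is exactly what AutoGPTQ does not provide.

```python
import torch

def loftq_style_init(W, rank=16, num_iters=5, quant_fn=None, dequant_fn=None):
    """Sketch of the alternating quantize / low-rank step (cf. Section 2.2).

    quant_fn and dequant_fn are placeholders: any quantizer works, as long as
    it can map the quantized weights back to floating point.
    """
    A = torch.zeros(W.shape[0], rank)
    B = torch.zeros(W.shape[1], rank)
    for _ in range(num_iters):
        # Quantize the part of W not yet covered by the low-rank adapters.
        Q = quant_fn(W - A @ B.T)
        # Dequantization is required here to form the residual in floating point.
        residual = W - dequant_fn(Q)
        # A rank-r SVD of the residual gives the next adapter initialization.
        U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
        A = U[:, :rank] * S[:rank]
        B = Vh[:rank, :].T
    return Q, A, B
```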
Plus, we do have an experimental uniform quantization method at https://github.com/yxli2123/LoftQ/blob/main/glue/utils.py#L103. However, it's not the same uniform quantization used in GPTQ.
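For readers looking for a starting point, a minimal per-tensor min-max uniform quantize/dequantize pair could look like the sketch below. This is only an illustration; it is not the implementation in glue/utils.py and it differs from GPTQ's group-wise scheme.

```python
import torch

def uniform_quantize(weight, num_bits=4):
    # Simple per-tensor min-max uniform quantization (illustrative only).
    qmax = 2 ** num_bits - 1
    w_min, w_max = weight.min(), weight.max()
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    q = torch.round((weight - w_min) / scale).clamp(0, qmax)
    return q, scale, w_min

def uniform_dequantize(q, scale, w_min):
    # Map the integer levels back to floating point; this is the step LoftQ needs.
    return q * scale + w_min
```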
Do you mean that vecquant4matmul is not a separate dequantization function, but a fused (dequantization + matmul) operation?
@yxli2123 Thank you for providing the experimental details. And congratulations on LoftQ being accepted as an oral at ICLR 2024! It is true that AutoGPTQ uses group-wise quantization and bit packing, so LoftQ may need a custom dequantization function if it is to be integrated into PEFT. I found some related discussions about a PyTorch-style dequantization function:
- Faster PyTorch dequantize() + matmul for quantized models
- A dequantization function seems to be implemented by official PyTorch

I hope the above information helps.
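Assuming the second pointer refers to PyTorch's built-in per-tensor quantization, a round trip with `torch.quantize_per_tensor` and `torch.dequantize` looks like this (note this is per-tensor affine int8, not GPTQ's packed group-wise 4-bit format):

```python
import torch

w = torch.randn(4, 4)
scale, zero_point = w.abs().max().item() / 127, 0
# Per-tensor affine quantization to int8 ...
qw = torch.quantize_per_tensor(w, scale=scale, zero_point=zero_point, dtype=torch.qint8)
# ... and back to floating point, which is the operation LoftQ needs.
w_dequant = torch.dequantize(qw)
print((w - w_dequant).abs().max())  # quantization error
```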
Hi, thanks for your amazing work. I found that the code uses NF4 quantization by default, but it doesn't add any support for switching to uniform quantization. If I have a model quantized by GPTQ, how can I use LoftQ on it?
I have tried a GPTQ-quantized model with PEFT, but it raised an exception as follows: