Serialize NF4 models (like 8-bit models), allowing them to be pushed to HuggingFace Repos #695
Comments
Thank you. Also, I find that when bnb_4bit_quant_type is set to 'nf4' or 'fp4', the dtype of the weight is uint8. Is this caused by this feature?
@XpracticeYSKM - setting the quant type means that certain weights are quantized, although not necessarily all to 4-bit. Both nf4 and fp4 mostly quantize to 4-bit, but it's a bit more nuanced because you don't want to fully quantize certain weights. This is somewhat unrelated to this issue of allowing quantized models to be pushed to the hub, so it's best to create a new issue for it if you have questions.
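[Editor's note: a minimal sketch of the point above, i.e. that not every parameter ends up 4-bit-packed. The model id is just a small placeholder, and exact dtypes depend on your transformers/bitsandbytes versions.]

```python
# Hedged sketch: inspect which parameters end up 4-bit-packed (uint8) vs.
# kept in higher precision after loading with NF4.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # placeholder model id
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
    ),
    device_map="auto",
)

for name, param in model.named_parameters():
    print(name, param.dtype)
# Linear layer weights print as torch.uint8 (packed 4-bit codes), while
# embeddings, layer norms, etc. typically remain in float16/float32.
```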
@RonanKMcGovern Thanks for your reply! I mean that when I use functional.quantize_4bit() to quantize a weight, I find the output tensor w_4bit is wrapped in uint8 (w_4bit.dtype is uint8), and it has 256 unique values from 0 to 255. I want to figure out why it needs to be wrapped in uint8 and why it has 256 unique values. Does this feature lead to this phenomenon?
@XpracticeYSKM, best to create a new issue, because this issue is for a different topic.
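[Editor's note: a minimal sketch of the packing behavior asked about above, assuming bitsandbytes with a CUDA device available; exact signatures may vary slightly across versions.]

```python
# Hedged sketch: quantize_4bit packs two 4-bit codes into each byte, which is
# why the output dtype is uint8 and why up to 256 (16 x 16) distinct byte
# values can appear.
import torch
import bitsandbytes.functional as bnbf

w = torch.randn(64, 64, dtype=torch.float16, device="cuda")
w_4bit, quant_state = bnbf.quantize_4bit(w, quant_type="nf4")

print(w_4bit.dtype)                   # torch.uint8
print(w_4bit.numel())                 # ~half of w.numel(): two codes per byte
print(torch.unique(w_4bit).numel())   # up to 256 distinct packed-byte values

# Round trip back to float for comparison
w_restored = bnbf.dequantize_4bit(w_4bit, quant_state)
print((w - w_restored).abs().mean())  # small quantization error
```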
Closing due to: #753
This would be a great feature: it would allow a 70B Llama model to be downloaded at under 40 GB in NF4 format.
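[Editor's note: a hedged sketch of what the requested workflow looks like once 4-bit serialization is supported (see #753). The model id and repo id below are placeholders, and this requires sufficiently recent transformers/bitsandbytes versions.]

```python
# Hedged sketch: load a model in NF4, then save/push the quantized checkpoint.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# With 4-bit serialization support, the quantized weights (<40 GB for a 70B
# model in NF4) can be saved and pushed like any other checkpoint:
model.save_pretrained("llama-2-70b-nf4")
model.push_to_hub("your-username/llama-2-70b-nf4")  # placeholder repo id
```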