
Serialize NF4 models (like 8-bit models), allowing them to be pushed to HuggingFace Repos #695

Closed
RonanKMcGovern opened this issue Aug 9, 2023 · 5 comments

Comments

@RonanKMcGovern

This would be a great feature because it would allow a 70B Llama model to be downloaded that is <40 GB in size in NF4 format.

@XpracticeYSKM

Thank you. I find that when bnb_4bit_quant_type is set to 'nf4' or 'fp4', the dtype of the weight is uint8. Is this caused by this feature?

@RonanKMcGovern
Author

@XpracticeYSKM - setting the quant type means that certain weights are quantized, although not necessarily all to 4-bit. Both nf4 and fp4 mostly quantize to 4-bit, but it's a bit more nuanced because you don't want to fully quantize certain weights.

This is somewhat unrelated to this issue (allowing quantized models to be pushed to the hub), so it's best to create a new issue if you have further questions.

@XpracticeYSKM

XpracticeYSKM commented Sep 17, 2023

@RonanKMcGovern. Thanks for your reply! I mean that when I use functional.quantize_4bit() to quantize a weight, I find the output tensor, w_4bit, is stored as uint8 (w_4bit.dtype is uint8), and it has 256 unique values from 0 to 255. I want to figure out why it needs to be stored as uint8 and why it has 256 unique values. Does this feature lead to this behavior?
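(For context, a minimal NumPy sketch, not the actual bitsandbytes internals, of why a 4-bit quantized tensor shows up as uint8: each weight is reduced to a 4-bit code indexing a 16-entry codebook, and two codes are packed into one byte, so the storage tensor can take any of 16 × 16 = 256 distinct byte values.)

```python
import numpy as np

# Hypothetical 4-bit codes (indices into a 16-value codebook), one per weight.
codes = np.array([0, 15, 7, 8, 3, 12], dtype=np.uint8)

# Pack two 4-bit codes per byte: even positions go in the high nibble,
# odd positions in the low nibble.
packed = (codes[0::2] << 4) | codes[1::2]
print(packed.dtype)  # uint8 -- each byte holds two 4-bit weights

# Unpack to recover the original codes.
unpacked = np.empty_like(codes)
unpacked[0::2] = packed >> 4
unpacked[1::2] = packed & 0x0F
assert np.array_equal(unpacked, codes)
```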

@RonanKMcGovern
Author

@XpracticeYSKM, it's best to create a new issue, because this one is about a different topic.

@RonanKMcGovern
Author

Closing due to #753.
