Merge LoRA Adapter with int8 base model. #638
Comments
Hello @jenkspt, the bottleneck here is that the LoRA weights are in float32 while the quantized models have weights in int8/NF4, so supporting this directly would involve quite some effort when a simple workaround exists. Below is the workaround.
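For illustration, a minimal sketch of such a workaround, assuming the base model is reloaded unquantized and the adapter is merged there (model IDs and paths are placeholders, not the original snippet):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model without quantization (float32 on CPU; float16 if it fits on GPU).
base = AutoModelForCausalLM.from_pretrained("base-model-id", torch_dtype=torch.float32)

# Attach the trained LoRA adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = model.merge_and_unload()

# Save the merged model; it can be re-quantized when loaded again.
merged.save_pretrained("path/to/merged-model")
```

As the follow-up comments note, this requires enough memory for the unquantized weights and can be very slow on CPU for large models.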
cc @younesbelkada for adding more context
Hi @jenkspt
@pacman100 can you describe why this is difficult to implement? Using falcon-40b with LoRA adapters, the workaround takes an unreasonable amount of time on the CPU and doesn't work with limited GPU memory (i.e. only enough to hold the 8-bit model).
@jenkspt
Thanks for the info @younesbelkada. Since I've been fine-tuning with int8 + LoRA, there shouldn't be any numerical differences. Can you point me to any functions or resources for de-quantization? Happy to create a PR for this.
Sure @jenkspt, you can check out this specific section of the bnb integration blogpost: https://huggingface.co/blog/hf-bitsandbytes-integration#usage. The de-quantized weight can be recovered as `(layer.weight.CB * layer.weight.SCB) / 127`. It would be great if you could open a PR for that! Let us know if you need any help.
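A minimal sketch of that de-quantization, assuming a quantized bitsandbytes Linear8bitLt layer; the reshaping of SCB for broadcasting is an assumption, not taken from the blogpost:

```python
import torch

def dequantize_int8_weight(layer: torch.nn.Module) -> torch.Tensor:
    """Approximately recover the float32 weight of a quantized bnb Linear8bitLt layer."""
    CB = layer.weight.CB.float()    # int8 weight matrix, shape [out_features, in_features]
    SCB = layer.weight.SCB.float()  # per-row absmax scales, shape [out_features]
    # The blogpost formula, with the scales broadcast over the input dimension.
    return (CB * SCB.unsqueeze(1)) / 127
```

With the float weights recovered, the LoRA delta (lora_B @ lora_A, times the LoRA scaling) can be added and the result re-quantized.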
If there is any PR for that issue @jenkspt, could you put the link here as well? Thank you! :D
@jtrechot I haven't gotten to it yet.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
I think this is still a desired feature -- I simply don't have the time to implement it right now.
With #851 you can now call merge_and_unload on 4-bit models; however, we are keeping the issue open for 8-bit models. Please refer to: #851 (comment)
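For reference, a minimal sketch of the 4-bit path (model IDs and the adapter path are placeholders; it assumes a PEFT version that includes #851):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load the base model quantized to 4-bit (NF4) with bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "base-model-id", quantization_config=bnb_config, device_map="auto"
)

# Attach the LoRA adapter, then fold it into the quantized base weights.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = model.merge_and_unload()
```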
#875 might provide the solution for 8-bit.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
It is now supported! Please install PEFT from source. Closing the issue.
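A short sketch of the now-supported 8-bit case, mirroring the 4-bit example above (the install command and model IDs are assumptions):

```python
# pip install git+https://github.com/huggingface/peft   (install PEFT from source)
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "base-model-id", load_in_8bit=True, device_map="auto"
)
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()
```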
Feature request
Support merging LoRA adapters with the base model when the base model is loaded in int8.
Motivation
Your contribution
Happy to create this PR. Any insights on avoiding pitfalls are welcome.