Merge LoRA Adapter with int8 base model. #638

jenkspt · 2023-06-26T23:06:04Z

Feature request

Support merging LoRA adapters with base model when base model is loaded in int8.

Motivation

This is helpful when the goal is to merge the adapter weights into the base model for faster 8-bit inference.
It also helps in low-memory environments, where it may not be possible to load the model in half precision before merging.

Your contribution

Happy to create this PR. Any insights to avoid pitfalls are welcome.

jenkspt · 2023-06-26T23:09:21Z

Relevant code: https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora.py#L417-L418

pacman100 · 2023-06-27T12:53:13Z

Hello @jenkspt, the bottleneck here is that the LoRA weights are in float32, while quantized models store their weights in int8/NF4. Supporting this would take quite some effort when a simple workaround exists. Below is the workaround:

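```python
import torch
from transformers import AutoModelForCausalLM  # or whichever AutoModelFor* class matches your task
from peft import PeftModel

base_model_id = "..."  # placeholder: the base model the adapter was trained on
peft_model_id = "..."  # placeholder: the LoRA adapter repo or local path

# 1. Load the unquantized base model in half precision and attach the adapter
model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, peft_model_id)

# 2. Fold the LoRA weights into the base weights and drop the adapter modules
model = model.merge_and_unload()
model.save_pretrained("merged_model")

# 3. Reload the merged model quantized to 8-bit
model = AutoModelForCausalLM.from_pretrained("merged_model", load_in_8bit=True)

# do inference
```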

cc @younesbelkada for adding more context
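For context on why the dtypes matter: merging folds each LoRA pair into its base weight as W + scaling * (B @ A), with scaling = lora_alpha / r, so the merged weight is a floating-point tensor that no longer sits on the int8 grid. A minimal sketch in plain PyTorch, ignoring dropout and fan_in_fan_out handling; the layer sizes are hypothetical and `merge`/`lora_forward` are illustrative helpers, not PEFT functions:

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes for one linear layer with a rank-r LoRA adapter
out_features, in_features, r, lora_alpha = 6, 4, 2, 4
scaling = lora_alpha / r  # LoRA scales its low-rank update by lora_alpha / r

W = torch.randn(out_features, in_features).to(torch.float16)  # frozen half-precision base weight
A = torch.randn(r, in_features)    # lora_A, float32
B = torch.randn(out_features, r)   # lora_B, float32


def lora_forward(x: torch.Tensor) -> torch.Tensor:
    # Unmerged forward pass: base output plus the scaled low-rank path
    return x @ W.float().T + scaling * (x @ A.T @ B.T)


def merge(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor, scaling: float) -> torch.Tensor:
    # Fold the adapter into the base weight: W' = W + scaling * (B @ A)
    return W.float() + scaling * (B @ A)


W_merged = merge(W, A, B, scaling)
x = torch.randn(3, in_features)

# The merged layer reproduces the adapter's output with a single matmul
print(torch.allclose(x @ W_merged.T, lora_forward(x), atol=1e-5))  # expected: True
# The merged weight is a float tensor, not int8
print(W_merged.dtype)  # torch.float32
```

Writing such a tensor back into an int8 layer therefore requires requantizing it, which is where the extra effort comes from.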

younesbelkada · 2023-06-27T14:22:45Z

Hi @jenkspt,
Yes, I second what @pacman100 said: you need to first load the fp16/bf16 standalone model, merge the adapter, and then load the merged model back in 8-bit/4-bit.

jenkspt · 2023-06-27T20:46:39Z

@pacman100 can you describe why this is difficult to implement?

Using falcon-40b with LoRA adapters, the workaround takes an unreasonable amount of time on the CPU and doesn't work with limited GPU memory (i.e., only enough to hold the 8-bit model).
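To put rough numbers on this memory constraint, here is a back-of-the-envelope, weights-only estimate. It assumes a nominal 40 billion parameters for falcon-40b and ignores activations, the KV cache, and the small per-row scale tensors that int8 quantization adds:

```python
# Bytes needed to store one parameter in each format
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "4-bit": 0.5}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate memory (in GB, i.e. 1e9 bytes) needed just to hold the weights."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


n_params = 40e9  # assumption: rounded, nominal parameter count for falcon-40b
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>9}: ~{weight_memory_gb(n_params, dtype):.0f} GB")
```

Under these assumptions, the fp16/bf16 copy that the workaround loads needs about 80 GB, twice the roughly 40 GB of the int8 model.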

younesbelkada · 2023-06-27T21:43:56Z

@jenkspt
Technically it is doable: one needs to dequantize the int8 weights to fp16 on the fly for each 8-bit bnb layer. However, this leads to numerical differences, which can ultimately produce results that differ from those of the original fp16/bf16 weights, hence the recommendation.
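To illustrate where the numerical differences come from, the sketch below simulates row-wise absmax int8 quantization in plain PyTorch and compares two paths: merging in full precision and quantizing once (the recommended workaround) versus dequantizing the int8 weight, merging, and requantizing. This is an assumption-laden simulation: it mirrors the general CB/SCB scheme but not bitsandbytes' actual kernels or LLM.int8's outlier handling, and the tensor sizes and the 0.01 scale of the stand-in LoRA update are arbitrary.

```python
import torch

torch.manual_seed(0)


def quantize_rowwise(W: torch.Tensor):
    # Row-wise absmax int8 quantization: each row's max |value| maps to 127
    scale = W.abs().amax(dim=1)  # one scale per output row
    q = torch.round(W * 127 / scale.unsqueeze(1)).to(torch.int8)
    return q, scale


def dequantize_rowwise(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale.unsqueeze(1) / 127


W = torch.randn(8, 16)             # stand-in for an fp16/bf16 base weight (kept in fp32 here)
delta = 0.01 * torch.randn(8, 16)  # stand-in for a LoRA update scaling * (B @ A)

# Recommended path: merge in higher precision, then quantize once
ref = dequantize_rowwise(*quantize_rowwise(W + delta))

# On-the-fly path: dequantize the int8 weight, merge, then requantize
W_dq = dequantize_rowwise(*quantize_rowwise(W))
merged = dequantize_rowwise(*quantize_rowwise(W_dq + delta))

print((W_dq - W).abs().max().item())     # dequantization error is nonzero
print((merged - ref).abs().max().item())  # the two paths generally do not agree exactly
```

Each dequantized entry is off by up to half a quantization step (the row's absmax / 254), and requantizing after the merge can round some entries to a different int8 value than quantizing the merged full-precision weight would.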

jenkspt · 2023-06-28T18:15:36Z

Thanks for the info @younesbelkada. Since I've been fine-tuning with int8 + LoRA, there shouldn't be any numerical differences. Can you point me to any functions or resources for dequantization? Happy to create a PR for this.

younesbelkada · 2023-06-29T07:43:32Z

Sure @jenkspt, you can check out this specific section of the bnb integration blog post: https://huggingface.co/blog/hf-bitsandbytes-integration#usage
TL;DR: you need to loop over all Linear8bitLt layers and apply the following operation:

(layer.weight.CB * layer.weight.SCB) / 127

It would be great if you could open a PR for that! Let us know if you need any help.
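Following the operation described above, here is a minimal sketch of the dequantization step in plain PyTorch. It assumes, as in bitsandbytes' row-wise int8 format, that CB is the int8 weight of shape (out_features, in_features) and SCB holds one absmax scale per output row, so SCB is broadcast along rows explicitly with `unsqueeze(1)`. Because bitsandbytes layers need a GPU, the int8 data here is simulated from an ordinary `nn.Linear`; `dequantize_int8_weight` is a hypothetical helper name.

```python
import torch
import torch.nn as nn


def dequantize_int8_weight(CB: torch.Tensor, SCB: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float weight from row-wise int8 data.

    CB:  int8 tensor of shape (out_features, in_features)
    SCB: per-row absmax scales of shape (out_features,)
    """
    # SCB has one entry per output row, so broadcast it along dim 1;
    # a bare CB * SCB would align SCB with the last (in_features) dim instead.
    return CB.float() * SCB.float().unsqueeze(1) / 127


torch.manual_seed(0)
layer = nn.Linear(16, 8, bias=False)  # out_features=8, in_features=16 (non-square on purpose)
W = layer.weight.detach()

# Simulate the CB/SCB layout with row-wise absmax quantization
SCB = W.abs().amax(dim=1)
CB = torch.round(W * 127 / SCB.unsqueeze(1)).to(torch.int8)

W_hat = dequantize_int8_weight(CB, SCB)
print(W_hat.shape, (W_hat - W).abs().max().item())
```

In a real model, the helper would be applied to `module.weight.CB` and `module.weight.SCB` for each `Linear8bitLt` module found via `model.modules()`; the attribute layout may differ across bitsandbytes versions, so check the version you have installed.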

jtrechot · 2023-07-10T08:49:50Z

@jenkspt, if there is a PR for this issue, could you link it here as well?

Thank you! :D

jenkspt · 2023-07-10T15:51:58Z

@jtrechot haven't gotten to it yet.

github-actions · 2023-08-04T15:03:27Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

jenkspt · 2023-08-09T22:25:39Z

I think this is still a desired feature -- I simply don't have the time to implement it right now.

younesbelkada · 2023-08-28T14:30:44Z

With #851 you can now call merge_and_unload on 4-bit models. I'm keeping the issue open for 8-bit models; please refer to #851 (comment).

BenjaminBossan · 2023-08-29T08:29:35Z

#875 might provide the solution for 8bit.

github-actions · 2023-09-22T15:03:47Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

younesbelkada · 2023-09-22T15:18:49Z

It is now supported! Please install PEFT from source. Closing the issue.

younesbelkada mentioned this issue on Aug 23, 2023 in #851, "Support merge lora module for 4bit and 8bit linear" (merged).

pacman100 closed this as completed in #851 on Aug 28, 2023.

younesbelkada reopened this on Aug 28, 2023.

younesbelkada closed this as completed on Sep 22, 2023.
