Merge lora module to 8bit model #875
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks a lot for keeping up with your good work and delivering the 8bit merging feature.
As you can see, there is a merge conflict now after we moved the tuners to sub-packages in #807. Don't worry, it is easy to fix:
Your change that starts with elif is_bnb_available() and ... should be moved to https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/model.py.
Your change that starts with def merge(self): should be moved to https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/bnb.py.
The test can stay the same.
On top of that, after merging #851, we found a small bug in your previous PR. The explanation is contained here. Please take a look so that we can ensure that the same thing does not happen with the 8bit layer. As you can see, we also worked on the test to make it a little more precise by checking probabilities and not tokens. I think this should work for this PR too.
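For illustration, a probability-based check might look roughly like this (a minimal sketch; the helper name, the model handles, and the tolerance are placeholders, not the actual test code):

```python
import torch

def assert_probas_close(model_before, model_after, inputs, atol=1e-2):
    """Compare full probability distributions rather than argmax tokens:
    the predicted tokens can still agree even when the distributions have
    drifted noticeably after merging."""
    with torch.no_grad():
        probs_before = torch.softmax(model_before(**inputs).logits, dim=-1)
        probs_after = torch.softmax(model_after(**inputs).logits, dim=-1)
    assert torch.allclose(probs_before, probs_after, atol=atol)
```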
Very interesting approach!
Thanks for your help. I have fixed the 8bit linear forward to support merge and disable adapter, but I have a question about the 4bit linear forward, see this. I wonder why the result needs to add …
Thanks for reporting, this is indeed a bug introduced -- by me :-/ -- recently. #878 should provide a fix.
I did some scatter plots again (this time probas) and compared the newly added 8bit merge with the previously added 4bit merge. Now, the outputs are much closer for 8bit than for 4bit, contrary to what we saw earlier and in accordance with expectations. So for this example, the 8bit merge looks really good. It looks so good, in fact, that the tests pass locally for me even with …
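Such a scatter plot can be produced along these lines (purely illustrative; probs_before and probs_after are assumed to be probability tensors collected as in the helper sketched earlier):

```python
import matplotlib.pyplot as plt

def scatter_probas(probs_before, probs_after):
    """Plot merged vs. unmerged probabilities; points hugging the diagonal
    mean the merged model closely reproduces the unmerged one."""
    x = probs_before.flatten().float().cpu().numpy()
    y = probs_after.flatten().float().cpu().numpy()
    plt.scatter(x, y, s=1, alpha=0.3)
    plt.plot([0, 1], [0, 1], linewidth=1)  # identity line for reference
    plt.xlabel("probabilities without merge")
    plt.ylabel("probabilities with merge")
    plt.show()
```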
Thanks for your work. Do @younesbelkada and @pacman100 have any comments? I would like to hear your opinions.
Hi @HuggingFaceDocBuilderDev @pacman100 @younesbelkada. I hope I can get your opinion about this PR. If nothing needs to be changed, could we merge this PR? Thanks!
@jiqing-feng Sorry for the delay. There is some further feedback we're waiting for regarding your PR; this should hopefully arrive by the end of this week or the start of next week.
@jiqing-feng Sorry for the delay. We think the changes are good and can be merged. Could you please fix the merge conflict? Thanks a lot for your patience.
Done.
Thanks a lot for updating the PR. I gave this a final review and found two small issues (sorry for not noticing them earlier). Could you please take a look?
src/peft/tuners/lora/bnb.py (Outdated)

if self.state.SCB is None:
    self.state.SCB = self.weight.SCB
# Dequantize the result of identify matrix and int8 weight because bitsandbytes only have this method.
Suggested change:
- # Dequantize the result of identify matrix and int8 weight because bitsandbytes only have this method.
+ # Dequantize the result of identity matrix and int8 weight because bitsandbytes does not support int8
+ # dequantization directly
self.state.reset_grads()
self.merged = True

def unmerge(self):
Could you please extend the test, or add a separate one, to also test unmerging?
I think test_8bit_merge_and_disable_lora in test_common_gpu.py is for testing unmerge, since model.disable_adapter() will call unmerge() in the forward. Do I misunderstand it?
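For reference, the pattern under discussion looks roughly like this (a sketch with placeholder names for the model, inputs, and reference logits, not the actual test):

```python
import torch

def check_disable_adapter(peft_model, inputs, base_logits, atol=1e-2):
    """disable_adapter() is a context manager; inside it the merged LoRA weights
    are unmerged / bypassed, so the output should match the base model again."""
    with torch.no_grad(), peft_model.disable_adapter():
        disabled_logits = peft_model(**inputs).logits
    assert torch.allclose(disabled_logits, base_logits, atol=atol)
```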
Yes, you are right, sorry I missed that.
np. I have fixed the annotation and removed "default" in merge_and_unload() in the tests. Would you please help me review these? Thx!
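For context, the call in question then simply becomes something like this (a usage sketch with a placeholder model name, not the exact test code):

```python
# merge the LoRA weights into the 8-bit base model and drop the PEFT wrappers;
# per the comment above, the explicit "default" argument was removed from this call
merged_model = peft_model.merge_and_unload()
```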
Thank you @jiqing-feng for all the work on this, LGTM! 🚀
Fantastic addition, big thanks.
Hi @younesbelkada @pacman100 @BenjaminBossan @TimDettmers.
Related to #851. I found a way to merge the LoRA module into the 8bit model.
bitsandbytes can only dequantize the result of an int8 matmul. Therefore, I multiply the 8-bit weight by an identity matrix, so the result is equal to the original weight after dequantization.
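To make the idea concrete, here is a minimal sketch (this is not the PR's actual code or the attached test script; the helper names are made up, and the LoRA delta is assumed to be supplied by the caller as lora_B @ lora_A * scaling):

```python
import torch
import bitsandbytes as bnb

def dequantize_8bit_weight(layer: bnb.nn.Linear8bitLt) -> torch.Tensor:
    """Recover an fp16 copy of the int8 weight by multiplying it with an identity
    matrix: feeding the identity through the 8-bit linear computes I @ W^T (+ bias),
    and bitsandbytes dequantizes the matmul result internally, so stripping the
    bias and transposing gives back (approximately) the original fp16 weight."""
    in_features = layer.weight.shape[1]
    eye = torch.eye(in_features, dtype=torch.float16, device=layer.weight.device)
    with torch.no_grad():
        out = layer(eye)  # I @ W^T (+ bias); needs a CUDA device for the int8 matmul
        if layer.bias is not None:
            out = out - layer.bias
    return out.t()

def merge_lora_into_8bit(layer: bnb.nn.Linear8bitLt, lora_delta: torch.Tensor) -> None:
    """Add the LoRA delta in fp16, then re-quantize the merged weight to int8."""
    device = layer.weight.device
    w = dequantize_8bit_weight(layer) + lora_delta.to(torch.float16).to(device)
    layer.weight = bnb.nn.Int8Params(
        w.to("cpu"), requires_grad=False, has_fp16_weights=False
    ).to(device)
    # a real implementation also needs to reset the layer's quantization state
    # (SCB etc.), as the diff above does with self.state.reset_grads()
```

The actual change works directly on the bitsandbytes quantization state inside the LoRA layer (see the SCB and reset_grads() lines quoted in the review above); the sketch only illustrates the dequantize, add, and re-quantize idea.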
The test script is also attached.
The result should be:
We can see that the difference between the two logits is really small.
Would you please help me review it? Thx!