Add Support for Mistral Model in Llama-Adapter Method #1433
Conversation
Thanks for adding support for Mistral. This looks very promising. I only have a couple of comments, please check them out. Also, would it be possible to add a unit test to tests/test_adaption_prompt.py involving a small Mistral model?
Hello @PrakharSaxena24, thank you for the PR, but the logic seems incorrect, as mentioned in the comments. Please look at the Mistral modeling file in transformers for the correct way to handle GQA (grouped query attention).
 adapter_k = (
-    key.view(1, self.adapter_len, self.model.num_heads, self.model.head_dim)
+    key.view(1, self.adapter_len, self.model.num_heads, (self.model.head_dim // factor))
The head dim shouldn't change but the number of heads should be reduced in GQA.
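For illustration, a minimal shape check of this point, using toy numbers rather than the PR's actual code:

```python
import torch

# Toy sizes for illustration only (not the values used in the PR).
adapter_len, num_heads, num_kv_heads, head_dim = 10, 32, 8, 128

# In GQA, k_proj/v_proj output num_kv_heads * head_dim features, so the
# reshape should reduce the head count, not shrink head_dim.
key = torch.randn(1, adapter_len, num_kv_heads * head_dim)
adapter_k = key.view(1, adapter_len, num_kv_heads, head_dim)
print(adapter_k.shape)  # torch.Size([1, 10, 8, 128])
```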
I see! Thanks a lot, this seems correct.
Will edit this.
Also, I think I will need to do the same in utils.py.
@@ -100,6 +104,15 @@ def forward(self, **kwargs):
        query_states = compute_query_states(model=self.model, **kwargs)

        previous_dtype = query_states.dtype

        # Reshape and average the extra tensors
No need to reshape and average the query states. The key shape above is (bsz, adapter_seq_len, num_kv_heads, head_dim), the value shape is (bsz, adapter_seq_len, num_kv_heads, head_dim), and the query shape is (bsz, adapter_seq_len, num_heads, head_dim). You would then need to repeat the num_kv_heads to match num_heads, as done in https://github.com/huggingface/transformers/blob/1c31b7aa3bb4e7ef24c77596d2a76f45a770159f/src/transformers/models/mistral/modeling_mistral.py#L193. After that, the attention computation is the same as in the normal MHA case.
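A sketch of the repeat step being referred to, mirroring the expand-based repeat_kv helper in the transformers Mistral modeling file; the shapes and n_rep below are illustrative, not taken from the PR:

```python
import torch

bsz, num_kv_heads, adapter_seq_len, head_dim = 2, 8, 10, 128
n_rep = 4  # num_heads // num_kv_heads

adapter_k = torch.randn(bsz, num_kv_heads, adapter_seq_len, head_dim)
# (bsz, num_kv_heads, seq, head_dim) -> (bsz, num_kv_heads * n_rep, seq, head_dim)
adapter_k = adapter_k[:, :, None, :, :].expand(bsz, num_kv_heads, n_rep, adapter_seq_len, head_dim)
adapter_k = adapter_k.reshape(bsz, num_kv_heads * n_rep, adapter_seq_len, head_dim)
print(adapter_k.shape)  # torch.Size([2, 32, 10, 128])
```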
Thanks a lot, so rather than repeating the adapter output, I should repeat adapter_k and adapter_v:

adapter_k = torch.repeat_interleave(adapter_k, repeats=factor, dim=1)
adapter_v = torch.repeat_interleave(adapter_v, repeats=factor, dim=1)

since the key/value shape is (bsz, num_kv_heads, adapter_seq_len, head_dim), with dim 1 being num_kv_heads. Does this make sense?
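As a quick sanity check, repeating along dim 1 does give the expected number of heads (illustrative sizes, not the PR's actual values):

```python
import torch

bsz, num_kv_heads, adapter_seq_len, head_dim, factor = 2, 8, 10, 128, 4
adapter_k = torch.randn(bsz, num_kv_heads, adapter_seq_len, head_dim)
adapter_k = torch.repeat_interleave(adapter_k, repeats=factor, dim=1)
print(adapter_k.shape)  # torch.Size([2, 32, 10, 128]) -> num_kv_heads * factor heads
```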
@BenjaminBossan @pacman100 thank you for your kind comments.
Thanks a lot. From my side, there are only a few comments left, please take a look.
tests/test_adaption_prompt.py
Outdated
@@ -78,7 +106,19 @@ def test_attributes(self) -> None:
        self.assertTrue(hasattr(model, "from_pretrained"))
        self.assertTrue(hasattr(model, "push_to_hub"))

        #Test Mistral
        if self.mistral_available:
Instead of attaching the Mistral tests to the Llama tests, could you please create a separate test for each? You can decorate them with @unittest.skipIf(not is_mistral_available()) to avoid the if self.mistral_available: line.
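A minimal sketch of the suggested layout; the is_mistral_available helper and test name here are stand-ins, not necessarily the code used in tests/test_adaption_prompt.py:

```python
import importlib.util
import unittest


def is_mistral_available() -> bool:
    # Stand-in availability check: returns True if transformers ships a mistral submodule.
    try:
        return importlib.util.find_spec("transformers.models.mistral") is not None
    except ModuleNotFoundError:
        return False


class TestMistralAdaptionPrompt(unittest.TestCase):
    @unittest.skipIf(not is_mistral_available(), "Mistral not available in this transformers version")
    def test_attributes(self) -> None:
        # Mistral-specific assertions would go here, mirroring the Llama tests.
        ...
```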
Thank you for the advice! I will do it. Feedback like this is very helpful for learning and applying best practices :)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks a lot for this PR, it looks good now. I also appreciate extending the existing tests.
Let's wait for @pacman100's final review before merging.
@pacman100 please let me know if there is something to change in the PR.
@PrakharSaxena24 Heads up, there is now a merge conflict, which stems from a recent PR where we switched from unittest-style asserts to pytest-style asserts.
@BenjaminBossan Changed the asserts to pytest style. Moreover, there was a conflict in utils.py, which was also resolved. Please have a look! Edited: Seems like something is breaking in transformers (CI), will have a look at it tomorrow; however, if you have any idea why, that would be very helpful!
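For context, an illustration of the unittest-to-pytest assert style mentioned above; the test functions and values are made up, not lines from the PR:

```python
import pytest


def test_truthy_value():
    value = [1, 2, 3]
    # unittest style would be: self.assertTrue(len(value) == 3)
    assert len(value) == 3


def test_raises():
    # unittest style would be: with self.assertRaises(ValueError): ...
    with pytest.raises(ValueError):
        int("not a number")
```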
Thanks a lot, the PR LGTM. Let's wait for a final review by @pacman100 before merging.
@BenjaminBossan
Yes, there will be conflicts, so whoever comes last will have to resolve them :) I don't think it's a huge issue. Since both PRs have tests, we should hopefully have the guard rails to ensure that resolving the merge conflict won't lead to a regression in the other PR.
Thank you for the reply!
Thank you @PrakharSaxena24 for supporting Mistral with Adaptation Prompt and the detailed tests! ✨
@BenjaminBossan @pacman100
* Support Mistral For llama-adapter
* Update src/peft/tuners/adaption_prompt/layer.py
  Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
* Update src/peft/tuners/adaption_prompt/layer.py
  Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
* corrected logic and added test
* removed commented out code
* Added seperate test functions for mistral
* missed self.assert
* ruff formatting
---------
Co-authored-by: Prakhar Saxena <prakharsxena11111@gmail.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Hello PEFT team,
Purpose of This PR:
Add support for Mistral model for llama-adapter method.
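Once this lands, usage would presumably mirror the existing llama-adapter flow; here is a hedged sketch where the checkpoint name and hyperparameters are illustrative, not prescribed by the PR:

```python
from transformers import AutoModelForCausalLM
from peft import AdaptionPromptConfig, get_peft_model

# Illustrative checkpoint and hyperparameters; adjust to your setup.
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
peft_config = AdaptionPromptConfig(adapter_len=10, adapter_layers=30, task_type="CAUSAL_LM")
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()
```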
Background:
I wanted to test how the method in this paper works with Mistral-based models compared to LoRA. Initially I thought that, since the architectures of Llama and Mistral are almost the same, this could be achieved by just changing the config; however, I found that the Mistral model's k_proj and v_proj dimensions are different from those of Llama.
Hence I added support for Mistral in the llama-adapter method (the naming is confusing).
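To make the dimension mismatch concrete, a rough comparison based on the published 7B configs (treat the numbers as illustrative and double-check against the actual config files):

```python
# Llama-2-7B:  32 attention heads, 32 KV heads -> k_proj/v_proj output 32 * 128 = 4096
# Mistral-7B:  32 attention heads,  8 KV heads -> k_proj/v_proj output  8 * 128 = 1024
head_dim = 128
llama_kv_out = 32 * head_dim    # 4096
mistral_kv_out = 8 * head_dim   # 1024
print(llama_kv_out, mistral_kv_out)
```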
I hope it will be useful for anyone else willing to experiment with different methods.
Request for Review:
Please review and let me know if my implementation makes sense.
Thank you for all your hard work!