
Add position ids in forward pass to opt model #33121

Merged

Conversation

@avishaiElmakies (Contributor) commented Aug 26, 2024

What does this PR do?

This pull request adds position_ids to the forward pass of OPT, in a similar fashion to Gemma and Llama. #32937

Some models do not have a position_ids argument in their forward pass.

There are two main reasons we would like all LM models to accept position_ids:

  1. To keep the API consistent across models.
  2. position_ids are very important if you want to use flash-attention without padding during training. If I want to pack two or more sentences into the same sequence, the model needs to handle them accordingly and treat each sentence as its own separate sentence. The flash-attention code uses position_ids to check whether sequences are packed and, if so, runs an appropriate function that prevents cross-example contamination; without this argument, the model can't use that feature. The code always checks whether position_ids is not None (see the short sketch below the link):

https://github.com/huggingface/transformers/blob/v4.44.1/src/transformers/modeling_flash_attention_utils.py#L270
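For illustration, here is a minimal sketch, using my own toy example rather than the library's code, of what packed position_ids look like and how per-sentence boundaries can be recovered from them:

```python
import torch

# Two sentences of lengths 3 and 4 packed into a single row of the batch.
# position_ids restart from 0 at every sentence boundary:
position_ids = torch.tensor([[0, 1, 2, 0, 1, 2, 3]])

# A packing-aware attention path can recover the boundaries by looking for
# positions that reset to 0, and then attend within each segment only.
starts = (position_ids[0] == 0).nonzero().flatten()                   # tensor([0, 3])
lengths = torch.diff(starts, append=torch.tensor([position_ids.shape[1]]))
print(lengths)  # tensor([3, 4]) -> one length per packed sentence
```

The flash-attention utility linked above does something along these lines internally; without position_ids, the model has no way to tell the packed sentences apart.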

This PR handles only OPT, so I can start small and get some feedback.

Changes:

  • Changed the OPTLearnedPositionalEmbedding forward method to take position_ids instead of attention_mask.
  • Changed all forward passes in the file to take position_ids and pass them along.
  • Updated prepare_inputs_for_generation to provide position_ids.
  • If position_ids is None, create it from the attention mask (similar to the original version, so behavior is unchanged when position_ids is not given); see the sketch after this list.
  • Updated the relevant docs.
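A minimal sketch of that fallback, assuming the same cumulative-sum trick the old OPT embedding used internally (the helper name make_position_ids is mine, not the PR's):

```python
import torch

def make_position_ids(attention_mask: torch.LongTensor, past_key_values_length: int = 0):
    # Running count of non-padding tokens; padding positions end up at -1.
    position_ids = torch.cumsum(attention_mask, dim=1) * attention_mask - 1
    # With a cache, keep only the positions of the newly fed tokens.
    return position_ids[:, past_key_values_length:]

mask = torch.tensor([[0, 0, 1, 1, 1]])   # left-padded example
print(make_position_ids(mask))           # tensor([[-1, -1,  0,  1,  2]])
```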

A few notes:

  • Because of the offset used in OPTLearnedPositionalEmbedding, this model needs to represent the position of its padding tokens as -1. I think this is fine, since it keeps compatibility with the existing weights (see the sketch below these notes).
  • BioGPT copies OPT's positional embedding class, so I had to remove the "Copied from" comment to disable that check (since it no longer copies); this is needed for make fixup.
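To make the -1 convention concrete, here is a rough sketch, under my own class name, of how the offset interacts with padding positions (the real OPT class reserves the first two rows of the embedding table, which is why the checkpoint weights stay compatible):

```python
import torch
from torch import nn

class LearnedPositionalEmbeddingSketch(nn.Embedding):
    def __init__(self, num_embeddings: int, embedding_dim: int):
        # OPT checkpoints reserve the first `offset` rows of the table.
        self.offset = 2
        super().__init__(num_embeddings + self.offset, embedding_dim)

    def forward(self, position_ids: torch.LongTensor):
        # Padding positions are encoded as -1, so after the shift they land on
        # reserved row 1 instead of colliding with a real position.
        return super().forward(position_ids + self.offset)

emb = LearnedPositionalEmbeddingSketch(num_embeddings=8, embedding_dim=4)
out = emb(torch.tensor([[-1, -1, 0, 1, 2]]))
print(out.shape)  # torch.Size([1, 5, 4])
```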

feature-request #32937

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

@ArthurZucker I would love some feedback.

@ArthurZucker (Collaborator) left a comment

Hey! In general, the trick is that a new argument needs to be added at the end of the forward signature; otherwise you break the model for people who call the model directly with positional arguments.
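To illustrate the concern with hypothetical signatures (not the actual OPT ones): inserting the new argument in the middle shifts every later positional argument, while appending it keeps old positional call sites working.

```python
# Hypothetical signatures, only to illustrate the argument-ordering concern.
def forward_old(input_ids, attention_mask=None, head_mask=None): ...

# Inserting position_ids in the middle breaks positional callers:
# forward_inserted(ids, mask, head_mask) would bind mask to position_ids.
def forward_inserted(input_ids, position_ids=None, attention_mask=None, head_mask=None): ...

# Appending it at the end keeps existing positional calls valid:
def forward_appended(input_ids, attention_mask=None, head_mask=None, position_ids=None): ...
```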

src/transformers/models/opt/modeling_opt.py (outdated review thread)
@@ -46,7 +46,6 @@
_CONFIG_FOR_DOC = "BioGptConfig"


# Copied from transformers.models.opt.modeling_opt.OPTLearnedPositionalEmbedding with OPT->BioGpt
ArthurZucker (Collaborator):

Let's just add a TODO here, or also update that model.

avishaiElmakies (Contributor, Author):

Should I put the comment back as well? It causes a failure when running make fixup.
I looked into updating BioGPT, but it seems to contain a lot of code copied from different models, so I didn't know whether I should touch it. I can work on it next.

ArthurZucker (Collaborator):

Yes!

@@ -71,17 +79,10 @@ def __init__(self, num_embeddings: int, embedding_dim: int):
self.offset = 2
super().__init__(num_embeddings + self.offset, embedding_dim)

- def forward(self, attention_mask: torch.LongTensor, past_key_values_length: int = 0):
+ def forward(self, position_ids: torch.LongTensor):
ArthurZucker (Collaborator):

That is kind of a breaking change for this module 😓

avishaiElmakies (Contributor, Author):

Why is this a problem? The weights are the same, so loading should work, and this module shouldn't be used by outside code, so it shouldn't break anything.

ArthurZucker (Collaborator):

Yeah, but it has caused issues in the past 😉

avishaiElmakies (Contributor, Author):

OK, how do you think I should do it? The module needs position_ids to work with packed sentences. Should I add position_ids as the last argument, with None as the default?

src/transformers/models/opt/modeling_opt.py (outdated review thread)
@avishaiElmakies (Contributor, Author) commented Aug 27, 2024

I tried to keep the API as similar as possible to Llama and Gemma; they both put position_ids third. Isn't it kind of a problem for AutoModel to have the argument in different places across models? It would mean you can't load a model with AutoModel and call it without keyword arguments.

Thanks for the feedback!

@ArthurZucker (Collaborator) left a comment

cc @gante WDYT about this? In general, IMO we should just run the basic position_ids init. Though taking padding into account should be alright; it's already done in generate, and this would help for forward and training.

We just need to be careful, as we also want to support packing.

src/transformers/models/opt/modeling_opt.py (outdated review thread)
@@ -71,17 +79,10 @@ def __init__(self, num_embeddings: int, embedding_dim: int):
self.offset = 2
super().__init__(num_embeddings + self.offset, embedding_dim)

- def forward(self, attention_mask: torch.LongTensor, past_key_values_length: int = 0):
+ def forward(self, position_ids: torch.LongTensor):
ArthurZucker (Collaborator):

Yeah, but it has caused issues in the past 😉

src/transformers/models/opt/modeling_opt.py (outdated review thread)
src/transformers/models/opt/modeling_opt.py (outdated review thread)
@avishaiElmakies (Contributor, Author):

About the forward call for the embedding layer: I think it has to take position_ids as an argument, otherwise it won't work with packed sentences.

@avishaiElmakies (Contributor, Author):

@ArthurZucker I'm thinking that the best solution for the embedding layer may be to add position_ids as an argument to its forward pass with a default of None. That is probably backward compatible but would still help with packed sentences; the downside is that the code probably won't be very clean.
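For concreteness, a rough sketch of the kind of backward-compatible signature being discussed, written with my own class name and under the assumption that the old attention-mask path stays the default (not necessarily the code that was eventually merged):

```python
import torch
from torch import nn

class OPTLearnedPositionalEmbeddingSketch(nn.Embedding):
    def __init__(self, num_embeddings: int, embedding_dim: int):
        self.offset = 2
        super().__init__(num_embeddings + self.offset, embedding_dim)

    def forward(
        self,
        attention_mask: torch.LongTensor,
        past_key_values_length: int = 0,
        position_ids: torch.LongTensor = None,
    ):
        # Old behaviour: derive positions from the attention mask, as before.
        if position_ids is None:
            position_ids = torch.cumsum(attention_mask, dim=1) * attention_mask - 1
            position_ids = position_ids[:, past_key_values_length:]
        # New behaviour: caller-supplied position_ids (e.g. for packed sequences)
        # are used directly, so existing call sites keep working unchanged.
        return super().forward(position_ids + self.offset)
```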

@gante (Member) left a comment

@ArthurZucker I'm pro position_ids as it standardizes OPT wrt other models 🙌

@avishaiElmakies Thank you for adding the fix 🤗 Have a look at the unresolved comments (you'd be surprised how easy it is to break code for other external libraries; Hyrum's law definitely applies to transformers).

@avishaiElmakies (Contributor, Author):
@gante, thanks! Happy to contribute.

I would love some guidance on the last two open comments. What should I do with position_ids in the embedding module? In my opinion, it should be able to take position_ids so it can work with packed sentences; maybe as a last argument with a default of None, plus a check?

And I would love some guidance on the one-liners as well.

@avishaiElmakies (Contributor, Author):

@ArthurZucker I would love some guidance here so I can finish and move on to other models.

@avishaiElmakies mentioned this pull request on Sep 22, 2024
@avishaiElmakies (Contributor, Author):

@ArthurZucker I made the changes you suggested and refactored the embedding class to be backward compatible. I would love some feedback.

@ArthurZucker (Collaborator) left a comment

Feel free to merge, @gante, if it's alright with you! 🤗 And thanks for your contribution!

@@ -46,7 +46,6 @@
_CONFIG_FOR_DOC = "BioGptConfig"


# Copied from transformers.models.opt.modeling_opt.OPTLearnedPositionalEmbedding with OPT->BioGpt
ArthurZucker (Collaborator):

Yes!

src/transformers/models/opt/modeling_opt.py (review thread)
src/transformers/models/opt/modeling_opt.py (outdated review thread)
@ArthurZucker merged commit 4953ddf into huggingface:main on Oct 7, 2024
16 of 18 checks passed
@avishaiElmakies deleted the add_position_ids_to_opt branch on October 7, 2024 at 07:40
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Oct 21, 2024
* start working on adding position ids

* add docs

* Refactor modeling_biogpt.py and modeling_opt.py for code consistency

* fix 2 PR comments

* move position_ids to end of args

* remove trailing white space

* add comment with TODO

* bug fix gradient checkpointing

* fixup

* missed on position_ids

* remove _attention_to_position_ids and refactor embedding class

* remove redundent code

---------

Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024

BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024