Add position ids in forward pass to opt model #33121
Conversation
Hey! In general, the trick is that a new argument needs to be added at the end of the forward signature, otherwise you are breaking the model for people who call the model directly.
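(To illustrate the concern with a minimal, hypothetical sketch: the function names below are made up and are not the actual OPT signature.)

```python
# Toy example: callers that pass arguments positionally break if a new
# parameter is inserted anywhere but at the end of the signature.

def old_forward(input_ids, attention_mask=None, head_mask=None):
    ...

# Existing positional call: old_forward(ids, mask)

# Breaking: inserting position_ids in the middle means `mask` is now
# silently interpreted as position_ids.
def broken_forward(input_ids, position_ids=None, attention_mask=None, head_mask=None):
    ...

# Safe: appending position_ids with a default keeps old positional calls working.
def safe_forward(input_ids, attention_mask=None, head_mask=None, position_ids=None):
    ...
```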
```diff
@@ -46,7 +46,6 @@
 _CONFIG_FOR_DOC = "BioGptConfig"

 # Copied from transformers.models.opt.modeling_opt.OPTLearnedPositionalEmbedding with OPT->BioGpt
```
Let's just add a TODO here, or also update that model.
Should I put the comment back as well? It causes a failure when using `make fixup`.
I looked into updating BioGPT, but it seems to involve a lot of code copied from different models, so I didn't know if I should touch it. I can work on it next.
Yes!
```diff
@@ -71,17 +79,10 @@ def __init__(self, num_embeddings: int, embedding_dim: int):
         self.offset = 2
         super().__init__(num_embeddings + self.offset, embedding_dim)

-    def forward(self, attention_mask: torch.LongTensor, past_key_values_length: int = 0):
+    def forward(self, position_ids: torch.LongTensor):
```
that is kind of a breaking change for this module 😓
Why is this a problem? The weights are the same, so loading should work. And this module should not be used by outside code, so it is not supposed to break anything.
Yeah, but it has caused issues in the past 😉
OK, how do you think I should do it? The module needs to receive position_ids to work with packed sentences. Should I add position_ids as the last argument, with None as the default?
I tried to keep the API as similar as possible. Thanks for the feedback!
cc @gante WDYT about this? In general, IMO we should just do the basic position_ids init. Though taking padding into account should be "alright"; it's already done in generate, and this would help for forward and training.
We just need to be careful, as we also want to support packing.
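(A small, made-up illustration of the padding vs. packing distinction being discussed here; the tensors below are not taken from the PR.)

```python
import torch

# Packing: two sentences of lengths 3 and 2 concatenated into one row.
# The attention mask is all ones, so it cannot express where the second
# sentence starts; the caller must reset the positions explicitly.
packed_input_ids = torch.tensor([[11, 12, 13, 21, 22]])  # made-up token ids
packed_attention_mask = torch.ones_like(packed_input_ids)
packed_position_ids = torch.tensor([[0, 1, 2, 0, 1]])

# Padding (no packing): positions can still be derived from the mask alone,
# e.g. a left-padded row [0, 0, 1, 1, 1] maps to positions [0, 0, 0, 1, 2],
# which is why a basic position_ids init covers this case.
padded_attention_mask = torch.tensor([[0, 0, 1, 1, 1]])
```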
> Yeah, but it has caused issues in the past 😉

About the forward call for the embedding layer: I think it has to take position_ids as an argument, otherwise it will not work with packed sentences.
@ArthurZucker I am thinking that maybe the best solution for the embedding layer is to add position_ids as an argument to the forward pass with a default of None. This is probably backward compatible, but will still help with packed sentences. The problem is that the code will probably not be very nice.
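(A rough sketch of what such a backward-compatible signature could look like; the class name and details are illustrative, not the exact code that ended up in the PR.)

```python
from typing import Optional

import torch
from torch import nn


class LearnedPositionalEmbeddingSketch(nn.Embedding):
    """OPT-style learned positional embedding with position_ids appended as an
    optional last argument, so existing (attention_mask, past_key_values_length)
    callers keep working."""

    def __init__(self, num_embeddings: int, embedding_dim: int):
        # OPT reserves two extra positions, hence the offset of 2.
        self.offset = 2
        super().__init__(num_embeddings + self.offset, embedding_dim)

    def forward(
        self,
        attention_mask: torch.LongTensor,
        past_key_values_length: int = 0,
        position_ids: Optional[torch.LongTensor] = None,
    ):
        if position_ids is None:
            # Old behaviour: derive positions from the attention mask.
            position_ids = torch.cumsum(attention_mask, dim=1).type_as(attention_mask) * attention_mask
            position_ids = (position_ids.long() - 1)[:, past_key_values_length:]
        return super().forward(position_ids + self.offset)
```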
@ArthurZucker I'm pro position_ids, as it standardizes OPT wrt other models 🙌

@avishaiElmakies Thank you for adding the fix 🤗 Have a look at the unresolved comments (you'd be surprised how easy it is to break code for other external libraries; Hyrum's law definitely applies to transformers).
@gante, thanks! Happy to contribute. I would love some guidance on the last two comments: what should I do with the position_ids in the embedding module? In my opinion it should be able to take position_ids so it works with packed sentences. Maybe a last argument with a default of None and a check? I would also love some guidance on the one-liners.
@ArthurZucker I would love some guidance here so I can finish and move on to other models.
@ArthurZucker I made the changes you suggested and refactored the embedding class to be backward compatible. I would love some feedback.
Feel free to merge @gante if it's alright with you! 🤗 and thanks for your contribution!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* start working on adding position ids
* add docs
* Refactor modeling_biogpt.py and modeling_opt.py for code consistency
* fix 2 PR comments
* move position_ids to end of args
* remove trailing white space
* add comment with TODO
* bug fix gradient checkpointing
* fixup
* missed on position_ids
* remove _attention_to_position_ids and refactor embedding class
* remove redundent code

---------

Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
What does this PR do?
This pull request adds position_ids to the forward of OPT, in a similar fashion to gemma and llama (#32937). Some models didn't have an argument for position_ids in their forward pass.
There are two main reasons we would like all LM models to accept position_ids, for example:
https://github.com/huggingface/transformers/blob/v4.44.1/src/transformers/modeling_flash_attention_utils.py#L270
This PR handles only OPT, so I can start small and get some feedback.
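(Illustrative usage of what the change enables; the checkpoint name and tensors below are just for the example and assume the standard transformers API.)

```python
import torch
from transformers import AutoTokenizer, OPTForCausalLM

# Any OPT checkpoint works here; facebook/opt-125m is just a small example.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = OPTForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Hello world", return_tensors="pt")

# Before this change, OPT always derived positions from the attention mask
# internally. With it, position_ids can be passed explicitly, which is what
# packed sequences need.
position_ids = torch.arange(inputs["input_ids"].shape[1]).unsqueeze(0)
outputs = model(**inputs, position_ids=position_ids)
```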
changes:
* add a position_ids argument to the forward pass, defaulting to None
* create position_ids based on the attention mask when none are given (similar to the original version, so it should work the same if position_ids are not passed); a rough sketch of this derivation is included below, after the notes

a few notes:
* `make fixup`

Fixes: feature-request #32937
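(A rough, self-contained sketch of that default derivation, assuming a 0/1 attention mask; it is not a copy of the merged code.)

```python
import torch


def default_position_ids(attention_mask: torch.Tensor, past_key_values_length: int = 0) -> torch.LongTensor:
    # Cumulative sum over the mask gives 1-based positions for real tokens;
    # multiplying by the mask zeroes out padding, subtracting 1 makes real
    # positions 0-based (padding becomes -1, clamped back to 0).
    position_ids = torch.cumsum(attention_mask, dim=1).long() * attention_mask.long() - 1
    position_ids = position_ids.clamp(min=0)
    # With a cache, only the positions of the newly fed tokens are needed.
    return position_ids[:, past_key_values_length:]


# Left-padded batch of two sequences:
mask = torch.tensor([[0, 1, 1, 1],
                     [1, 1, 1, 1]])
print(default_position_ids(mask))
# tensor([[0, 0, 1, 2],
#         [0, 1, 2, 3]])
```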
Before submitting
* Did you read the contributor guideline, Pull Request section?
* Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
* Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
@ArthurZucker would love some feedback.