added unsqueeze_dim to apply_rotary_pos_emb #27117
Conversation
Thank you for opening the PR @ShashankMosaicML! And nice touch, the docstring :)

CI is failing because of the `# Copied from ...` statement, which is used by our internal tools and CI to maintain our one-file-per-model policy while enabling the propagation of bugfixes and upgrades.

In this particular case, there is also a bug on our end -- recently we've made Llama the base implementation, from which the others are copied. As such, you'll have to:
- Remove the `# Copied from ...` statement in L194
- Run `make fix-copies` in your terminal
- Push the changes

You may also need to run `make fixup` afterwards, which applies automatic code formatting.
Thank you @gante for reviewing and providing the steps to fix the errors! However, I still see some errors after following the steps, and on clicking the details for the error, I see the following. Could you please let me know how to fix this error? Thank you!
Thanks for working on this!
The change to the function LGTM - there are just some additional changes to docstrings in this diff which need to be removed before merging.
Accepting the proposed changes in formatting. Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Hi @amyeroberts, thanks for suggesting the changes. I have incorporated those, but some quality checks are still failing. Could you take a look?
@ShashankMosaicML running …
@gante, I think that worked! (I wasn't running …)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for iterating!
* added unsqueeze_dim to apply_rotary_pos_emb
* Added docstring
* Modified docstring
* Modified docstring
* Modified docstring
* Modified docstring
* Modified docstring
* ran make fix-copies and make fixup
* Update src/transformers/models/llama/modeling_llama.py

  Accepting the proposed changes in formatting.

  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* incorporating PR suggestions
* incorporating PR suggestions
* incorporating PR suggestions
* incorporating PR suggestions
* ..

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Making the unsqueeze dimension parameterized in the apply_rotary_pos_emb function in modeling_llama.py
This PR introduces a new parameter, `unsqueeze_dim`, to the `apply_rotary_pos_emb` function. It lets callers specify the dimension along which the cosine and sine rotary tensors are unsqueezed so that they broadcast to the query and key tensors. This makes the function compatible with codebases whose query and key tensors have different shapes, without any back-and-forth transposing.
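The broadcasting idea behind the change can be sketched as follows. This is a minimal NumPy stand-in, not the actual transformers implementation (which operates on PyTorch tensors and has additional arguments such as `position_ids`); shapes and names here are illustrative:

```python
import numpy as np

def rotate_half(x):
    # Split the last dimension in half and swap the halves with a sign flip.
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate((-x2, x1), axis=-1)

def apply_rotary_pos_emb(q, k, cos, sin, unsqueeze_dim=1):
    # cos/sin have shape [batch, seq_len, head_dim]; inserting a size-1 axis
    # at `unsqueeze_dim` makes them broadcastable to q and k regardless of
    # whether the heads dimension sits at axis 1 or axis 2.
    cos = np.expand_dims(cos, unsqueeze_dim)
    sin = np.expand_dims(sin, unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

# Layout A: heads at axis 1, i.e. [batch, heads, seq, head_dim]
q = np.random.randn(2, 4, 8, 16)
k = np.random.randn(2, 4, 8, 16)
cos = np.random.randn(2, 8, 16)
sin = np.random.randn(2, 8, 16)
q1, k1 = apply_rotary_pos_emb(q, k, cos, sin, unsqueeze_dim=1)

# Layout B: heads at axis 2, i.e. [batch, seq, heads, head_dim]
q2, k2 = apply_rotary_pos_emb(
    q.transpose(0, 2, 1, 3), k.transpose(0, 2, 1, 3), cos, sin, unsqueeze_dim=2
)

# Both layouts produce the same embeddings, up to a transpose.
assert np.allclose(q1, q2.transpose(0, 2, 1, 3))
```

The rotation acts only on the last (head) dimension, so parameterizing the unsqueeze axis is all that is needed to support both tensor layouts.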
Fixes #26948
Before submitting
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case. Link to the issue
- Did you make sure to update the documentation with your changes? See the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@gante , @ArthurZucker