improve(llama): Faster apply_rotary_pos_emb #22785

fpgaminer · 2023-04-15T18:25:39Z

What does this PR do?

Faster implementation for apply_rotary_pos_emb in modeling_llama.py.

Please see issue #22683 for code that verifies the correctness of the change.

NOTE: Not marking as fixing the above issue, as speed is still not as good as before.
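For readers unfamiliar with the function, here is a minimal pure-Python sketch of what a rotary position embedding computes. It is not the diff from this PR and not the code in modeling_llama.py (which operates on batched PyTorch tensors). The helper names `build_cos_sin` and `apply_rotary`, the base of 10000, and the list-of-lists layout are assumptions made for illustration.

```python
import math


def rotate_half(x):
    """Return [-x2, x1], where x1 and x2 are the first and second halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def build_cos_sin(max_len, dim, base=10000.0):
    """Precompute cos/sin tables of shape [max_len][dim] (hypothetical helper)."""
    inv_freq = [base ** (-(2 * i) / dim) for i in range(dim // 2)]
    cos, sin = [], []
    for pos in range(max_len):
        angles = [pos * f for f in inv_freq]
        angles = angles + angles  # both halves of the vector share the same angles
        cos.append([math.cos(a) for a in angles])
        sin.append([math.sin(a) for a in angles])
    return cos, sin


def apply_rotary(x, cos, sin, position_ids):
    """Rotate each vector in x ([seq_len][dim]) by the angle for its position id."""
    out = []
    for vec, pos in zip(x, position_ids):
        rot = rotate_half(vec)
        # Look up the cos/sin row for this position directly by index.
        out.append([v * c + r * s for v, r, c, s in zip(vec, rot, cos[pos], sin[pos])])
    return out


cos_table, sin_table = build_cos_sin(max_len=8, dim=4)
queries = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
rotated = apply_rotary(queries, cos_table, sin_table, position_ids=[0, 3])
```

In this sketch, position 0 leaves a vector unchanged and every other position applies a pure rotation, so vector norms are preserved. The step that selects the cos/sin rows for each position id is a plain table lookup.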

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@gante

HuggingFaceDocBuilderDev · 2023-04-15T18:42:01Z

The documentation is not available anymore as the PR was closed or merged.

gante

Thanks for improving the performance @fpgaminer 🙏

@amyeroberts for context, this comment shows that a) it produces exactly the same numerical output and b) it is faster than the previous version.
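The two claims above (identical output, lower runtime) are the kind of thing a small harness can check. The verification code referenced in the thread lives in issue #22683 and is not reproduced here. What follows is only a sketch with stand-in functions: `lookup_by_expansion`, `lookup_by_indexing`, and `compare` are hypothetical names, and the toy lookups operate on Python lists rather than on the tensors used in the model.

```python
import timeit


def lookup_by_expansion(table, position_ids):
    """Stand-in for a slower style: copy the table per batch row, then pick rows."""
    out = []
    for ids in position_ids:
        expanded = [row[:] for row in table]  # redundant per-batch copy
        out.append([expanded[i] for i in ids])
    return out


def lookup_by_indexing(table, position_ids):
    """Stand-in for a faster style: index the shared table directly."""
    return [[table[i] for i in ids] for ids in position_ids]


def compare(f_old, f_new, *args, repeats=20):
    """Return (outputs_equal, seconds_old, seconds_new) for the same inputs."""
    same = f_old(*args) == f_new(*args)
    t_old = timeit.timeit(lambda: f_old(*args), number=repeats)
    t_new = timeit.timeit(lambda: f_new(*args), number=repeats)
    return same, t_old, t_new


table = [[float(i), float(i) + 0.5] for i in range(256)]
position_ids = [list(range(256)) for _ in range(8)]
same, t_old, t_new = compare(lookup_by_expansion, lookup_by_indexing, table, position_ids)
print(f"identical output: {same}, old: {t_old:.4f}s, new: {t_new:.4f}s")
```

Timings depend on the machine and input sizes, so the sketch prints them rather than asserting a particular speedup. Checking exact equality (rather than closeness within a tolerance) matches the "exactly the same numerical output" claim.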

amyeroberts

🔥 🔥 🔥 - thanks for updating and for the time to validate and benchmark 🙏

neggert · 2023-07-07T15:25:37Z

Should a similar patch be applied to GPT-NeoX?

amyeroberts · 2023-07-07T15:35:57Z

@neggert I believe it can be added to GPT-NeoX too - very happy to review a PR if you'd like to add!

improve(llama): Faster apply_rotary_pos_emb

2af4e6d

gante approved these changes Apr 17, 2023

gante requested a review from amyeroberts April 17, 2023 13:39

amyeroberts approved these changes Apr 17, 2023

amyeroberts merged commit 626c1b8 into huggingface:main Apr 17, 2023

sam-h-bean mentioned this pull request Apr 17, 2023

New Crash Using Llama #22807

Closed

4 tasks

DyeKuu mentioned this pull request Apr 17, 2023

Fix squeeze into torch 1.x compatible form in llama model #22808

Merged

5 tasks

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023

improve(llama): Faster apply_rotary_pos_emb (huggingface#22785)

825162e

ArthurZucker mentioned this pull request Jul 3, 2023

Generate: support for left-padding on GPTNeoX and Llama #22382

Merged

ArthurZucker mentioned this pull request Aug 29, 2023

[GPTNeoX] Faster rotary embedding for GPTNeoX (based on llama changes) #25830

Merged

fxmarty mentioned this pull request Sep 16, 2023

Remove unnecessary unsqueeze - squeeze in rotary positional embedding #26162

Merged
