
Add FA2 & SDPA support for RoBERTa & XLM-RoBERTa #30450

Closed

Conversation

tomaarsen (Member)

Hello!

Pull Request overview

  • Add FA2 & SDPA support for RoBERTa
  • Add FA2 & SDPA support for XLM-RoBERTa

Details

The world of embedding models still very much relies on bert, roberta, xlm_roberta, mpnet, etc., but these model architectures have not yet received the benefits of FA2/SDPA. I'd like to make a start on that today.
I recognize that these models are tricky to change, as BERT especially is tangled in a big web of "Copied from" connections. That said, I suspect I've implemented FA2/SDPA in a way that could be extended to a lot of architectures, and I'd like to get reviews on the current implementation before I potentially expand to new ones.

Most of the code is based on the Llama 2 FA2/SDPA implementations, so it should be fairly familiar. I want to note some limitations (a usage sketch follows this list):

  • output_attentions does not work for FA2/SDPA - this is fairly standard.
  • head_mask does not work for FA2/SDPA.
  • position_embedding_type with anything other than "absolute" (i.e., the default) does not work for FA2/SDPA.
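
For illustration (not part of this PR's diff), here is a minimal sketch of how one would opt into these backends once this is merged, via the standard attn_implementation argument of from_pretrained; the checkpoint and dtype are just example choices:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "sentence-transformers/all-distilroberta-v1"  # example checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)

# "sdpa" routes attention through torch.nn.functional.scaled_dot_product_attention;
# "flash_attention_2" additionally requires the flash-attn package, a supported GPU,
# and fp16/bf16 weights.
model = AutoModel.from_pretrained(
    model_id,
    attn_implementation="sdpa",  # or "flash_attention_2"
    torch_dtype=torch.float16,
).to("cuda")

inputs = tokenizer("Attention backends in one line.", return_tensors="pt").to("cuda")
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state
```

The same argument should work for the other head classes (e.g. AutoModelForMaskedLM) once an architecture implements these backends.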

Additionally, I have yet to write tests, and I haven't tested all ways to use these models; so far I've only experimented with Sentence Transformers.
For a small RoBERTa-based model (https://huggingface.co/sentence-transformers/all-distilroberta-v1, 82M params), I get about a 10% speedup for a single sample and a ~25% speedup at large batch sizes with FA2 or SDPA. For a large XLM-RoBERTa-based model (https://huggingface.co/BAAI/bge-m3, 8192 sequence length), the speedup is up to 3x with FA2. Because newer embedding models use ever larger sequence lengths, FA2/SDPA will only become more important for them.
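
For reference, a minimal sketch of the kind of timing comparison behind these numbers; it assumes a Sentence Transformers version whose model_kwargs are forwarded to transformers' from_pretrained, and the exact figures will vary with hardware:

```python
import time

import torch
from sentence_transformers import SentenceTransformer

sentences = ["This is a test sentence."] * 4096  # large batch to surface the speedup

for attn in ("eager", "sdpa"):
    model = SentenceTransformer(
        "sentence-transformers/all-distilroberta-v1",
        device="cuda",
        model_kwargs={"attn_implementation": attn, "torch_dtype": torch.float16},
    )
    model.encode(sentences[:64])  # warmup
    start = time.perf_counter()
    model.encode(sentences, batch_size=256)
    print(f"{attn}: {time.perf_counter() - start:.2f}s")
```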


Who can review?

@ArthurZucker @younesbelkada

If I get a go-ahead, I can move forward with other architectures. Let me know if you'd like me to work on tests first, though. I'm also aware that the "copies" tests will currently fail due to these changes.

  • Tom Aarsen

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tomaarsen tomaarsen marked this pull request as ready for review April 25, 2024 08:55
@huseinzol05 (Contributor)

Thanks for this PR! Using SDPA saved a ton of memory for the GPU-poor like me. Super grateful!

@ArthurZucker (Collaborator)

@tomaarsen do you need a review on this one?

@tomaarsen (Member, Author)

@ArthurZucker Would be nice, though there are some merge conflicts now. I'll be off next week, so I'll be able to take care of the conflicts and any comments starting the 17th.
Ideally, in the long term I'd like to get FA2/SDPA support for all common encoder architectures (notably BERT, RoBERTa, and their multilingual variants), as this is important for the efficiency of embedding models.

  • Tom Aarsen

@younesbelkada (Contributor) left a comment

Looks pretty clean already, thanks a lot @tomaarsen! Can you make sure to propagate the changes into the encoders that copy from RoBERTa by running make fix-copies? You would also need to update this file: https://github.com/huggingface/transformers/blob/main/docs/source/en/perf_infer_gpu_one.md to mention RoBERTa and all other models that now support FA2 & SDPA.
You also need to fix the merge conflicts, which should be straightforward! 🙏

Review thread on src/transformers/models/roberta/modeling_roberta.py (outdated, resolved)
tomaarsen and others added 3 commits June 7, 2024 14:19
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
@nbroad1881 (Contributor)

There is another similar PR by the way: #30510

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions (bot) closed this on Jul 25, 2024