
Add support for SDPA to NLLB in Huggingface Transformers #478

Closed
ddaspit opened this issue Aug 9, 2024 · 4 comments
Assignees: isaac091
Labels: optimization (Model training/inferencing optimization)

Comments

ddaspit (Collaborator) commented Aug 9, 2024

NLLB currently supports FlashAttention in HF Transformers. Unfortunately, FlashAttention degrades quality because it does not properly support padding masks. SDPA provides an alternative route for applying attention optimizations: under the hood, it can dispatch to FlashAttention or Memory Efficient Attention, and Memory Efficient Attention should support masking. Here is the issue for adding SDPA support to models in Transformers. For a list of currently supported models, check out the Transformers documentation. A good example to follow would be BART, which has a full encoder-decoder architecture. It might also be useful to check out this PR that adds SDPA support to T5, another encoder-decoder model.
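
For context, here is a minimal sketch (not from the issue) of the mechanism being discussed: PyTorch's `torch.nn.functional.scaled_dot_product_attention` takes an explicit attention mask and can still dispatch to an optimized kernel such as Memory Efficient Attention, which is why an SDPA path can keep padding correct where the FlashAttention path noted above cannot. The shapes and mask construction below are illustrative only.

```python
# Illustrative sketch: torch's SDPA accepts an explicit attention mask;
# the issue above notes that the FlashAttention path does not handle
# padding masks properly.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 16, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Boolean key-padding mask: True = attend, False = padded position.
# Here the second sequence in the batch is padded after position 10.
key_padding = torch.ones(batch, seq_len, dtype=torch.bool)
key_padding[1, 10:] = False
attn_mask = key_padding[:, None, None, :]  # broadcasts to (batch, heads, q_len, k_len)

# PyTorch picks a fused kernel that supports the mask when one is available,
# otherwise it falls back to the math implementation.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```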

ddaspit added the enhancement, pipeline 6: infer, pipeline 4: train, and optimization labels and removed the enhancement, pipeline 6: infer, and pipeline 4: train labels on Aug 9, 2024
isaac091 self-assigned this on Aug 9, 2024
isaac091 (Collaborator) commented:

PR submitted to the transformers library last week; waiting on review.

ddaspit (Collaborator, Author) commented Sep 13, 2024

Here is the PR: huggingface/transformers#33309
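
Once that PR is available in an installed transformers release, enabling SDPA for NLLB should look roughly like the sketch below. The checkpoint name is just an example, and `attn_implementation="sdpa"` is the standard Transformers switch; verify against the version you actually have installed.

```python
# Sketch (assumption: your installed transformers version includes the merged
# SDPA support for NLLB, which uses the M2M100 architecture in Transformers).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"  # example NLLB checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    attn_implementation="sdpa",  # raises an error if the installed version lacks SDPA support
)

inputs = tokenizer("Hello, world!", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),  # target language
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```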

isaac091 (Collaborator) commented:
Merged!

ddaspit (Collaborator, Author) commented Sep 30, 2024

That is awesome. Good job.
