Add support for SDPA to NLLB in Huggingface Transformers #478
Labels: optimization (Model training/inferencing optimization)
Comments
ddaspit added the optimization label and removed the enhancement, pipeline 6: infer, and pipeline 4: train labels on Aug 9, 2024
PR submitted to the transformers library last week, waiting on review.
Here is the PR: huggingface/transformers#33309
Merged!
That is awesome. Good job.
Description
NLLB currently supports FlashAttention in HF Transformers. Unfortunately, FlashAttention degrades translation quality because it does not properly support padding masks. SDPA (PyTorch's torch.nn.functional.scaled_dot_product_attention) provides an alternative route for applying attention optimizations: under the hood it can dispatch to FlashAttention and Memory Efficient Attention, and Memory Efficient Attention supports masking.

There is a tracking issue for adding SDPA support to models in Transformers, and the Transformers documentation lists the currently supported models. A good example to follow would be BART, which has a full encoder-decoder architecture. It might also be useful to look at the PR that adds SDPA support to T5, another encoder-decoder model.
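To illustrate why SDPA is the more promising route, here is a minimal sketch (not the actual Transformers implementation) of an attention call through torch.nn.functional.scaled_dot_product_attention that forwards a boolean padding mask; when a mask is supplied, PyTorch chooses a backend that supports it, such as Memory Efficient Attention:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only; these are not NLLB's real dimensions.
batch, heads, seq_len, head_dim = 2, 16, 10, 64
query = torch.randn(batch, heads, seq_len, head_dim)
key = torch.randn(batch, heads, seq_len, head_dim)
value = torch.randn(batch, heads, seq_len, head_dim)

# Boolean padding mask, broadcastable to (batch, heads, seq_len, seq_len):
# True = attend, False = masked out. Pretend the last two tokens are padding.
attn_mask = torch.ones(batch, 1, seq_len, seq_len, dtype=torch.bool)
attn_mask[..., -2:] = False

# SDPA dispatches to FlashAttention, Memory Efficient Attention, or a math
# fallback. With an explicit attn_mask, it picks a backend that honors the
# mask, which is exactly what the FlashAttention 2 path fails to do for NLLB.
out = F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask)
```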
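Once the change lands, enabling it should look roughly like the following sketch, assuming a Transformers release that includes the merged PR (huggingface/transformers#33309); the checkpoint name is just one of the public NLLB models:

```python
import torch
from transformers import AutoModelForSeq2SeqLM

# Request the SDPA attention implementation explicitly when loading NLLB.
# Recent Transformers releases also select SDPA by default when available.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-600M",
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
)
```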