Add sliding window attention to sdpa in mistral #28980
Comments
cc @fxmarty
Hi, thank you for the suggestion. SDPA support for Mistral was added by @ArthurZucker in #28133; maybe he has more insight.
I think it comes down to just adding
Sure, and I'll open a PR later this week.
Any plan for the PR?
#29407 should fix this issue.
@ArthurZucker Oh, you are right. Thanks.
Fixed in #30127 |
Feature request
https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L1006-L1023
In the code linked above, the latest version of transformers cannot use the sliding window feature in the Mistral model.
I suspect the reason is the one you mentioned here:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L687-L688
This PyTorch issue breaks SDPA with a custom attn_mask such as the sliding window attention mask:
pytorch/pytorch#112577
This issue has been fixed since torch 2.2.0, which was released two weeks ago, so could you add this feature back to the SDPA path in Mistral?
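For context, here is a minimal sketch (not from the issue) of the kind of custom mask involved: a causal sliding window mask built by hand and passed to torch.nn.functional.scaled_dot_product_attention as attn_mask. The shapes and window size are arbitrary, chosen only for illustration; per the PyTorch issue above, older torch versions may mishandle such custom masks.

```python
import torch
import torch.nn.functional as F

# Arbitrary toy shapes for illustration: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 2, 6, 8)
k = torch.randn(1, 2, 6, 8)
v = torch.randn(1, 2, 6, 8)

seq_len, window = 6, 3

# Causal constraint: position i may only attend to positions j <= i.
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
# Sliding window constraint: additionally require i - j < window.
in_window = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=-(window - 1))
attn_mask = causal & in_window  # True = attend, False = masked out

# SDPA accepts this boolean mask directly as a custom attn_mask.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)  # torch.Size([1, 2, 6, 8])
```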
Motivation
I cannot use sliding window attention with SDPA right now, and because my GPU is a V100, I cannot use FlashAttention-2 either.
Your contribution
I think we can pass the sliding_window parameter to the _prepare_4d_causal_attention_mask_for_sdpa function.
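As a rough sketch of that suggestion (not the actual patch from #30127), the call that builds the 4D mask for SDPA could simply forward the model's window size. This assumes the helper accepts a sliding_window keyword argument, and the toy inputs below stand in for the real hidden states available inside MistralModel.forward:

```python
import torch
from transformers.modeling_attn_mask_utils import (
    _prepare_4d_causal_attention_mask_for_sdpa,
)

# Toy stand-ins for the values available inside the model's forward pass.
batch_size, seq_length, hidden_size = 1, 6, 16
inputs_embeds = torch.randn(batch_size, seq_length, hidden_size)
attention_mask_2d = torch.ones(batch_size, seq_length, dtype=torch.long)

# The suggested change: forward the configured window size so the helper can
# fold the sliding window constraint into the 4D mask it builds for SDPA.
mask_4d = _prepare_4d_causal_attention_mask_for_sdpa(
    attention_mask_2d,
    (batch_size, seq_length),
    inputs_embeds,
    past_key_values_length=0,
    sliding_window=3,  # in the model this would be config.sliding_window
)
# Depending on the transformers version, the helper may return None when
# SDPA's built-in causal path can be used instead of an explicit mask.
print(type(mask_4d))
```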