
Add sliding window attention to sdpa in mistral #28980

Closed
ehuaa opened this issue Feb 12, 2024 · 8 comments · Fixed by #30127
Labels
Good Second Issue Issues that are more difficult to do than "Good First" issues - give it a try if you want!

Comments

@ehuaa
Contributor

ehuaa commented Feb 12, 2024

Feature request

https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L1006-L1023

In the code linked above, the latest version of transformers cannot use the sliding window feature in the Mistral model.
I suspect the reason is the one you mentioned here:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L687-L688
This PyTorch issue prevented SDPA from working with a custom attn_mask such as a sliding window attention mask:
pytorch/pytorch#112577

Since that issue has been fixed in torch 2.2.0, which was released two weeks ago, could you add this feature back to the SDPA path in Mistral?

Motivation

I cannot use sliding window attention with SDPA right now because my GPU is a V100, which cannot run FlashAttention-2.

Your contribution

I think we can pass the sliding_window parameter to the _prepare_4d_causal_attention_mask_for_sdpa function.
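For context, a minimal sketch of the mask semantics this would produce (the helper name below is hypothetical; this is not the transformers implementation): with window size w, query position i may attend only to key positions j satisfying j <= i (causal) and i - j < w (sliding window).

```python
# Sketch only: the boolean mask a sliding-window causal attention would use.
# True = key position j is visible to query position i.
def sliding_window_causal_mask(seq_len, window):
    """Return a seq_len x seq_len boolean mask for sliding-window causal attention."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Example: 5 tokens, window of 3 -- each row shows which positions a token sees.
for row in sliding_window_causal_mask(5, 3):
    print("".join("x" if visible else "." for visible in row))
```

A mask like this (converted to an additive float mask) is what would need to reach SDPA instead of the plain causal mask when `sliding_window` is set in the config.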

@amyeroberts
Collaborator

cc @fxmarty

@fxmarty
Contributor

fxmarty commented Feb 19, 2024

Hi, thank you for the suggestion. SDPA support for Mistral was added by @ArthurZucker in #28133; maybe he has more insight.

@ArthurZucker
Collaborator

I think it comes down to just adding sliding_window to the call to _prepare_4d_causal_attention_mask_for_sdpa, yes. Would you like to open a PR?

@ArthurZucker ArthurZucker added the Good Second Issue Issues that are more difficult to do than "Good First" issues - give it a try if you want! label Feb 20, 2024
@ehuaa
Contributor Author

ehuaa commented Feb 21, 2024

I think it comes down to just adding sliding_window to the call to _prepare_4d_causal_attention_mask_for_sdpa, yes. Would you like to open a PR?

Sure, and I'll open a PR later this week.

@cyr0930

cyr0930 commented Mar 21, 2024

Any plan for the PR?

@ArthurZucker
Collaborator

#29407 should fix this issue

@cyr0930

cyr0930 commented Mar 22, 2024

@ArthurZucker Oh you are right. Thanks.

@fxmarty
Contributor

fxmarty commented Apr 17, 2024

Fixed in #30127

5 participants