[Bugfix] Fallback ViT attn backend to SDPA for blackwell #25851
Conversation
Code Review
This pull request refactors the ViT attention backend fallback for Blackwell GPUs by moving the logic from a model-specific file (qwen3_vl.py) to the general CUDA platform file (cuda.py). While this is a good architectural improvement, the new implementation in cuda.py has a logical flaw that prevents the fallback from working as intended. I've provided a critical comment with a suggested fix to correct the device capability check order.
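For context, here is a minimal sketch of the fallback order the review is asking for. This is not the actual vllm/platforms/cuda.py code: the `Backend` enum is a local stand-in for vLLM's `_Backend`, and the assumption that Blackwell reports major compute capability 10 is mine.

```python
import enum

import torch


class Backend(enum.Enum):  # stand-in for vLLM's _Backend enum
    FLASH_ATTN = enum.auto()
    XFORMERS = enum.auto()
    TORCH_SDPA = enum.auto()


def get_vit_attn_backend(head_size: int, dtype: torch.dtype) -> Backend:
    # The Blackwell check must come first (assumption: Blackwell reports
    # major compute capability 10); if the generic selection below ran
    # first and returned a backend, the SDPA fallback would never fire.
    major, _minor = torch.cuda.get_device_capability()
    if major == 10:
        return Backend.TORCH_SDPA
    # Generic selection: prefer flash-attn when installed, else SDPA.
    try:
        import flash_attn  # noqa: F401
        return Backend.FLASH_ATTN
    except ImportError:
        return Backend.TORCH_SDPA
```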
Perhaps we should link a tracking issue in the code?

Good point!
```python
self.attn_backend = get_vit_attn_backend(
    head_size=head_dim, dtype=torch.get_default_dtype())
use_upstream_fa = False
if self.attn_backend != _Backend.FLASH_ATTN and \
```
QQ: Does FA have a similar problem on Blackwell? This logic may still select upstream FA if it is available.
Upstream FA seems okay on Blackwell (a user reported that installing upstream FA also fixes the issue, which is why I didn't delete this code).
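As a rough illustration of that path (illustrative names, not the exact vLLM code), the selection can probe for an upstream flash-attn install before dropping to SDPA:

```python
import importlib.util

# Illustrative sketch: prefer upstream flash-attn when it is installed,
# since it reportedly works on Blackwell; otherwise keep the SDPA fallback.
# `Backend` is the stand-in enum from the sketch above.
use_upstream_fa = False
if importlib.util.find_spec("flash_attn") is not None:
    attn_backend = Backend.FLASH_ATTN
    use_upstream_fa = True
else:
    attn_backend = Backend.TORCH_SDPA
```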
Purpose
#25788 fixed the issue for Qwen3-VL. While we don't know whether xformers will work with other head sizes, Blackwell is not yet officially supported by xformers according to facebookresearch/xformers#1317 (comment). It's therefore probably safer to force the ViT attention backend to SDPA for all models on Blackwell for now.
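To check whether a given GPU would hit this fallback, one can query the compute capability directly (assumption: the forced-SDPA path targets Blackwell parts reporting major compute capability 10):

```python
import torch

major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
# Assumption: the forced-SDPA fallback applies when major == 10 (Blackwell).
print("ViT SDPA fallback would apply:", major == 10)
```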
Test Plan
Test Result