[FIX] Make flash_attn optional #3269
Conversation
This reverts commit 2daf23a.
Hi @WoosukKwon, why did you revert your refactoring in #3005?
Probably we can just use a flag to turn flash_attn off instead of deleting everything? What do you think?
I think it would be safe to simply disable the flash attention backend if flash_attn is not found.
@zhuohan123 @AlpinDale Thanks for the suggestion. Just updated the PR accordingly. Please take another look.
Thank you @WoosukKwon, I tested this on a few environments and this restored the original behavior for me.
LGTM! Do you wanna keep the downloading logic as well?
@zhuohan123 Let's remove it for this fix and bring it back once we find a more stable way to do it.
Cherry-picked from vllm-project/vllm#3269
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
The FlashAttention backend introduced in #3005 causes build errors in some environments and increases the package size significantly (44 MB -> 160 MB), as the vLLM package now includes the flash-attn (116 MB) package. This PR addresses this by removing flash_attn from vLLM's dependencies and enabling a fallback to xFormers when flash_attn is not found.
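A minimal sketch of the fallback described above, assuming a hypothetical backend-selection helper (the function and backend names are illustrative, not vLLM's actual selector code): check whether the optional flash_attn package is installed and fall back to xFormers if it is not.

```python
# Illustrative sketch only; module and function names are hypothetical.
import importlib.util


def _has_flash_attn() -> bool:
    # Detect whether the optional flash-attn package is installed
    # without importing it eagerly (avoids import errors at startup).
    return importlib.util.find_spec("flash_attn") is not None


def select_attention_backend() -> str:
    # Prefer FlashAttention when the optional dependency is present,
    # otherwise fall back to the xFormers backend.
    if _has_flash_attn():
        return "flash-attn"
    return "xformers"


if __name__ == "__main__":
    print(f"Using attention backend: {select_attention_backend()}")
```

With flash_attn kept out of the required dependencies, installing it separately opts back into the FlashAttention path, while plain installs continue to work through xFormers.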