Update ptxla training #9864
Conversation
Cc: @yiyixuxu, could you review the changes made to |
@entrpn, can you use a custom attention processor instead (without updating our default attention processor)? |
Hi @yiyixuxu, we wrapped the flash attention kernel call under the condition |
I'm just wondering if it makes sense for flash attention to have its own attention processor, since this one is meant for SDPA. Cc @DN6 here too. |
Hi @yiyixuxu, what if we create another AttnProcessor with flash attention, in parallel with |
@zpcore, this way users can explicitly opt in to flash attention if they want to. |
@yiyixuxu, to make sure I understand: can you explain the concern with wrapping the flash attention kernel call under the condition |
Is it not possible that XLA_AVAILABLE is true but the user does not want to use flash attention? |
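The design question in the thread above can be sketched as follows. This is a minimal, hypothetical illustration (not the actual diffusers implementation): the processor classes are stubs, and the point is only the dispatch pattern being discussed, i.e. an explicit, user-selected flash attention processor rather than an implicit `XLA_AVAILABLE` branch inside the default SDPA processor.

```python
# Hypothetical sketch of the two designs discussed in this thread.
# Class names (FlashAttnProcessor, Attention) are illustrative stand-ins,
# not the real diffusers API.
from importlib.util import find_spec

# Mirrors the XLA_AVAILABLE check mentioned in the thread: true only when
# torch_xla is importable in the current environment.
XLA_AVAILABLE = find_spec("torch_xla") is not None


class AttnProcessor2_0:
    """Stub for the default SDPA-based processor; stays untouched."""

    def __call__(self, query):
        return f"sdpa({query})"


class FlashAttnProcessor:
    """Stub for a separate processor that would call the XLA flash
    attention kernel. Users select it explicitly."""

    def __call__(self, query):
        if not XLA_AVAILABLE:
            raise RuntimeError("torch_xla is required for flash attention")
        return f"flash({query})"


class Attention:
    """Minimal stand-in for an attention module with a swappable processor."""

    def __init__(self):
        # Default path is SDPA, regardless of whether torch_xla is installed.
        self.processor = AttnProcessor2_0()

    def set_processor(self, processor):
        self.processor = processor

    def forward(self, query):
        return self.processor(query)


attn = Attention()
print(attn.forward("q"))  # default SDPA path, even if XLA is available
# Opting in is an explicit action, not an environment check:
# attn.set_processor(FlashAttnProcessor())
```

This addresses the objection above: a user on an XLA device who does not want flash attention simply never calls `set_processor`, whereas an `if XLA_AVAILABLE:` branch inside the default processor would force the kernel on them.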
@sayakpaul can you please review. This new PR supersedes the other one I had opened a while back, which I just closed. Thank you.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.