Support for FP32 #13

Closed

houEricSY opened this issue Jun 23, 2022 · 2 comments

@houEricSY commented Jun 23, 2022

Thanks for the flash-attention code, it's brilliant! But it seems that it doesn't support FP32 computation; I wonder how I can use flash-attention in an FP32 setting.

@tridao (Contributor) commented Aug 7, 2022

We currently support Turing (e.g. RTX 2080) and Ampere (e.g. RTX 3080) GPUs. We rely on tensor cores for matrix multiplication, which older GPUs lack.
You might be interested in the memory-efficient attention implemented by the xformers team (targeting fp32 instead of fp16):
facebookresearch/xformers#267
facebookresearch/xformers#281
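
For reference, a minimal sketch (not from the original thread) of how fp32 attention could be invoked through xformers' memory-efficient attention. The `xformers.ops.memory_efficient_attention` call and the supported tensor shapes may vary across xformers versions, so treat the details as assumptions:

```python
import torch
import xformers.ops as xops

batch, seqlen, heads, head_dim = 2, 1024, 8, 64

# fp32 query/key/value, shaped [batch, seqlen, heads, head_dim]
q = torch.randn(batch, seqlen, heads, head_dim, device="cuda", dtype=torch.float32)
k = torch.randn_like(q)
v = torch.randn_like(q)

# memory_efficient_attention selects a kernel that supports the given dtype/GPU;
# with fp16/bf16 on a supported GPU it may dispatch to FlashAttention instead.
out = xops.memory_efficient_attention(q, k, v)
print(out.shape, out.dtype)  # expected: torch.Size([2, 1024, 8, 64]) torch.float32
```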

@fmassa commented Aug 10, 2022

Hi,

We have just pushed a PR, facebookresearch/xformers#362, which adds V100 and P100 support as well and dispatches to FlashAttention in the cases where it is supported.
