Support for FP32 #13

Closed

houEricSY opened this issue Jun 23, 2022 · 2 comments

@houEricSY commented Jun 23, 2022

Thanks for the flash-attention code, it's brilliant! But it seems that it doesn't support FP32 computation; I wonder how I can use flash-attention in an FP32 setting.

@tridao (Contributor) commented Aug 7, 2022

We currently support Turing (e.g. RTX 2080) and Ampere (e.g. RTX 3080) GPUs. We rely on tensor cores for matrix multiplication, which older GPUs lack.
You might be interested in the memory-efficient attention implemented by the xformers team (targeting fp32 instead of fp16):
facebookresearch/xformers#267
facebookresearch/xformers#281
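
For reference, a minimal sketch (not from the original thread) of how fp32 attention could be invoked through xformers' memory-efficient attention. The `xformers.ops.memory_efficient_attention` call and the supported tensor shapes may vary across xformers versions, so treat the details as assumptions:

```python
import torch
import xformers.ops as xops

batch, seqlen, heads, head_dim = 2, 1024, 8, 64

# fp32 query/key/value, shaped [batch, seqlen, heads, head_dim]
q = torch.randn(batch, seqlen, heads, head_dim, device="cuda", dtype=torch.float32)
k = torch.randn_like(q)
v = torch.randn_like(q)

# memory_efficient_attention selects a kernel that supports the given dtype/GPU;
# with fp16/bf16 on a supported GPU it may dispatch to FlashAttention instead.
out = xops.memory_efficient_attention(q, k, v)
print(out.shape, out.dtype)  # expected: torch.Size([2, 1024, 8, 64]) torch.float32
```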

@fmassa commented Aug 10, 2022

Hi,

We have just pushed a PR, facebookresearch/xformers#362, which adds V100 and P100 support as well and dispatches to FlashAttention in the cases where it is supported.
