Please make it clear in the install guide it doesn't work for sm_75 GPUs yet #421

Closed
horiacristescu opened this issue Aug 5, 2024 · 10 comments

@horiacristescu commented Aug 5, 2024

I wasted a lot of time trying to install flashinfer, only to find out that it doesn't actually support sm_75.

It would be a good idea to state this up front so people know from the start not to go down that path.

@Amrabdelhamed611 commented Aug 5, 2024

They are working on it. From the flashinfer installation docs:

Supported GPU architectures: sm80, sm86, sm89, sm90 (sm75 / sm70 support is work in progress).

I think they need to add this info here on GitHub as well; I also spent hours trying to get it to work 😅🤡
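For reference, a quick way to check which architecture your GPU reports before installing anything. This is just a generic sketch (not part of flashinfer) and assumes PyTorch with CUDA is already installed:

```python
# Check the CUDA compute capability PyTorch reports, so you know up front
# whether the GPU is sm_75 (Turing, e.g. T4 / RTX 20xx) or sm_80+ (Ampere and newer).
import torch

major, minor = torch.cuda.get_device_capability(0)  # e.g. (7, 5) on a T4
print(f"Detected sm_{major}{minor}")
if (major, minor) < (8, 0):
    print("Pre-Ampere GPU: check the flashinfer docs for current sm_75 support status.")
```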

@yzh119 (Collaborator) commented Aug 5, 2024

Thanks for the suggestion. The codebase used to work with sm_75 (#128), but since then a lot of new features have been introduced and I haven't tested them on sm_75.

Do you have any concrete error messages from compiling flashinfer from source on sm_75? That would help me fix the issues (I don't have an sm_75 dev machine at the moment).

@Amrabdelhamed611 commented Aug 6, 2024

Mainly these 2 errors:

RuntimeError: FlashAttention only supports Ampere GPUs or newer. (as my GPU is not supported yet)

CHECK_EQ(paged_kv_indptr.size(0), batch_size + 1), which I solved by installing flashinfer==0.1.2 through pip:
pip install flashinfer==0.1.2 -i https://flashinfer.ai/whl/cu121/torch2.3
The solution was from vLLM issue vllm-project/vllm#7070.

@yzh119 (Collaborator) commented Aug 7, 2024

@Amrabdelhamed611 the first error was not reported by flashinfer; it is likely from the flash-attn package, which flashinfer does not depend on.

Regarding the second issue, check my reply here vllm-project/vllm#7070 (comment)

@esmeetu (Contributor) commented Aug 8, 2024

Currently FlashInfer only supports the decode function with #128, not the prefill function. A condition check fails when running prefill:

Invalid configuration : num_frags_x=1 num_frags_y=4 num_frags_z=1 num_warps_x=1 num_warps_z=4

After tuning some parameters it ran, but produced wrong results. I also figured that lowering those parameters would hurt performance, and FlashAttention-2, which FlashInfer uses for prefill, doesn't support sm_75, so I lowered my expectations for a performance boost.
I then tried refactoring the vLLM flashinfer backend to use xformers' prefill function and FlashInfer's decode (https://github.com/esmeetu/vllm/tree/sm75-flashinfer). It works well and gives correct results, but doesn't perform better than xformers (almost the same).

@yzh119 (Collaborator) commented Aug 8, 2024

@esmeetu thanks for confirming. I'll take a look at the correctness issue of the prefill kernels on sm75, but I don't expect much from their performance either, given sm75's small shared memory and its lack of the async-copy feature introduced in Ampere.

@yzh119 (Collaborator) commented Aug 8, 2024

In general I think having the decode kernels and sampling kernels available for sm75 is great. I'm considering releasing an sm75 wheel for these kernels in v0.1.5.

Update: flashinfer v0.1.6 officially supports sm75, not only the decode/sampling kernels but also the prefill kernels.
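A quick sanity check (not from the thread): confirm the installed version is at least 0.1.6. This assumes the package is installed under the distribution name flashinfer, as in the pip command earlier in this thread:

```python
# Sketch: verify the installed flashinfer version is >= 0.1.6, the first
# release reported in this thread to officially support sm75.
from importlib.metadata import version

installed = version("flashinfer")
print("flashinfer", installed)
# Strip any local version suffix (e.g. "+cu121torch2.3") before comparing.
parts = tuple(int(x) for x in installed.split("+")[0].split(".")[:3])
if parts < (0, 1, 6):
    print("Older than 0.1.6: official sm75 support may be missing.")
```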

@zhyncs (Member) commented Aug 27, 2024

fixed with #449

@zhyncs closed this as completed Aug 27, 2024
@qism commented Sep 5, 2024

> @Amrabdelhamed611 the first error was not reported by flashinfer; it is likely from the flash-attn package, which flashinfer does not depend on.
>
> Regarding the second issue, check my reply here vllm-project/vllm#7070 (comment)

@yzh119 I am using vLLM 0.5.5 and FlashInfer 0.1.6 on a T4 and met the same error: RuntimeError: FlashAttention only supports Ampere GPUs or newer.


@yzh119 (Collaborator) commented Sep 5, 2024

@qism, that error is reported by the flash-attn package, which flashinfer does not rely on. If you see it, it suggests you are not using the flashinfer backend.
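One way to make sure the flashinfer path is actually selected is vLLM's attention-backend environment variable. This is a sketch, not from this thread: it assumes vLLM's VLLM_ATTENTION_BACKEND setting, and facebook/opt-125m is only a placeholder model:

```python
# Sketch: force vLLM to use the FlashInfer attention backend, so that any
# "FlashAttention only supports Ampere GPUs or newer" error clearly comes
# from a different backend than flashinfer. Set the variable before importing vLLM.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM, SamplingParams  # import after setting the env var

llm = LLM(model="facebook/opt-125m")  # small placeholder model, just to exercise the backend
out = llm.generate(["Hello"], SamplingParams(max_tokens=8))
print(out[0].outputs[0].text)
```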
