Please make it clear in the install guide it doesn't work for sm_75 GPUs yet #421
Comments
They are working on it. |
Thanks for your suggestion. The codebase used to work with sm_75 (#128), but a lot of new features have been introduced since then and I haven't tested them on sm_75. Do you have any concrete error messages from compiling flashinfer from source on sm_75? Those would help me fix the issues (I don't have an sm_75 dev machine at the moment). |
Mainly these two errors:
|
@Amrabdelhamed611 the first error was not reported by flashinfer. It might be related to the flashattn package, which flashinfer doesn't depend on. Regarding the second issue, see my reply here: vllm-project/vllm#7070 (comment) |
Currently FlashInfer only supports
After tuning some parameters, it worked but produced wrong results. I also figured that lowering some parameters would hurt performance, and FlashAttention-2, which FlashInfer uses in prefill, doesn't support sm_75, so I no longer expect a performance boost. |
@esmeetu thanks for confirming. I'll take a look at the correctness issue of the prefill kernels on sm75, but I don't expect much from their performance either, because sm75 has a small shared memory size and lacks the async copy features introduced in Ampere. |
In general I think having decode kernels and sampling kernels for sm75 is great. I'm considering releasing an sm75 wheel for these kernels at v0.1.5. Update: flashinfer v0.1.6 officially supports sm75, including not only the decode/sampling kernels but also the prefill kernels. |
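For anyone unsure which architecture their card is, here is a minimal sketch (not part of flashinfer) that formats a CUDA compute capability as an `sm_XY` tag and flags whether it predates Ampere's async copy. The helper names `sm_tag`, `has_async_copy`, `describe`, and `local_capability` are made up for illustration. The only library call assumed is PyTorch's `torch.cuda.get_device_capability`, and the sketch falls back to `None` if PyTorch or a CUDA device is not available.

```python
def sm_tag(major: int, minor: int) -> str:
    """Format a compute capability as an sm_XY tag, e.g. (7, 5) -> 'sm_75'."""
    return f"sm_{major}{minor}"


def has_async_copy(major: int, minor: int) -> bool:
    """Hardware async global-to-shared copy arrived with Ampere (sm_80)."""
    return (major, minor) >= (8, 0)


def describe(major: int, minor: int) -> str:
    """One-line summary for a given compute capability."""
    verb = "has" if has_async_copy(major, minor) else "lacks"
    return f"{sm_tag(major, minor)} {verb} Ampere-style async copy"


def local_capability():
    """Return (major, minor) for GPU 0 via PyTorch, or None if unavailable."""
    try:
        import torch  # optional dependency, only needed for the live query
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = local_capability()
    print(describe(*cap) if cap else "no CUDA device visible to PyTorch")
```

On a T4, which is an sm_75 (Turing) card, `describe(7, 5)` reports that async copy is missing, which matches the performance caveat above.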
Fixed with #449. |
Using vLLM 0.5.5 and FlashInfer 0.1.6 on a T4
@qism, that error is reported by the flash-attn package, which flashinfer does not rely on. If you see that error, it suggests you are not using the flashinfer backend. |
I wasted a lot of time trying to install flashinfer, only to find out that it doesn't actually support sm_75.
It would be a good idea to state this up front in the install guide so people know from the start not to go in that direction.