Expected release date: Feb 28th, 2024
- [ ] faster batch prefill/append attention with kv partition for small query length ([WIP][Feature] Support KV Partition for BatchPrefill kernel for Paged & Ragged KV-Cache. #75)
- [ ] faster fused-rope gqa (doesn't seem to work well; using the prefill kernels instead is encouraged)
- [ ] Python interface for 4/8-bit kernels (How to use low-bit KV Cache in flashinfer? #125); a sketch of the general low-bit idea follows after this list
- [ ] head_dim=256 for attention kernels (#132)
- [ ] More versatile group sizes ([Feature Request] More versatile GQA group sizes #140); see the GQA group-size sketch below
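
For context on the low-bit KV-cache item (#125), here is a minimal sketch of the general technique in plain PyTorch: symmetric int8 quantization of the cache with one scale per head. The names `quantize_kv_int8` and `dequantize_kv` are hypothetical and chosen for illustration; this is not FlashInfer's actual 4/8-bit kernel interface.

```python
import torch

def quantize_kv_int8(kv: torch.Tensor):
    """kv: [num_kv_heads, seq_len, head_dim] in fp16/fp32 -> (int8 tensor, per-head fp32 scale)."""
    kv = kv.float()
    # One symmetric scale per head, taken over the sequence and head_dim axes.
    scale = kv.abs().amax(dim=(1, 2), keepdim=True).clamp(min=1e-6) / 127.0
    q = torch.round(kv / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # In a fused kernel this dequantization would happen on the fly inside attention.
    return q.float() * scale

k = torch.randn(8, 512, 128, dtype=torch.float16)  # [num_kv_heads, seq_len, head_dim]
k_int8, k_scale = quantize_kv_int8(k)              # ~2x smaller than fp16 (int4 would be ~4x)
k_approx = dequantize_kv(k_int8, k_scale)
print((k.float() - k_approx).abs().max())          # small quantization error
```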
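
And for the group-size item (#140), a short framework-agnostic reference of grouped-query attention, showing that the group size is simply `num_qo_heads // num_kv_heads` and that each kv head is shared by a contiguous group of query heads. `gqa_reference` is a hypothetical name used only as a correctness reference, not an optimized FlashInfer kernel.

```python
import torch

def gqa_reference(q, k, v):
    """q: [num_qo_heads, q_len, head_dim]; k, v: [num_kv_heads, kv_len, head_dim]."""
    num_qo_heads, _, head_dim = q.shape
    num_kv_heads = k.shape[0]
    assert num_qo_heads % num_kv_heads == 0
    group_size = num_qo_heads // num_kv_heads  # e.g. 32 / 8 = 4 query heads per kv head
    # Share each kv head across its group of query heads.
    k = k.repeat_interleave(group_size, dim=0)
    v = v.repeat_interleave(group_size, dim=0)
    scores = torch.einsum("hqd,hkd->hqk", q, k) / head_dim ** 0.5
    return torch.einsum("hqk,hkd->hqd", scores.softmax(dim=-1), v)

q = torch.randn(32, 1, 128)   # single decode step: q_len = 1
k = torch.randn(8, 512, 128)
v = torch.randn(8, 512, 128)
out = gqa_reference(q, k, v)  # [32, 1, 128]
```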