Expected release date: Feb 28th, 2024
- [ ] faster batch prefill/append attention with kv partition for small query length ([WIP][Feature] Support KV Partition for BatchPrefill kernel for Paged & Ragged KV-Cache. #75)
- [ ] faster fused-rope gqa (doesn't seem to work well; using the prefill kernels instead is encouraged)
- [ ] Python interface for 4/8-bit kernels (How to use low-bit KV Cache in flashinfer? #125); a sketch of the general low-bit idea follows after this list
- [ ] head_dim=256 for attention kernels (#132)
- [ ] More versatile group sizes ([Feature Request] More versatile GQA group sizes #140); see the GQA group-size sketch below
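
For context on the low-bit KV-cache item (#125), here is a minimal sketch of the general technique in plain PyTorch: symmetric int8 quantization of the cache with one scale per head. The names `quantize_kv_int8` and `dequantize_kv` are hypothetical and chosen for illustration; this is not FlashInfer's actual 4/8-bit kernel interface.

```python
import torch

def quantize_kv_int8(kv: torch.Tensor):
    """kv: [num_kv_heads, seq_len, head_dim] in fp16/fp32 -> (int8 tensor, per-head fp32 scale)."""
    kv = kv.float()
    # One symmetric scale per head, taken over the sequence and head_dim axes.
    scale = kv.abs().amax(dim=(1, 2), keepdim=True).clamp(min=1e-6) / 127.0
    q = torch.round(kv / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # In a fused kernel this dequantization would happen on the fly inside attention.
    return q.float() * scale

k = torch.randn(8, 512, 128, dtype=torch.float16)  # [num_kv_heads, seq_len, head_dim]
k_int8, k_scale = quantize_kv_int8(k)              # ~2x smaller than fp16 (int4 would be ~4x)
k_approx = dequantize_kv(k_int8, k_scale)
print((k.float() - k_approx).abs().max())          # small quantization error
```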
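
And for the group-size item (#140), a short framework-agnostic reference of grouped-query attention, showing that the group size is simply `num_qo_heads // num_kv_heads` and that each kv head is shared by a contiguous group of query heads. `gqa_reference` is a hypothetical name used only as a correctness reference, not an optimized FlashInfer kernel.

```python
import torch

def gqa_reference(q, k, v):
    """q: [num_qo_heads, q_len, head_dim]; k, v: [num_kv_heads, kv_len, head_dim]."""
    num_qo_heads, _, head_dim = q.shape
    num_kv_heads = k.shape[0]
    assert num_qo_heads % num_kv_heads == 0
    group_size = num_qo_heads // num_kv_heads  # e.g. 32 / 8 = 4 query heads per kv head
    # Share each kv head across its group of query heads.
    k = k.repeat_interleave(group_size, dim=0)
    v = v.repeat_interleave(group_size, dim=0)
    scores = torch.einsum("hqd,hkd->hqk", q, k) / head_dim ** 0.5
    return torch.einsum("hqk,hkd->hqd", scores.softmax(dim=-1), v)

q = torch.randn(32, 1, 128)   # single decode step: q_len = 1
k = torch.randn(8, 512, 128)
v = torch.randn(8, 512, 128)
out = gqa_reference(q, k, v)  # [32, 1, 128]
```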