Reshape cache flash kernel to support HND layout #8200
This pull request has merge conflicts that must be resolved before it can be merged.
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
@WoosukKwon could you review the changes? Thanks!
Signed-off-by: shuw <shuw@nvidia.com>
This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you! |
NHD: [num_blocks, block_size, num_heads, head_size]
HND: [num_blocks, num_heads, block_size, head_size]
Many fast attention kernels only support the HND layout for the kv_cache, and this PR makes the reshape_and_cache_flash kernel support both layouts.
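The difference between the two layouts is just the order of the middle two dimensions, which changes the stride arithmetic the cache-write kernel must use. A minimal Python sketch of the flat-offset computation for each layout (helper names are illustrative, not taken from the vLLM kernel):

```python
def nhd_offset(block_size, num_heads, head_size, block, token, head, dim):
    """Flat offset into an NHD cache: [num_blocks, block_size, num_heads, head_size]."""
    return ((block * block_size + token) * num_heads + head) * head_size + dim

def hnd_offset(block_size, num_heads, head_size, block, token, head, dim):
    """Flat offset into an HND cache: [num_blocks, num_heads, block_size, head_size]."""
    return ((block * num_heads + head) * block_size + token) * head_size + dim

# Same logical element (block=1, token=2, head=1, dim=3) lands at a
# different flat offset depending on the layout:
print(nhd_offset(4, 3, 8, 1, 2, 1, 3))  # 155
print(hnd_offset(4, 3, 8, 1, 2, 1, 3))  # 147
```

A layout-aware kernel would pick one of these two offset formulas (or, equivalently, swap the strides of the `block_size` and `num_heads` dimensions) when scattering new key/value tokens into the paged cache.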