[v1][Spec Decode] Make sliding window compatible with eagle prefix caching #17398
LGTM! Thanks for finding the bug and fixing it.
…ching (vllm-project#17398) Signed-off-by: Chen Zhang <zhangch99@outlook.com>
…ching (vllm-project#17398) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
…ching (vllm-project#17398) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
#17137 drops the last matched block to support eagle. This strategy is not correct for sliding window layers.
When the sliding window size is 4 and block_size is 2, we need at least 2 blocks. However, find_longest_cache_hit of the sliding window layers will return [NULL_BLOCK, NULL_BLOCK, BLOCK_A, BLOCK_B], and the blocks are then changed to [NULL_BLOCK, NULL_BLOCK, BLOCK_A] due to eagle. This is not correct, because only one real block remains while the window needs two.
To fix the bug, this PR moves the logic to handle eagle into specialized_manager. This PR also fixes an outdated comment in test_eagle_enabled_removes_last_block.
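
The failure case described above can be reproduced with a small toy model. This is only a sketch, not vLLM code: blocks are plain strings, and the helpers `trailing_real_blocks`, `naive_eagle_drop`, and `satisfies_window` are hypothetical names introduced for illustration. The required block count of 2 is taken directly from the example in the description (sliding window 4, block_size 2).

```python
# Toy model of the bug; these names are illustrative and are NOT vLLM's API.
NULL_BLOCK = "NULL_BLOCK"


def trailing_real_blocks(blocks: list[str]) -> int:
    """Count the contiguous non-null blocks at the end of a cache hit."""
    count = 0
    for block in reversed(blocks):
        if block == NULL_BLOCK:
            break
        count += 1
    return count


def naive_eagle_drop(blocks: list[str]) -> list[str]:
    """Drop the last matched block without considering the layer type."""
    return blocks[:-1]


def satisfies_window(blocks: list[str], required_blocks: int) -> bool:
    """Check that enough real blocks remain at the tail for the window."""
    return trailing_real_blocks(blocks) >= required_blocks


# Values from the description: sliding window 4, block_size 2 -> 2 blocks.
required_blocks = 2
hit = [NULL_BLOCK, NULL_BLOCK, "BLOCK_A", "BLOCK_B"]

print(satisfies_window(hit, required_blocks))                    # hit before the drop
print(satisfies_window(naive_eagle_drop(hit), required_blocks))  # hit after the drop
```

The hit returned for the sliding window layer satisfies the window on its own, but after the generic last-block drop it no longer does, which is why the eagle handling has to be aware of the layer type.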