llama : use n_swa + n_ubatch cells for SWA cache #13833


Draft
wants to merge 1 commit into base: gg/auto-batch

Conversation

ggerganov (Member) commented May 27, 2025

target #13845

Although still WIP, this is ready for testing. Any feedback is welcome. It will be merged after #13746 and #13845.

Overview

  • SWA cache now uses less memory
  • Enable SWA speculative decoding
  • Allow short SWA rollbacks

@ggerganov ggerganov changed the title llama : use n_swa + n_ubatch cells for SWA cache + auto-batch llama : use n_swa + n_ubatch cells for SWA cache May 28, 2025
@ggerganov ggerganov changed the base branch from gg/kv-cache-simplify-part3 to gg/auto-batch May 28, 2025 07:52
aviallon (Contributor) commented May 28, 2025

I'll try testing.
Edit: I got distracted, and forgot to test it. Oops.
