llama : use n_swa + n_ubatch cells for SWA cache #13833


Draft
wants to merge 1 commit into base: gg/auto-batch

Conversation

ggerganov (Member) commented May 27, 2025

target #13845

Although still WIP, this is ready for testing. Any feedback is welcome. It will be merged after #13746 and #13845.

Overview

  • SWA cache now uses less memory
  • Enable SWA speculative decoding
  • Allow short SWA rollbacks

@ggerganov ggerganov changed the title llama : use n_swa + n_ubatch cells for SWA cache + auto-batch llama : use n_swa + n_ubatch cells for SWA cache May 28, 2025
@ggerganov ggerganov changed the base branch from gg/kv-cache-simplify-part3 to gg/auto-batch May 28, 2025 07:52
aviallon (Contributor) commented May 28, 2025

I'll try testing.
Edit: I got distracted, and forgot to test it. Oops.
