
Prompt Caching #91 (Closed)

lucasavila00 opened this issue Apr 8, 2024 · 1 comment

Labels: models (Additions to model or architectures), new feature (New feature or request), optimization, processing (Processing related to the model)

Comments

@lucasavila00 (Contributor):

Use case:

In a conversation, avoid re-running prefill over the entire prompt every time a new message is sent.

Store the previously computed KVs if there is enough VRAM for them (see the sketch below).
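
As a rough illustration of the use case, here is a minimal sketch in Rust (standard library only; `PrefixCache`, `KvCache`, and the token IDs are hypothetical and not mistral.rs APIs) of reusing previously computed KVs so that only the newly appended turn needs prefill:

```rust
use std::collections::HashMap;

/// Placeholder for the key/value tensors produced during prefill.
/// A real engine would hold device tensors here, not a counter.
#[derive(Clone, Debug)]
struct KvCache {
    /// Number of prompt tokens whose KVs are stored.
    cached_tokens: usize,
}

/// Hypothetical prefix cache: maps a prompt's token IDs to the KVs computed for it.
struct PrefixCache {
    entries: HashMap<Vec<u32>, KvCache>,
}

impl PrefixCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// After prefill, remember the KVs for this exact token sequence.
    fn insert(&mut self, tokens: &[u32], kv: KvCache) {
        self.entries.insert(tokens.to_vec(), kv);
    }

    /// Find the longest stored prefix of `tokens`. Prefill then only needs to
    /// run over the tokens that follow the matched prefix.
    fn longest_prefix(&self, tokens: &[u32]) -> Option<(usize, &KvCache)> {
        (1..=tokens.len())
            .rev()
            .find_map(|len| self.entries.get(&tokens[..len]).map(|kv| (len, kv)))
    }
}

fn main() {
    let mut cache = PrefixCache::new();

    // First turn of a conversation: prefill everything, then store the KVs.
    let turn1: Vec<u32> = vec![1, 15, 42, 7];
    cache.insert(&turn1, KvCache { cached_tokens: turn1.len() });

    // Second turn appends new tokens to the same conversation.
    let turn2: Vec<u32> = vec![1, 15, 42, 7, 99, 100];
    if let Some((matched, kv)) = cache.longest_prefix(&turn2) {
        println!(
            "reusing {} cached tokens, prefilling only {} new ones",
            kv.cached_tokens,
            turn2.len() - matched
        );
    }
}
```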

Examples

Automatic Prefix Caching

Fast and Expressive LLM Inference with RadixAttention and SGLang

@EricLBuehler added the optimization, models (Additions to model or architectures), processing (Processing related to the model), and new feature (New feature or request) labels and removed the backend (Backend work) label on Apr 8, 2024

@EricLBuehler (Owner) commented Apr 8, 2024

Tasklist:

  • Store the KV caches of sequences keyed by hash(token ids) to avoid the prefill step (see the sketch after this list).
  • Ensure that storing the KV caches does not cause an OOM; if VRAM runs low, swap the caches to CPU.
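
A minimal sketch of the two tasklist items, using only the Rust standard library. Everything here (`KvStore`, `KvLocation`, the byte accounting, and the eviction policy) is hypothetical and not mistral.rs code; it only illustrates keying stored KVs by hash(token ids) and demoting entries to CPU instead of running out of VRAM:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Where a cached sequence's KVs currently live.
#[derive(Debug)]
enum KvLocation {
    Gpu { bytes: usize },
    Cpu { bytes: usize },
}

/// Hypothetical store keyed by hash(token ids), with a VRAM budget.
struct KvStore {
    entries: HashMap<u64, KvLocation>,
    vram_used: usize,
    vram_budget: usize,
}

fn hash_tokens(tokens: &[u32]) -> u64 {
    let mut h = DefaultHasher::new();
    tokens.hash(&mut h);
    h.finish()
}

impl KvStore {
    fn new(vram_budget: usize) -> Self {
        Self { entries: HashMap::new(), vram_used: 0, vram_budget }
    }

    /// Cache the KVs for a sequence; swap GPU-resident entries to CPU if the
    /// new entry would exceed the VRAM budget (instead of OOM-ing).
    fn insert(&mut self, tokens: &[u32], bytes: usize) {
        while self.vram_used + bytes > self.vram_budget {
            // Pick any GPU-resident entry to demote. A real implementation
            // would use an LRU policy and actually copy tensors to host memory.
            let victim = self
                .entries
                .iter()
                .find(|(_, loc)| matches!(loc, KvLocation::Gpu { .. }))
                .map(|(k, _)| *k);
            match victim {
                Some(key) => {
                    if let Some(KvLocation::Gpu { bytes }) = self.entries.remove(&key) {
                        self.vram_used -= bytes;
                        self.entries.insert(key, KvLocation::Cpu { bytes });
                    }
                }
                None => break, // nothing left to swap; caller must shrink the request
            }
        }
        self.vram_used += bytes;
        self.entries.insert(hash_tokens(tokens), KvLocation::Gpu { bytes });
    }
}

fn main() {
    let mut store = KvStore::new(1_000);
    store.insert(&[1, 2, 3], 600);
    store.insert(&[1, 2, 3, 4], 600); // triggers a swap of the first entry to CPU
    println!("{:?} (VRAM used: {} bytes)", store.entries, store.vram_used);
}
```

Keying by hash(token ids) gives exact-sequence reuse; combining it with the prefix matching sketched in the issue body would also cover partial (prefix) reuse across conversation turns.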
