Prompt Caching #91
Labels: models, new feature, optimization, processing
Use case:
In a conversation, skip re-prefilling the entire prompt every time a new message is sent: store the previously computed KV cache and reuse it, as long as there is enough VRAM for it. See the sketch below.
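To make the request concrete, here is a minimal sketch of the idea using the Hugging Face `transformers` API as a stand-in (the model choice, cache layout, and `prefill` helper are illustrative assumptions, not this project's code):

```python
# Minimal sketch of per-conversation KV caching (illustration only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Cache: conversation id -> (token ids already prefilled, their KV cache).
kv_cache: dict[str, tuple[torch.Tensor, object]] = {}

@torch.no_grad()
def prefill(conv_id: str, prompt: str):
    ids = tok(prompt, return_tensors="pt").input_ids
    cached = kv_cache.get(conv_id)
    if cached is not None:
        old_ids, past = cached
        n = old_ids.shape[1]
        # If the new prompt extends the cached prefix, only the new
        # suffix needs to be prefilled; the stored KVs cover the rest.
        if ids.shape[1] > n and torch.equal(ids[:, :n], old_ids):
            out = model(ids[:, n:], past_key_values=past, use_cache=True)
            kv_cache[conv_id] = (ids, out.past_key_values)
            return out
    # Cold start (or prefix mismatch): prefill the whole prompt.
    out = model(ids, use_cache=True)
    kv_cache[conv_id] = (ids, out.past_key_values)
    return out
```

A real implementation would also need an eviction policy (e.g. LRU) so cached KVs are freed when memory runs low, matching the "if there is enough VRAM" condition above.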
Examples:
- Automatic Prefix Caching (vLLM)
- Fast and Expressive LLM Inference with RadixAttention and SGLang (LMSYS blog)