
KV-Cache #143

Answered by iofu728
i3ullbum asked this question in Q&A

Hi @i3ullbum,

Sorry for the late response.

The KV-cache compression introduced in LLMLingua is an experimental feature. Specifically, after the pre-filling stage, log-perplexity (log PPL) is used to decide which KV-cache entries to retain.
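To make the selection step above concrete, here is a minimal, framework-free sketch. It is not LLMLingua's implementation, whose details are not published in this thread. It assumes that per-token probabilities from the pre-fill pass are available, that tokens with a higher negative log-likelihood (the per-token term whose mean is log PPL) are the ones worth keeping, and that a fixed `keep_ratio` budget is used. The function names, the `keep_ratio` parameter, and the list-based cache layout are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def token_neg_log_probs(probs: Sequence[float]) -> List[float]:
    """Per-token negative log-likelihood; the mean of these values is log PPL."""
    return [-math.log(p) for p in probs]


def select_kv_positions(nll: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the positions to retain, in their original order.

    Assumption (not confirmed by the source): positions with the highest
    negative log-likelihood are retained.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not nll:
        return []
    n_keep = max(1, math.ceil(len(nll) * keep_ratio))
    # Rank positions from most to least surprising, then restore sequence order.
    ranked = sorted(range(len(nll)), key=lambda i: nll[i], reverse=True)
    return sorted(ranked[:n_keep])


def prune_kv_cache(keys: Sequence, values: Sequence, keep: Sequence[int]) -> Tuple[list, list]:
    """Drop every cache entry whose position is not listed in `keep`.

    Hypothetical layout: one key and one value per token position.
    """
    return [keys[i] for i in keep], [values[i] for i in keep]


# Toy example: probabilities the model assigned to six prompt tokens.
probs = [0.9, 0.1, 0.5, 0.05, 0.8, 0.2]
keep = select_kv_positions(token_neg_log_probs(probs), keep_ratio=0.5)
print(keep)  # [1, 3, 5]
```

In a real model the cache holds per-layer, per-head tensors, so the same index list would be applied along the sequence dimension of each tensor rather than to plain lists.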

We will soon release new KV-cache compression work, which will include the implementation details in LLMLingua.

Replies: 1 comment

Answer selected by iofu728
Category: Q&A
Labels: question (Further information is requested)
2 participants