Hello, I am trying to use LLMLingua in my research. My research requires a compressed KV cache rather than a compressed prompt, but I couldn't figure out how to get it. Could you give me some insight into which part I should modify to have the KV cache returned? Thanks.
Replies: 1 comment
Hi @i3ullbum, sorry for the late response. The KV-cache compression introduced in LLMLingua is an experimental feature. Specifically, after the pre-filling stage, log PPL (log perplexity) is used to decide which KV-cache entries to retain. We will soon release new KV-cache compression work, which will include the implementation details in LLMLingua.
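As a rough illustration of that idea only (this is not LLMLingua's implementation, and none of these names come from its API), the sketch below scores each token by its negative log-likelihood under the pre-fill logits and keeps the cache entries for the highest-scoring positions. The function names, the plain-list cache layout, the `keep_ratio` parameter, the choice to keep the most surprising tokens, and always keeping position 0 are all assumptions made for the example.

```python
import math


def token_log_ppl(logits, token_ids):
    """Per-token negative log-likelihood from pre-fill logits.

    scores[i] is -log p(token_ids[i + 1] | tokens up to i), so it belongs
    to position i + 1. Position 0 has no prefix and gets no score.
    """
    scores = []
    for step_logits, next_id in zip(logits[:-1], token_ids[1:]):
        m = max(step_logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in step_logits))
        scores.append(log_z - step_logits[next_id])
    return scores


def select_positions(scores, keep_ratio):
    """Keep position 0 plus the scored positions with the highest log PPL.

    Assumption: high-surprise tokens are the ones worth retaining.
    """
    n_keep = max(1, math.ceil(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [0] + sorted(i + 1 for i in ranked[:n_keep])


def prune_kv_cache(kv_cache, positions):
    """Slice every layer's keys and values down to the retained positions.

    kv_cache is a hypothetical list of (keys, values) pairs, one per layer,
    each indexed by sequence position.
    """
    return [
        ([keys[p] for p in positions], [values[p] for p in positions])
        for keys, values in kv_cache
    ]
```

With a real model, the per-layer entries would be tensors sliced along the sequence dimension instead of Python lists, and the position IDs of the retained entries would need to be handled consistently with the model's positional encoding.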