KV-Cache #143

i3ullbum · 2024-05-03T06:34:27Z


Hello, I am trying to use LLMLingua in my research.

My research requires a compressed KV cache rather than a compressed prompt, but I couldn't figure out how to obtain it. Could you give me some insight into which part I should modify to have the KV cache returned?

Thanks.

Answered by iofu728 (Maintainer) · 2024-06-20T09:11:13Z

Hi @i3ullbum,

Sorry for the late response.

The KV-cache compression introduced in LLMLingua is an experimental feature. Specifically, after the pre-filling stage, the per-token log perplexity (log PPL) is used to decide which KV-cache entries to retain.
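For readers who want to experiment before the official implementation is released, here is a minimal sketch of the idea described above, not LLMLingua's actual code. It scores each prompt token by its negative log-likelihood from the pre-filling logits, keeps a fraction of positions, and drops the other positions from every layer's keys and values. Several details are assumptions of this sketch: keeping the *highest*-scoring (least predictable) tokens mirrors LLMLingua's prompt-level rule but is not confirmed for the KV-cache feature; the first token is always kept; and `keep_ratio`, `token_log_ppl`, `select_positions` and `prune_kv_cache` are hypothetical names. The cache is a toy list-of-lists so the example runs with the standard library only.

```python
import math
from typing import List, Sequence, Tuple

# Toy KV cache: one (keys, values) pair per layer, each a list indexed by token position.
Layer = Tuple[List[List[float]], List[List[float]]]


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def token_log_ppl(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> List[float]:
    """Per-token negative log-likelihood computed from pre-filling logits.

    logits[t] is the next-token distribution after reading token t, so token t
    is scored with logits[t - 1]. Token 0 has no context; this sketch gives it
    +inf so it is always kept (an assumption, not LLMLingua's documented rule).
    """
    scores = [math.inf]
    for t in range(1, len(token_ids)):
        scores.append(-log_softmax(logits[t - 1])[token_ids[t]])
    return scores


def select_positions(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Keep the highest-scoring (least predictable) positions, in original order."""
    k = max(1, math.ceil(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def prune_kv_cache(cache: List[Layer], keep: Sequence[int]) -> List[Layer]:
    """Drop the same token positions from every layer's keys and values."""
    return [([keys[i] for i in keep], [values[i] for i in keep]) for keys, values in cache]


if __name__ == "__main__":
    token_ids = [0, 1, 2, 1]
    # Hypothetical pre-filling logits over a 3-token vocabulary.
    logits = [
        [0.0, 5.0, 0.0],  # token 1 is predicted well (low NLL)
        [0.0, 5.0, 0.0],  # token 2 is a surprise (high NLL)
        [0.0, 0.0, 0.0],  # uniform: token 1 scores log(3)
        [0.0, 0.0, 0.0],  # prediction after the last token, unused
    ]
    keep = select_positions(token_log_ppl(logits, token_ids), keep_ratio=0.5)
    print(keep)  # [0, 2]
    cache = [([[float(i)] for i in range(4)], [[10.0 * i] for i in range(4)]) for _ in range(2)]
    print(prune_kv_cache(cache, keep)[0])  # ([[0.0], [2.0]], [[0.0], [20.0]])
```

With a real model, the same index list would be applied along the sequence dimension of each layer's key and value tensors. Note that dropped positions also have to be accounted for in position ids and attention masks during decoding, which this sketch does not handle.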

We will soon release a new KV-cache compression work, and its implementation details will be included in LLMLingua.
