Decoding time compression #55

Open · Dominic789654 opened this issue Mar 5, 2025 · 2 comments
Labels: feature request (New feature or request)

Comments

@Dominic789654 (Contributor)

Do you know how to do decoding-time compression?
Is there a code example?

Dominic789654 added the feature request label on Mar 5, 2025
@SimJeg (Collaborator) commented Mar 6, 2025

So far we have focused on the pre-filling phase, as most long-context use cases revolve around a long prompt. This is changing with reasoning models, and kvpress might evolve in that direction in the future too.

The default forward_hook method used by all presses starts with the following lines:

        # Don't compress after pre-filling
        if kwargs["cache_position"][-1] > q_len:
            return output

        [...]

        keys, values = self.compress(module, hidden_states, keys, values, output[1], kwargs)

This could be replaced by something like:

        [...]

        if kwargs["cache_position"][-1] <= q_len:
            keys, values = self.compress_prefilling(module, hidden_states, keys, values, output[1], kwargs)
        else:
            keys, values = self.compress_decoding(module, hidden_states, keys, values, output[1], kwargs)
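
For illustration only, a press could then implement the two methods along these lines (nothing of this exists in kvpress yet; the class name, the import path, and the sliding-window policy are just placeholders for the sketch):

    from kvpress import KnormPress  # any existing press would do; import path assumed here


    class KnormWithDecodingWindowPress(KnormPress):
        """Hypothetical press: KnormPress at pre-filling plus a sliding window at decoding."""

        max_decoding_cache_len: int = 4096  # arbitrary budget for this sketch

        def compress_prefilling(self, module, hidden_states, keys, values, attentions, kwargs):
            # Prompt-time behaviour is unchanged: reuse the parent's existing compress.
            return self.compress(module, hidden_states, keys, values, attentions, kwargs)

        def compress_decoding(self, module, hidden_states, keys, values, attentions, kwargs):
            # Placeholder policy: once the cache exceeds the budget, keep only the most recent
            # entries. keys / values have shape (batch, num_kv_heads, seq_len, head_dim).
            if keys.shape[2] > self.max_decoding_cache_len:
                keys = keys[:, :, -self.max_decoding_cache_len:]
                values = values[:, :, -self.max_decoding_cache_len:]
            return keys, values

A press that only targets the prompt would simply leave compress_decoding as a no-op.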

What do you have in mind?

@Dominic789654 (Contributor, Author)

Thank you for your detailed explanation!

I was thinking of a similar solution. It would indeed require touching all presses, splitting the original compress method into two functions: compress_prefilling and compress_decoding. That is some refactoring work, but from a code-structure and maintainability perspective it seems clear and intuitive.

If this is currently the most straightforward approach, I'll try it first. Separating the pre-filling and decoding logic should result in a clearer code structure.
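
To make that concrete, here is a rough sketch of what the split could look like on the base-class side (names follow the snippet above; the no-op decoding default is only an assumption so that existing presses keep their current behaviour):

    class BasePressSketch:
        """Sketch of the proposed split, not the actual kvpress base class."""

        def compress_prefilling(self, module, hidden_states, keys, values, attentions, kwargs):
            # Each press moves its current `compress` logic here (prompt-time pruning).
            raise NotImplementedError

        def compress_decoding(self, module, hidden_states, keys, values, attentions, kwargs):
            # Assumed default: leave the cache untouched during decoding, so presses
            # that only compress the prompt do not need to change.
            return keys, values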
