
Conversation

@manueldeprada (Contributor) commented on Jul 1, 2025

This PR depends on #39106

Look at the last commit, f09e0cd:

I think having get_mask_sizes out of the cache makes much more sense. There is only one extra change:

past_seen_tokens = cache_position.shape[0] if cache_position.shape[0] > 1 else cache_position[0] + 1

It substitutes past_seen_tokens = past_key_values.get_seq_length() (which depends on cache info that might be hard to compute, e.g., for QuantizedCaches). What we would like to compute is

past_seen_tokens = cache_position[-1]

but that is not compatible with torch.export.

The new solution is torch.export-friendly and works both when cache_position = torch.tensor([0, 1, 2, 3, 4, 5, 6]) (prefill phase) and when cache_position = torch.tensor([16]) (decode phase).
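For illustration, a minimal standalone sketch of how that expression behaves in both phases (the helper function is just for demonstration, not part of the PR; the expression itself is the one quoted above):

import torch

def past_seen_tokens(cache_position: torch.Tensor):
    # Prefill: cache_position covers positions 0..n-1, so its length n equals
    # the position after the last one (i.e. cache_position[-1] + 1).
    # Decode: cache_position holds a single position, so that index + 1 gives
    # the same value, without a data-dependent index into the last element.
    return cache_position.shape[0] if cache_position.shape[0] > 1 else cache_position[0] + 1

print(past_seen_tokens(torch.arange(7)))      # prefill phase: 7
print(past_seen_tokens(torch.tensor([16])))   # decode phase: tensor(17)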

…yeredCache (huggingface#38077)

- Introduces CacheLayer and Cache base classes
- Ports Static, Dynamic, Offloaded, Quantized, Hybrid, etc. to use layers
- Implements method/attr dispatch across layers to reduce boilerplate
- Adds CacheProcessor hooks for offloading, quantization, etc.
- Updates and passes tests
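As a rough illustration of the layered design described in the commit message above (the class and method names here are simplified and hypothetical, not the PR's exact API): a top-level cache holds one layer object per decoder layer and dispatches calls to it, so cache variants mostly just swap the layer type.

import torch

class DynamicCacheLayer:
    # Illustrative per-layer KV store (not the actual transformers class).
    def __init__(self):
        self.keys = None
        self.values = None

    def update(self, key_states, value_states):
        if self.keys is None:
            self.keys, self.values = key_states, value_states
        else:
            self.keys = torch.cat([self.keys, key_states], dim=-2)
            self.values = torch.cat([self.values, value_states], dim=-2)
        return self.keys, self.values

class LayeredCache:
    # Illustrative container: dispatches update() to its layer objects, so
    # cache variants (static, offloaded, quantized, ...) only change layer_cls.
    def __init__(self, num_layers, layer_cls=DynamicCacheLayer):
        self.layers = [layer_cls() for _ in range(num_layers)]

    def update(self, key_states, value_states, layer_idx):
        return self.layers[layer_idx].update(key_states, value_states)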
@manueldeprada force-pushed the cache-move-mask-sizes-out branch from 6b6314d to f09e0cd on July 1, 2025 08:26
@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@manueldeprada force-pushed the cache-move-mask-sizes-out branch 4 times, most recently from c030aa2 to b78affa on July 2, 2025 15:08
@manueldeprada force-pushed the cache-move-mask-sizes-out branch from b78affa to 16a6624 on July 2, 2025 15:17