Move get_mask_sizes from Cache to masking_utils and remove use of get_seq_length. #39142
This PR depends on #39106
Look at the last commit, f09e0cd:
I think having `get_mask_sizes` out of `Cache` makes much more sense. There is only one extra change:
`transformers/src/transformers/masking_utils.py`, line 643 in 6b6314d:
It replaces `past_seen_tokens=past_key_values.get_seq_length()` (which depends on cache info that might be hard to compute, e.g. for QuantizedCache). What we would like to compute is …, but that is not compatible with `torch.export`. The new solution is `torch.export`-friendly and works both when `cache_position = torch.tensor([0, 1, 2, 3, 4, 5, 6])` (prefill phase) and when `cache_position = torch.tensor([16])` (decode phase).
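For context, here is a minimal sketch of the kind of computation that stays `torch.export`-friendly: deriving the mask sizes from `cache_position` alone, with no call into the cache object. The standalone function name `mask_sizes_from_cache_position` and the dense-cache assumption are illustrative only; this is not the exact code added by the PR.

```python
import torch

def mask_sizes_from_cache_position(cache_position: torch.Tensor) -> tuple[torch.Tensor, int]:
    # Sketch (assumes a dense cache): the last entry of `cache_position` tells us how
    # many key/value slots the attention mask must cover, without asking the cache
    # for its sequence length. This stays a plain tensor op, so torch.export can trace it.
    kv_length = cache_position[-1] + 1  # positions are 0-based, so last index + 1
    kv_offset = 0                       # keys start at position 0 for a dense cache
    return kv_length, kv_offset

# Prefill phase: the whole prompt is processed at once.
print(mask_sizes_from_cache_position(torch.tensor([0, 1, 2, 3, 4, 5, 6])))  # (tensor(7), 0)
# Decode phase: a single new token after 16 cached ones.
print(mask_sizes_from_cache_position(torch.tensor([16])))                   # (tensor(17), 0)
```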