Commit 1c3cbcc

Squash for refactor: Replace monolithic cache classes with modular LayeredCache (#38077)
- Introduces CacheLayer and Cache base classes
- Ports Static, Dynamic, Offloaded, Quantized, Hybrid, etc. to use layers
- Implements method/attr dispatch across layers to reduce boilerplate
- Adds CacheProcessor hooks for offloading, quantization, etc.
- Updates and passes tests
1 parent ccf2ca1 commit 1c3cbcc
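
The commit message describes per-layer cache objects with method/attribute dispatch across layers. The following is a minimal pure-Python sketch of that general pattern, not the code from this commit: `ToyCacheLayer` and `ToyLayeredCache` are hypothetical names, plain lists stand in for tensors, and the forwarding rule in `__getattr__` is only an assumption about how such dispatch could be written.

```python
from typing import Any, Callable, List, Tuple


class ToyCacheLayer:
    """Hypothetical per-layer store; plain lists stand in for key/value tensors."""

    def __init__(self) -> None:
        self.keys: List[Any] = []
        self.values: List[Any] = []

    def update(self, new_keys: List[Any], new_values: List[Any]) -> Tuple[List[Any], List[Any]]:
        # Append the new entries and return everything cached so far for this layer.
        self.keys.extend(new_keys)
        self.values.extend(new_values)
        return self.keys, self.values

    def get_seq_length(self) -> int:
        return len(self.keys)

    def reset(self) -> None:
        self.keys.clear()
        self.values.clear()


class ToyLayeredCache:
    """Hypothetical container that owns one layer object per model layer."""

    def __init__(self, num_layers: int, layer_cls: type = ToyCacheLayer) -> None:
        self.layers = [layer_cls() for _ in range(num_layers)]

    def update(self, new_keys: List[Any], new_values: List[Any], layer_idx: int):
        # Per-layer calls are routed to the layer that owns the state.
        return self.layers[layer_idx].update(new_keys, new_values)

    def get_seq_length(self, layer_idx: int = 0) -> int:
        return self.layers[layer_idx].get_seq_length()

    def __getattr__(self, name: str) -> Callable[..., List[Any]]:
        # Only reached when normal lookup fails. Guard "layers" to avoid
        # infinite recursion if it is accessed before __init__ has run.
        if name == "layers":
            raise AttributeError(name)
        if not all(hasattr(layer, name) for layer in self.layers):
            raise AttributeError(name)

        def broadcast(*args: Any, **kwargs: Any) -> List[Any]:
            # Forward the call to every layer and collect the results.
            return [getattr(layer, name)(*args, **kwargs) for layer in self.layers]

        return broadcast


if __name__ == "__main__":
    cache = ToyLayeredCache(num_layers=2)
    cache.update(["k0", "k1"], ["v0", "v1"], layer_idx=0)
    cache.reset()  # not defined on the container; forwarded to each layer
```

In this sketch the container defines only the calls that need a layer index, and anything else (here `reset`) is forwarded to every layer, which is one way a dispatch mechanism can cut per-class boilerplate.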

21 files changed, +2246 -1573 lines changed

docs/source/en/internal/generation_utils.md

Lines changed: 0 additions & 21 deletions
@@ -366,43 +366,22 @@ A [`Constraint`] can be used to force the generation to include specific tokens
     - validate
 
 [[autodoc]] DynamicCache
-    - update
-    - get_seq_length
-    - reorder_cache
-    - to_legacy_cache
-    - from_legacy_cache
 
 [[autodoc]] QuantizedCache
-    - update
-    - get_seq_length
 
 [[autodoc]] QuantoQuantizedCache
 
 [[autodoc]] HQQQuantizedCache
 
 [[autodoc]] OffloadedCache
-    - update
-    - prefetch_layer
-    - evict_previous_layer
 
 [[autodoc]] StaticCache
-    - update
-    - get_seq_length
-    - reset
 
 [[autodoc]] OffloadedStaticCache
-    - update
-    - get_seq_length
-    - reset
 
 [[autodoc]] HybridCache
-    - update
-    - get_seq_length
-    - reset
 
 [[autodoc]] SlidingWindowCache
-    - update
-    - reset
 
 [[autodoc]] EncoderDecoderCache
     - get_seq_length

docs/source/en/model_doc/falcon_mamba.md

Lines changed: 7 additions & 0 deletions
@@ -110,6 +110,13 @@ outputs = model.generate(**inputs, max_new_tokens=100)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
+## FalconMambaCache
+
+[[autodoc]] FalconMambaCache
+    - update_conv_state
+    - update_ssm_state
+    - reset
+
 ## FalconMambaConfig
 
 [[autodoc]] FalconMambaConfig

docs/source/en/model_doc/mamba.md

Lines changed: 7 additions & 0 deletions
@@ -115,6 +115,13 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
 trainer.train()
 ```
 
+## MambaCache
+
+[[autodoc]] MambaCache
+    - update_conv_state
+    - update_ssm_state
+    - reset
+
 ## MambaConfig
 
 [[autodoc]] MambaConfig

src/transformers/__init__.py

Lines changed: 0 additions & 2 deletions
@@ -358,7 +358,6 @@
 "EncoderDecoderCache",
 "HQQQuantizedCache",
 "HybridCache",
-"MambaCache",
 "OffloadedCache",
 "OffloadedStaticCache",
 "QuantizedCache",
@@ -839,7 +838,6 @@
 EncoderDecoderCache,
 HQQQuantizedCache,
 HybridCache,
-MambaCache,
 OffloadedCache,
 OffloadedStaticCache,
 QuantizedCache,
