Fix: StaticCache & inputs_embeds
#32932
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks, two nits but good otherwise. Do we take the max of num_beams, num_return_sequences because they stem from beams?
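The reviewer's question about taking the max can be illustrated with a minimal sketch (names here are illustrative, not the library's actual API): a pre-allocated cache's batch dimension has to cover the largest expansion factor used during generation, and in beam search `num_return_sequences` is at most `num_beams`, so `max()` covers both sampling and beam modes.

```python
def cache_batch_size(batch_size, num_beams=1, num_return_sequences=1):
    # Hypothetical helper: a static cache is allocated up front, so its
    # batch dimension must fit the widest expansion the generation loop
    # will use. Sampling expands by num_return_sequences; beam search
    # expands by num_beams (and num_return_sequences <= num_beams there),
    # so the max of the two is always sufficient.
    return batch_size * max(num_beams, num_return_sequences)
```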
Thank you for taking care of gemma 2 🤗
@@ -59,7 +60,7 @@ class Gemma2ModelTest(GemmaModelTest, unittest.TestCase):
         if is_torch_available()
         else ()
     )
-    all_generative_model_classes = ()
+    all_generative_model_classes = (Gemma2ForCausalLM,) if is_torch_available() else ()
😱 good spot!
This was removed because it was failing too many tests
yes, I skipped those that shouldn't be triggered due to the model-specific cache and fixed the other failing ones
    def test_generate_from_inputs_embeds_with_static_cache(self):
        pass

    def _check_attentions_for_generate(
Let's add the reason for the overwrite at the top of the fn as a comment, here and on the other functions that need an overwrite! That way, we immediately know why the function needs to exist :)
(I see that you added a few comments below, like "HybridCache has fixed length for key/values"; moving them to the top suffices)
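As a sketch of the convention being asked for (class and reason strings here are illustrative, not the PR's exact code): the overwrite reason sits at the top of the overridden test, and tests that cannot apply to the model-specific cache are skipped with an explicit reason rather than silently passing.

```python
import unittest

class Gemma2GenerationTests(unittest.TestCase):
    # Overwrite reason documented at the top, as the reviewer suggests:
    # Gemma2 uses a model-specific cache (HybridCache), so the generic
    # StaticCache + inputs_embeds test from the mixin does not apply.
    @unittest.skip(reason="Gemma2 uses a model-specific cache (HybridCache)")
    def test_generate_from_inputs_embeds_with_static_cache(self):
        pass
```

Skipping with `reason=` keeps the intent visible in the test report instead of an empty `pass` body that looks like a passing test.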
Force-pushed from 4dd1494 to fce9e7e
Thank you for iterating 💛
Hi, I ran into similar errors as in #32911, will this PR get merged?
Yes, merging now, should be ready
squash commit
What does this PR do?
Fixes #32911. Enables generation with StaticCache and inputs embeds; previously it was failing due to incorrect calculation of `max_cache_length`. Added a test for that and added tests for `Gemma2ForCausalLM`. Some things to note: Gemma2 cannot currently be used with `StaticCache`. It can with some small changes but imo we shouldn't
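The shape of the fix described above can be sketched as follows (a minimal illustration, assuming hypothetical helper names, not the library's actual code): when generating from `inputs_embeds` there may be no `input_ids` to measure, so the prompt length, and therefore the static cache size, must be derived from the embeddings tensor instead.

```python
import torch

def prompt_length(input_ids=None, inputs_embeds=None):
    # Hypothetical helper: input_ids has shape (batch, seq_len), while
    # inputs_embeds has shape (batch, seq_len, hidden_size). When the
    # caller passed embeddings, the prompt length must come from them,
    # otherwise the cache is sized from a missing/wrong input_ids length.
    if inputs_embeds is not None:
        return inputs_embeds.shape[1]
    return input_ids.shape[1]

def max_cache_length(max_new_tokens, input_ids=None, inputs_embeds=None):
    # A static cache is pre-allocated, so it must hold the full prompt
    # plus every token that will be generated.
    return prompt_length(input_ids, inputs_embeds) + max_new_tokens
```

Sizing the cache this way makes the `inputs_embeds` path behave the same as the `input_ids` path, which is the symptom reported in #32911.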