Fix max_length criteria when using inputs_embeds #28994

Merged: 18 commits into huggingface:main from fix/max_length_generation, Feb 16, 2024

Conversation

zucchini-nlp (Member)

What does this PR do?

Fixes #28953. StoppingCriteria with max_length behaves differently when given input_ids versus inputs_embeds; this happens only in decoder-only models. This PR fixes it so that the criteria account for the length of inputs_embeds when generating.
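For context, a minimal sketch of the behavior being fixed (assuming a small decoder-only checkpoint such as gpt2; illustrative only, not code from the PR):

```python
# Minimal repro sketch of the max_length inconsistency (assumes gpt2; illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids)

max_length = 20
out_ids = model.generate(input_ids=input_ids, max_length=max_length)
out_embeds = model.generate(inputs_embeds=inputs_embeds, max_length=max_length)

# With input_ids, the output includes the prompt and is capped at max_length tokens in total.
# With this fix, the inputs_embeds path counts the embedded prompt toward max_length as well,
# so the totals below should match; before the fix they did not.
print(out_ids.shape[-1], input_ids.shape[-1] + out_embeds.shape[-1])
```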

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@gante

@gante (Member) left a comment:

Technically fulfils the main request of the GH issue, but I'd like for us to go one step further!

In the test you wrote, we check self.assertEqual(out_gen.shape[-1], input_len + out_gen_embeds.shape[-1] - 1). Ideally, the final -1 shouldn't be there: we initialize input_ids with decoder_start_id, causing the additional length, and we probably shouldn't. As such, we can add an additional condition in _prepare_decoder_input_ids_for_generation: in this specific case, input_ids should be empty.
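A rough sketch of the kind of check being suggested, written here as a standalone helper (the name _maybe_empty_input_ids is hypothetical; the actual change landed in _maybe_initialize_input_ids_for_generation, as shown in the diff further down):

```python
from typing import Optional

import torch


def _maybe_empty_input_ids(model_kwargs: dict, batch_size: int, device: torch.device) -> Optional[torch.Tensor]:
    # If the caller passed inputs_embeds, skip the usual BOS/decoder_start_id
    # initialization and return an empty (batch_size, 0) tensor, so no extra
    # token inflates the generated length.
    if "inputs_embeds" in model_kwargs:
        return torch.ones((batch_size, 0), dtype=torch.long, device=device)
    return None
```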

Review thread on src/transformers/generation/utils.py (outdated, resolved)
@zucchini-nlp (Member, Author):

Oh, I see. Added a new fix and checked that creating an empty tensor does not break anything.

@gante (Member) left a comment:

Perfect! Thank you for iterating 🤗

Regarding the failing CI: it seems unrelated to this PR, and main does not have this failure, so it will likely be solved by rebasing on main and then force-pushing.

@amyeroberts (Collaborator) left a comment:

Thanks for fixing this! Very nice and clean PR :)

Just some outstanding questions so I can understand what's happening here before approving

@@ -2730,6 +2730,20 @@ def test_max_length_warning_if_different(self):
                **model_kwargs,
            )

    def test_max_length_if_input_embeds(self):
        # PT-only test: TF doesn't have StoppingCriteria
        article = "Hey, are you conscious?"
@amyeroberts (Collaborator):

Can we use a different phrase here? Talking about consciousness with these LLMs isn't ideal

        input_len = input_ids.shape[-1]
        out_gen = model.generate(input_ids=input_ids, max_length=max_length)
        out_gen_embeds = model.generate(inputs_embeds=inputs_embeds, max_length=max_length)
        self.assertEqual(out_gen.shape[-1], input_len + out_gen_embeds.shape[-1])
@amyeroberts (Collaborator):

For my own understanding: why is the returned generation a concatenation of the input and newly generated tokens when passing in input_ids, but only the newly generated tokens when passing in embeds?

@zucchini-nlp (Member, Author):

The addition of input_len here is needed because generation with inputs_embeds returns only the newly generated text, while generation with input_ids returns the whole text, including the prompt. So we are just making sure the lengths of both are equal.

@amyeroberts (Collaborator):

Right, but why is the behaviour different for embeddings and input_ids?

@zucchini-nlp (Member, Author):

If I understand the question correctly, the lengths here differ because we return the whole text (prompt + new) when the user passes ids. But we cannot recover the prompt text from inputs_embeds, so we just return the newly generated part.

@gante (Member) commented on Feb 14, 2024:

As @zucchini-nlp wrote.

There is no mismatch if the user passes both input_ids and inputs_embeds, as generate continues populating input_ids. But passing both kinda defeats the point of feeding inputs_embeds, which is used mostly for experimental purposes, hence the shape difference when only inputs_embeds is set. Although we can technically recover input_ids from inputs_embeds (a reverse lookup search) in most cases to make the shapes consistent, it's probably not a good use of our engineering time :D
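For the curious, a minimal sketch of what such a reverse lookup could look like (not part of this PR; recover_input_ids is a hypothetical helper, and it only recovers ids exactly when the embeddings came straight from the model's embedding matrix):

```python
import torch


def recover_input_ids(inputs_embeds: torch.Tensor, embedding_weight: torch.Tensor) -> torch.Tensor:
    # Nearest-neighbor lookup: for each position, pick the vocabulary row whose
    # embedding is closest to the given embedding vector.
    batch, seq_len, hidden = inputs_embeds.shape
    flat = inputs_embeds.reshape(batch * seq_len, hidden)   # (batch*seq, hidden)
    distances = torch.cdist(flat, embedding_weight)         # (batch*seq, vocab)
    return distances.argmin(dim=-1).reshape(batch, seq_len)


# e.g. recover_input_ids(inputs_embeds, model.get_input_embeddings().weight)
```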

@amyeroberts (Collaborator):

@zucchini-nlp @gante Thanks for the explanation!

Review thread on src/transformers/generation/utils.py (outdated, resolved)
@@ -441,6 +441,9 @@ def _maybe_initialize_input_ids_for_generation(
            if isinstance(value, torch.Tensor):
                batch_size = value.shape[0]
                break

        if "inputs_embeds" in model_kwargs:
            return torch.ones((batch_size, 0), dtype=torch.long, device=self.device)
@amyeroberts (Collaborator):

For my own understanding: am I correct that when using inputs_embeds we don't use any initialization, and this is just an empty placeholder?

@zucchini-nlp (Member, Author):

Yep. When we initialized it with size 1 filled with the BOS token, that threw max_length off by one token. We want the final generation to be a continuation of inputs_embeds and not to start with BOS.
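A quick standalone illustration of the empty placeholder (a sketch, not the library code): a (batch_size, 0) tensor concatenates cleanly with newly generated token ids, so the output length is exactly the number of generated tokens, with no BOS offset.

```python
import torch

batch_size = 2
input_ids = torch.ones((batch_size, 0), dtype=torch.long)   # empty placeholder
next_tokens = torch.tensor([[17], [42]], dtype=torch.long)  # e.g. the first sampled tokens
input_ids = torch.cat([input_ids, next_tokens], dim=-1)
print(input_ids.shape)  # torch.Size([2, 1]); a size-1 BOS initialization would make this [2, 2]
```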

@@ -1421,6 +1424,11 @@ def generate(
)
generation_config.max_length = generation_config.max_new_tokens + input_ids_length

# adjust max_length when using `input_embeds` in decoder-only models
@amyeroberts (Collaborator):

Rather than saying what this is doing (we can tell from the code), it would be more useful for the comment to explain why we need to do this.
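For readers following the thread, the "why" in numbers, using illustrative values rather than the PR's code: with inputs_embeds the returned sequence never contains the prompt, so the embedded prompt length has to count against max_length explicitly.

```python
# max_length budgets the total sequence (prompt + new tokens).
max_length = 20
inputs_embeds_length = 8   # hypothetical prompt length passed as embeddings
input_ids_length = 0       # input_ids starts as an empty placeholder

# Budget left for newly generated tokens once the embedded prompt is accounted for:
new_tokens_budget = max_length - inputs_embeds_length  # 12

# This matches the test's assertion:
#   out_gen.shape[-1] == input_len + out_gen_embeds.shape[-1]
#   20                ==    8      +        12
```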

@zucchini-nlp (Member, Author):

@amyeroberts done for all comments

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts (Collaborator) left a comment:

Looks great - thanks for iterating!

@gante (Member) commented on Feb 15, 2024:

@amyeroberts unrelated CI failures, I believe this can be merged 🤗

@amyeroberts (Collaborator):

@zucchini-nlp Can you try rebasing? Fixes that resolve the currently failing tests should have been merged into main.

@zucchini-nlp (Member, Author):

@amyeroberts Thanks, it's all green now and can be merged.

amyeroberts merged commit aee11fe into huggingface:main on Feb 16, 2024. 21 checks passed.
zucchini-nlp added a commit to zucchini-nlp/transformers that referenced this pull request on Feb 19, 2024:
* fix max_length for inputs_embeds

* make style

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Static Cache: load models with MQA or GQA (huggingface#28975)

* fix

* fix tests

* fix tests

* Update src/transformers/generation/utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more fixes

* make style

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@zucchini-nlp zucchini-nlp deleted the fix/max_length_generation branch February 26, 2024 12:47
itazap pushed a commit that referenced this pull request on May 14, 2024, with the same commit message as above.

Successfully merging this pull request may close these issues.

Report inconsistent output length from decoder-only model generate with input_ids and inputs_embeds