Generate: fix generation with inputs_embeds when input_ids=None for llama and gemma #29821

Closed
njhill wants to merge 2 commits

Conversation

njhill (Contributor) commented Mar 23, 2024

The changes in #29467 break generation with inputs_embeds when input_ids is None since they expect input_ids to be non-None even for the prefill forward without past_key_values.
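For illustration, a minimal sketch of the failing call pattern (the checkpoint name and prompt are placeholders, not taken from this PR):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Build embeddings directly so that generate() receives no input_ids.
    input_ids = tokenizer("Hello, world", return_tensors="pt").input_ids
    inputs_embeds = model.get_input_embeddings()(input_ids)

    # input_ids is deliberately omitted (None inside generate); after #29467,
    # prepare_inputs_for_generation expects input_ids to be non-None on the
    # prefill forward (no past_key_values yet), so this call breaks.
    outputs = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=10)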

@gante

gante (Member) commented Mar 27, 2024

Hi @njhill 👋

Can you share an example of failure? We have a test for generation with inputs_embeds (which is passing on e.g. Llama), so our test suite is likely incomplete :)

njhill (Contributor, Author) commented Mar 27, 2024

Thanks @gante, the failure happens specifically when using inputs_embeds with input_ids passed as None. I think both are passed to generate() in the current test. I'll update the PR title to clarify this.
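For concreteness, the two call patterns being distinguished here might look like this (a hedged sketch, reusing the inputs_embeds setup from the example above):

    # Pattern 1: both input_ids and inputs_embeds are passed to generate().
    model.generate(input_ids, inputs_embeds=inputs_embeds, max_new_tokens=10)

    # Pattern 2: only inputs_embeds is passed (input_ids is None) -- the
    # case reported as failing here.
    model.generate(inputs_embeds=inputs_embeds, max_new_tokens=10)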

njhill changed the title from "Generate: fix generation with inputs_embeds for llama and gemma" to "Generate: fix generation with inputs_embeds when input_ids=None for llama and gemma" on Mar 27, 2024
gante (Member) commented Mar 28, 2024

@njhill uhmmm the test checks that combination as well 🤔

    # input_ids is not a required input -- if we don't pass it, the newly generated tokens will be the same
    outputs_from_embeds_wo_ids = model.generate(
        inputs_embeds=inputs_embeds, max_new_tokens=20 - inputs_embeds.shape[1]
    )
    self.assertListEqual(
        outputs_from_embeds[:, inputs_embeds.shape[1] :].tolist(),
        outputs_from_embeds_wo_ids.tolist(),
    )

(you can run the test with py.test tests/models/llama/test_modeling_llama.py -k test_generate_from_inputs_embeds_decoder_only)

This means that there is probably something else going on, which could be interesting to pin down :)

njhill (Contributor, Author) commented Mar 28, 2024

@gante ah, apologies for not looking at those closely enough, and thank you for the tip on how to run the test. Let me dig deeper to see what's going on here.

github-actions bot commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this Apr 30, 2024