🧹 tracker: move `prepare_inputs_for_generation` into the generation mixin 🧹
#32685
Labels: Generation, WIP
🧹 This is a tracker regarding the move of `prepare_inputs_for_generation` into the generation mixin 🧹

## Why?

`prepare_inputs_for_generation` is not part of the core modeling code, but rather a utility for `generate`. Moving it into the generation mixin means modeling files no longer need to be touched whenever `generate` changes. Fewer modeling changes -> improved model stability.

## Tracker

Kinda ordered list of tasks:
- `llama`, `generate`, and `cache_utils` [except sink cache, broken atm] slow tests should be passing to ensure we don't break anything (Llama: make slow tests green 🟢 #33138)
- `PreTrainedModel` doesn't inherit from `GenerationMixin`, so that `can_generate()` becomes independent of `prepare_inputs_for_generation` being overwritten or not (Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin` #33203)
- Move llama's `prepare_inputs_for_generation` to the generation mixin. This implies moving one function that prepares the 4D mask too (the one that is called there) (Generate: move llama `prepare_inputs_for_generation` to `GenerationMixin` #33677)
- Add tests for `prepare_inputs_for_generation` — currently we don't test it directly, and we should (decoder-only llms: Generate: remove most decoder-only LLMs' `prepare_inputs_for_generation` #33870, encoder-decoder llms: Generate: move `prepare_inputs_for_generation` in encoder-decoder llms #34048)
- Fix `synced_gpus` in `generate`: when `synced_gpus` is set and `cache_positions` is out of bounds, take the latest available `input_ids` for dummy computations (Generate: Fix modern llm `generate` calls with `synced_gpus` #34095)
- Remove `prepare_inputs_for_generation` from as many models as possible. There may be merge conflicts here, due to the 4D mask function. Try to iron out as many trivial cases as possible (decoder-only llms: Generate: remove most decoder-only LLMs' `prepare_inputs_for_generation` #33870, encoder-decoder llms: Generate: move `prepare_inputs_for_generation` in encoder-decoder llms #34048)
- Update `prepare_inputs_for_generation` to forward `**kwargs` from its input to its output. With minimal changes, this should enable most VLMs to use the shared function -- they forward `pixel_values` from the input to the output (support for `**kwargs`: Generate: remove most decoder-only LLMs' `prepare_inputs_for_generation` #33870)
- After the steps above, most model-specific `prepare_inputs_for_generation` overrides should have been removed 🤗 We would need to check the remaining ones individually, there may be further simplification patterns available!
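To make the `synced_gpus` item above concrete: under `synced_gpus`, a sequence that has already finished must still run dummy forward passes so collective ops stay in sync across GPUs, and its cache position can then point past the end of the generated sequence. A toy sketch of the intended fallback, using a plain list of token ids instead of tensors (`next_step_token` is a hypothetical helper for illustration, not a `transformers` function):

```python
def next_step_token(input_ids, cache_position):
    """Pick the token fed to the model at this generation step.

    `input_ids` is the full generated sequence so far (a list of ints).
    When `cache_position` points past the end of the sequence -- a
    finished sequence doing dummy forward passes under `synced_gpus` --
    fall back to the latest available token instead of indexing out of
    bounds.
    """
    if cache_position < len(input_ids):
        return input_ids[cache_position]
    # Out-of-bounds cache position: reuse the last real token for the
    # dummy computation.
    return input_ids[-1]
```

The model output of these dummy steps is discarded; only keeping the collectives in lockstep matters, so any valid token works and the latest one is the cheapest correct choice.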
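The `**kwargs` forwarding item above is the key trick that lets VLMs reuse the shared function. A minimal sketch of the idea -- this is not the actual `transformers` implementation; plain nested lists stand in for tensors, and only the cached-decoding slicing is modeled:

```python
def prepare_inputs_for_generation(input_ids, past_key_values=None, **kwargs):
    """Toy sketch of a shared input-preparation step for `generate`."""
    # With a cache present, only the tokens not yet in the cache (here:
    # just the last one per sequence) need to be fed to the model.
    if past_key_values is not None:
        input_ids = [seq[-1:] for seq in input_ids]
    model_inputs = {"input_ids": input_ids, "past_key_values": past_key_values}
    # Forward all remaining kwargs from the input straight to the output,
    # so extra model inputs (e.g. `pixel_values` for VLMs) reach the
    # model call without a model-specific override.
    model_inputs.update(kwargs)
    return model_inputs
```

Because the extra inputs are passed through untouched, a VLM that only needs `pixel_values` relayed to its forward pass no longer has a reason to override the function.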