🧹 tracker: move `prepare_inputs_for_generation` into the generation mixin 🧹
#32685
Labels: Generation, WIP
🧹 This is a tracker regarding the move of `prepare_inputs_for_generation` into the generation mixin 🧹

## Why?

`prepare_inputs_for_generation` is not part of the core modeling code, but rather a utility for `generate`. Moving it into the generation mixin means modeling files no longer need to be touched whenever `generate` changes. Fewer modeling changes -> improved model stability.

## Tracker

Kinda ordered list of tasks:
- `llama`, `generate`, and `cache_utils` [except sink cache, broken atm] slow tests should be passing to ensure we don't break anything (Llama: make slow tests green 🟢 #33138)
- `PreTrainedModel` doesn't inherit from `GenerationMixin`, so that `can_generate()` becomes independent of `prepare_inputs_for_generation` being overwritten or not (Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin` #33203)
- Move llama's `prepare_inputs_for_generation` to the generation mixin. This implies moving one function that prepares the 4D mask too (the one that is called there) (Generate: move llama `prepare_inputs_for_generation` to `GenerationMixin` #33677)
- Add tests for `prepare_inputs_for_generation` — currently we don't test it directly, and we should (decoder-only llms: Generate: remove most decoder-only LLMs' `prepare_inputs_for_generation` #33870, encoder-decoder llms: Generate: move `prepare_inputs_for_generation` in encoder-decoder llms #34048)
- Fix `synced_gpus` in `generate`: when `synced_gpus` is set and `cache_positions` is out of bounds, take the latest available `input_ids` for dummy computations (Generate: Fix modern llm `generate` calls with `synced_gpus` #34095)
- Remove `prepare_inputs_for_generation` from as many models as possible. There may be merge conflicts here, due to the 4D mask function. Try to iron out as many trivial cases as possible (decoder-only llms: Generate: remove most decoder-only LLMs' `prepare_inputs_for_generation` #33870, encoder-decoder llms: Generate: move `prepare_inputs_for_generation` in encoder-decoder llms #34048)
- Update `prepare_inputs_for_generation` to forward `**kwargs` from its input to its output. With minimal changes, this should enable most VLMs to use the shared function -- they forward `pixel_values` from the input to the output (support for `**kwargs`: Generate: remove most decoder-only LLMs' `prepare_inputs_for_generation` #33870)
- After the steps above, most model-specific `prepare_inputs_for_generation` overrides should have been removed 🤗 We would need to check the remaining ones individually, there may be further simplification patterns available!
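To make the `synced_gpus` item above concrete: under `synced_gpus`, a sequence that has already finished must still run dummy forward passes so collective ops stay in sync across GPUs, and its cache position can then point past the end of the generated sequence. A toy sketch of the intended fallback, using a plain list of token ids instead of tensors (`next_step_token` is a hypothetical helper for illustration, not a `transformers` function):

```python
def next_step_token(input_ids, cache_position):
    """Pick the token fed to the model at this generation step.

    `input_ids` is the full generated sequence so far (a list of ints).
    When `cache_position` points past the end of the sequence -- a
    finished sequence doing dummy forward passes under `synced_gpus` --
    fall back to the latest available token instead of indexing out of
    bounds.
    """
    if cache_position < len(input_ids):
        return input_ids[cache_position]
    # Out-of-bounds cache position: reuse the last real token for the
    # dummy computation.
    return input_ids[-1]
```

The model output of these dummy steps is discarded; only keeping the collectives in lockstep matters, so any valid token works and the latest one is the cheapest correct choice.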
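The `**kwargs` forwarding item above is the key trick that lets VLMs reuse the shared function. A minimal sketch of the idea -- this is not the actual `transformers` implementation; plain nested lists stand in for tensors, and only the cached-decoding slicing is modeled:

```python
def prepare_inputs_for_generation(input_ids, past_key_values=None, **kwargs):
    """Toy sketch of a shared input-preparation step for `generate`."""
    # With a cache present, only the tokens not yet in the cache (here:
    # just the last one per sequence) need to be fed to the model.
    if past_key_values is not None:
        input_ids = [seq[-1:] for seq in input_ids]
    model_inputs = {"input_ids": input_ids, "past_key_values": past_key_values}
    # Forward all remaining kwargs from the input straight to the output,
    # so extra model inputs (e.g. `pixel_values` for VLMs) reach the
    # model call without a model-specific override.
    model_inputs.update(kwargs)
    return model_inputs
```

Because the extra inputs are passed through untouched, a VLM that only needs `pixel_values` relayed to its forward pass no longer has a reason to override the function.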