
Conversation

@gante (Contributor) commented Aug 29, 2025

What does this PR do?

This PR (🔴 = BC-breaking):

  1. Adds config.get_sub_config(). This function is an upgrade to get_text_config:
    a. Allows model-agnostic extraction of different modalities in a single function, future-proofing its uses;
    b. Uses heuristics to determine the correct encoder/decoder and modality matches (as opposed to relying on a hardcoded list of sub-config names). These heuristics save us from manually updating the matching names as new models come out, and they nudge future models towards the same patterns (👉 standardization);
    c. Related to #40277 ([generate] handle support for cache classes when num enc layers != num dec layers): when pulling encoder/decoder attributes from older encoder-decoder configs, make sure to consider config.attribute_map;
    d. Sometimes we want any text config (e.g. resizing embeddings), other times we want any decoder modality (e.g. setting up the KV cache)... and sometimes we want specifically the text decoder. get_text_config didn't support this level of control, but get_sub_config does 🤗 (see the usage sketch after this description);
    e. (b.) + (c.) + (d.) = several bug fixes + allow us to remove a few overwrites.
  2. 🔴 Deprecates get_text_config, replacing its calls with get_sub_config(modality="text", ...). This replacement is BC-breaking because of 1.b. However, in most cases, it fixes subtle bugs.
  3. Enables SlidingWindowLayer (the static one) in cross-attention cases (where no cache_position is provided).
  4. Corrects the is_encoder_decoder check in the generate tests.
  5. Fixes a few related corner cases in test_generate_continue_from_inputs_embeds and test_past_key_values_format (which allows us to delete skips/overwrites).

This PR is a follow-up to #40454 (an incomplete fix for the underlying issues, which turned out to be quite complex).
Fixes #40644 (BlenderbotForConditionalGeneration errors out with list index out of range)
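
For illustration, the proposed API could be used like this (a minimal sketch: the `modality` and `decoder` keyword arguments come from the calls visible in this PR's diff, everything else is an assumption):

from transformers import AutoConfig

# BLIP2 is a composite model with vision + text sub-configs (it comes up later
# in this conversation), so it makes a convenient example.
config = AutoConfig.from_pretrained("Salesforce/blip2-opt-2.7b")

# Any text sub-config, e.g. for resizing embeddings:
text_config = config.get_sub_config(modality="text")

# Any decoder sub-config, whatever its modality, e.g. for setting up the KV cache:
decoder_config = config.get_sub_config(decoder=True)

# Specifically the text decoder:
text_decoder_config = config.get_sub_config(modality="text", decoder=True)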

output_generate = self._greedy_generate(model=model, inputs_dict=inputs_dict)

- if model.config.get_text_config(decoder=True).is_encoder_decoder:
+ if model.config.is_encoder_decoder:
@gante (Contributor, Author) commented:

The original model.config.get_text_config(decoder=True).is_encoder_decoder was added because of BLIP2, which was not setting its outer-level is_encoder_decoder attribute correctly. This PR fixes the root issue there.

In practice, this line was not right: if we want to check whether a model is an encoder-decoder model, we can't look at its decoder config only :)
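
A minimal sketch of the distinction, using the two expressions from the diff above (assuming any composite model, e.g. BLIP2):

# The decoder sub-config describes the decoder component alone; its
# `is_encoder_decoder` flag says nothing about the composite model.
decoder_view = model.config.get_text_config(decoder=True).is_encoder_decoder  # the wrong question
composite_view = model.config.is_encoder_decoder  # what the test actually needs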

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante gante changed the title from "🔴🔴 [config] Upgrade get_text_config and fix related bugs" to "🔴🔴 [config] get_text_config no longer errors out if it finds multiple matches (and fix related bugs)" on Sep 1, 2025
@gante gante force-pushed the upgrade_get_text_config branch 2 times, most recently from 2d50d7f to 2b7f8dc on September 1, 2025 at 12:49
"""
for model_class in self.all_generative_model_classes:
if any(model_name in model_class.__name__.lower() for model_name in ["imagegpt"]):
# To be more precise: technically we can run this test on all models that have `inputs_embeds` or
@gante (Contributor, Author) commented:

tl;dr: moved the model-level overwrites here, so we can better track odd models, and updated the skip conditions

@gante gante changed the title from "🔴🔴 [config] get_text_config no longer errors out if it finds multiple matches (and fix related bugs)" to "🔴🔴 [config] Introduce get_sub_config (in place of get_text_config) and fix related bugs" on Sep 3, 2025
@gante gante changed the title from "🔴🔴 [config] Introduce get_sub_config (in place of get_text_config) and fix related bugs" to "🔴🔴 [config] Add get_sub_config (in place of get_text_config) and fix related bugs" on Sep 3, 2025
- config = config.get_text_config()
+ # We pull the decoder sub-config here to allow composite models to easily initialize the cache as
+ # `DynamicCache(config=model.config)`
+ config = config.get_sub_config(decoder=True)
@gante (Contributor, Author) commented Sep 4, 2025:
e.g. in text-to-speech models, the decoder is not a text decoder :)

(it was working before because all TTS decoders had a common name in their decoder config)
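
A minimal sketch of the pattern this diff enables (the `DynamicCache(config=model.config)` usage is the one named in the diff comment above; `model` is assumed to be any composite model, e.g. a text-to-speech model whose decoder is an audio decoder):

from transformers import DynamicCache

# `get_text_config()` would look for a *text* decoder and miss the audio one;
# `get_sub_config(decoder=True)` matches the decoder by role, whatever its modality.
decoder_config = model.config.get_sub_config(decoder=True)
cache = DynamicCache(config=model.config)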

@github-actions github-actions bot commented Sep 4, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, blip_2, clvp, colqwen2, dia, gemma3, gemma3n, idefics2, idefics3, kyutai_speech_to_text, llava_next, llava_next_video, llava_onevision, mllama, moshi, paligemma

@gante (Contributor, Author) commented Sep 19, 2025

Closing: the root issue is solved in #40939 (the remaining fixes here were targeting old models).

@gante gante closed this Sep 19, 2025
