Deprecate #36741 and map Causal to Conditional #36917
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the "Ready for review" button.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
TLDR of our internal thread:
- we need an `AutoForAny`; today `AutoForCausalLM` is a dumpster used to map anything
- we cannot break, so for now it will remain this way
- we also want the text-only parts to be loadable on their own; similarly, you would want to be able to load only the Image and Text parts -> `ImageTextToText` (see the sketch below)

Let's patch this!
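For illustration, a minimal sketch of the status quo described above, using a Gemma3 checkpoint id as an example (the `AutoForAny` class from the thread does not exist yet):

```python
from transformers import AutoModelForCausalLM, AutoModelForImageTextToText

# Status quo: AutoModelForCausalLM is the catch-all that many Hub models,
# including multimodal remote-code ones, are mapped under.
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it")

# The explicit multimodal mapping already exists but is far less visible,
# so remote-code authors rarely point users to it.
model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it")
```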
What does this PR do?
Fixes #36886, fixes #36926, and fixes loading in SmolAgents (from feedback in internal Slack).
NOTE: this is a temporary fix. In the long term we will converge under one Auto class for all multimodals, which we don't have yet. As such we will keep `CausalLM` as a temporary dump for custom-code users. 🔴 We will break this pattern in the near future and might enforce text-only models under this mapping.

After #36741, we unintentionally broke model loading for most remote-code users, because many Vision/Audio/Omni LLMs on the Hub use the `CausalLM` mapping with `AutoTokenizer`. This happens because `AutoImageTextToText` is less visible, and also because we have no mapping for other modalities.

This PR deprecates the previous fix and properly maps Gemma3 4B+ models to their `ConditionalGeneration` class, which aligns with the info in the model card. As discussed internally, all vision-audio-multimodal models will converge under `AutoModelForCausalLM` in the future, to maintain consistency and stop adding a new mapping for each new modality.
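A quick way to check the new behavior (a sketch; the checkpoint id and the expected class name follow the Gemma3 model card and modeling code):

```python
from transformers import AutoModelForCausalLM

# With this PR, the CausalLM auto class resolves Gemma3 4B+ checkpoints to the
# full multimodal class rather than the text-only backbone.
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it")
print(type(model).__name__)  # expected: Gemma3ForConditionalGeneration
```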
Why? All Audio LMs are already in the causal mapping due to the lack of `AutoAudioTextToText` or `AutoAudioToText` mappings. Vision LMs have also been inconsistently mapped under `CausalLM`.

Consequences: No breaking changes for users, including those using remote code. A warning is raised only if a model has both its full config and its text config mapped in `CausalLM`, which is a super rare case (gemma-3 was the exception).
Edge cases: Other models like `llava-1.5` cannot be loaded under `AutoModelForCausalLM` anymore after this PR, but their checkpoint keys wouldn't match anyway, and we never got user issues about it. This was an existing issue, not introduced by this PR. It might be resolved by a bigger refactor for vLLM, after adding base models for all VLMs and correct `base-prefix-keys`.
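A sketch of the `llava-1.5` edge case, assuming the `llava-hf/llava-1.5-7b-hf` checkpoint id; the exact exception depends on the failure path:

```python
from transformers import AutoModelForCausalLM, AutoModelForImageTextToText

# After this PR, llava-style checkpoints are expected to fail under the
# catch-all mapping (their checkpoint keys would not match anyway).
try:
    AutoModelForCausalLM.from_pretrained("llava-hf/llava-1.5-7b-hf")
except Exception as err:
    print(f"expected failure: {err}")

# The dedicated multimodal mapping remains the supported route.
model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-1.5-7b-hf")
```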
I verified that the Gemma3 case works as expected, without raising warnings (I just changed the mapping class). The only difference from the previous fix is that the vision tower is loaded as well, which might affect advanced users who manipulate configs or access model layers manually.
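For those advanced users, a minimal sketch of the difference; the `vision_tower` attribute name follows the Gemma3 modeling code, and its exact location on the module tree may differ across transformers versions:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it")

# Unlike the previous fix, the vision tower is now loaded as part of the model,
# so code that walks the module tree will see the extra submodule.
vision_tower = getattr(model, "vision_tower", None)
print(type(model).__name__, vision_tower is not None)
```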