
Deprecate #36741 and map Causal to Conditional #36917

Merged
merged 5 commits into from
Mar 25, 2025

Conversation

zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Mar 24, 2025

What does this PR do?

Fixes #36886, fixes #36926 and loading in SmolAgents (from feedback in internal slack)

NOTE: this is a temporary fix. In the long term we will converge on one Auto class for all multimodal models, which we don't have yet. Until then, we will keep CausalLM as a temporary dumping ground for custom-code users. 🔴 We will break this pattern in the near future and may restrict this mapping to text-only models.

After #36741, we unintentionally broke model loading for most remote-code users, because many Vision/Audio/Omni LLMs on the Hub use the CausalLM mapping with AutoTokenizer. This happened because AutoImageTextToText is less visible, and also because we have no mappings for other modalities.

This PR deprecates the previous fix and properly maps Gemma3 4B+ models to their ConditionalGeneration class, which aligns with the info in the model card. As discussed internally, all vision/audio/multimodal models will converge under AutoModelForCausalLM in the future, to maintain consistency and to stop adding a new mapping for each new modality.

  • Why? All Audio LMs are already in the causal mapping, since there is no AutoAudioTextToText or AutoAudioToText mapping. Vision LMs have also been inconsistently mapped under CausalLM.

  • Consequences: No breaking changes for users, including those using remote code. A warning is raised only if a model has both its full config and its text config mapped in CausalLM, which is a very rare case (Gemma-3 was the exception).

  • Edge cases: Other models like llava-1.5 can no longer load under AutoModelForCausalLM after this PR, but their checkpoint keys wouldn't match anyway, and we never received user issues about it. This was a pre-existing issue, not one introduced by this PR. It may be resolved by a bigger refactor for vLLM, after adding base models for all VLMs and correct base-prefix keys.
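The fallback described above can be sketched in isolation. This is a hypothetical toy model of the dispatch (plain dicts standing in for the real transformers `_model_mapping` objects; none of these names are the actual internals): AutoModelForCausalLM first checks its own mapping, then falls back to the ConditionalGeneration class registered for the same config type, warning on the way.

```python
import warnings

# Toy stand-ins for the real per-Auto-class mappings, keyed by config class name.
CAUSAL_LM_MAPPING = {"LlamaConfig": "LlamaForCausalLM"}
CONDITIONAL_GEN_MAPPING = {"Gemma3Config": "Gemma3ForConditionalGeneration"}

def resolve_causal_lm_class(config_name: str) -> str:
    """Return the class name a CausalLM-style Auto class would pick."""
    if config_name in CAUSAL_LM_MAPPING:
        return CAUSAL_LM_MAPPING[config_name]
    if config_name in CONDITIONAL_GEN_MAPPING:
        # The temporary behaviour: fall back to ConditionalGeneration,
        # with a warning since this pattern may break in the future.
        warnings.warn(
            f"{config_name} is not in the CausalLM mapping; falling back to "
            "its ConditionalGeneration class. This behaviour may change."
        )
        return CONDITIONAL_GEN_MAPPING[config_name]
    raise ValueError(f"No class registered for {config_name}")
```

With this sketch, a Gemma3-style config resolves to its ConditionalGeneration class while a plain text LM resolves as before.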

I verified that the Gemma3 case works as expected, without raising warnings (I just changed the mapping class). The only difference from the previous fix is that the vision tower is loaded as well, which might affect advanced users who manipulate configs or access model layers manually.

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained('google/gemma-3-4b-it')
tokenizer = AutoTokenizer.from_pretrained('google/gemma-3-4b-it')
inputs = tokenizer('What is your name?', return_tensors="pt")
out = model.generate(**inputs)
tokenizer.batch_decode(out)
>>> ['<bos>What is your name?\n\nI am called Pixel.\n\nHow are you?\n\nI am functioning optimally, thank you for']

# Chat template
msg = [{"role": "user", "content": "What is your name?"}]
inputs = tokenizer.apply_chat_template([msg], tokenize=True, return_dict=True, return_tensors="pt", add_generation_prompt=True)
out = model.generate(**inputs)
tokenizer.batch_decode(out)
>>> ['<bos><start_of_turn>user\nWhat is your name?<end_of_turn>\n<start_of_turn>model\nMy name is Gemma. I was created by the Gemma team at Google DeepMind. \n\nI']

@zucchini-nlp zucchini-nlp added the for patch Tag issues / labels that should be included in the next patch label Mar 24, 2025
@github-actions github-actions bot marked this pull request as draft March 24, 2025 08:19

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@zucchini-nlp zucchini-nlp marked this pull request as ready for review March 24, 2025 08:38
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


TLDR of our internal thread:

  • we need an AutoForAny, as today AutoModelForCausalLM is a dumpster used to map anything
  • we cannot break, so for now it will remain this way
  • we also want the text-only parts to be loadable on their own, just as you would want to load only the image and text parts -> ImageTextToText.

Let's patch this!

@ArthurZucker ArthurZucker merged commit 47e5432 into huggingface:main Mar 25, 2025
23 checks passed
ArthurZucker pushed a commit that referenced this pull request Mar 25, 2025
* deprecate the prev fix

* reword warning and update docs

* reword warning

* tests

* dont bloat `get_text_config()`