Skip to content

Conversation

@vasqu
Copy link
Contributor

@vasqu vasqu commented Sep 10, 2025

The gemma model family + qwen3 next use a flavor of RMSNorm that centers around 1:

output = output * (1.0 + self.weight.float())

The issue here that this accentuates computations with our standard init of 1 (1+1=2, doubling effect) - these can be seen on flaky fa tests for example. This PR addresses the init to be centered around 1 again just as the other models.

cc @Cyrilvallez
cc @ydshieh for viz

@vasqu vasqu requested a review from Cyrilvallez September 10, 2025 13:50
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Member

@Cyrilvallez Cyrilvallez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thanks! Just a few simple comments!

@vasqu vasqu requested a review from Cyrilvallez September 17, 2025 17:48
Copy link
Member

@Cyrilvallez Cyrilvallez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let's just remove the useless existence check everywhere, then let's go! Thanks for this! Feel free to merge after having updated the checks!

Comment on lines 329 to 332
if "RMSNorm" in module.__class__.__name__:
# Norms can exist without weights (in which case they are None from torch primitives)
if hasattr(module, "weight") and module.weight is not None:
module.weight.data.zero_()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just remove the check everywhere (they do exist)!

Suggested change
if "RMSNorm" in module.__class__.__name__:
# Norms can exist without weights (in which case they are None from torch primitives)
if hasattr(module, "weight") and module.weight is not None:
module.weight.data.zero_()
if "RMSNorm" in module.__class__.__name__:
module.weight.data.zero_()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omg yes, comes from the generalized function I copied from 🤦

@github-actions
Copy link
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma, gemma2, gemma3, qwen3_next, recurrent_gemma, t5gemma, vaultgemma

@vasqu vasqu enabled auto-merge (squash) September 19, 2025 12:13
@vasqu vasqu merged commit b4ba4e1 into huggingface:main Sep 19, 2025
19 checks passed
@vasqu vasqu deleted the fix-rms-norm-init branch September 19, 2025 12:15
ErfanBaghaei pushed a commit to ErfanBaghaei/transformers that referenced this pull request Sep 25, 2025
…gface#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
…gface#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check
ArthurZucker pushed a commit that referenced this pull request Oct 3, 2025
* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check
LysandreJik added a commit that referenced this pull request Oct 3, 2025
* Update expected values for one more `test_speculative_generation` after #40949 (#40967)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)

* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Add new model LFM2-VL (#40624)

* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>

* Fix outdated version checks of accelerator (#40969)

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

use skip_predictor in vjepa2 `get_vision_features`

* [Trainer] Fix DP loss (#40799)

* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)

* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling

* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)

* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>

* [tests] Really use small models in all fast tests (#40945)

* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency

* Add captured actual outputs to CI artifacts (#40965)

* fix

* fix

* Remove `# TODO: ???` as it make me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Revert change in `compile_friendly_resize` (#40645)

fix

* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove `set_model_tester_for_less_flaky_tests` (#40982)

remove

* Benchmarking v2 GH workflows (#40716)

* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from worfkflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description

* ENH: Enable readline support for transformers chat (#40911)

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, MacOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).

* [testing] test `num_hidden_layers` being small in model tester (#40992)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* blt wip (#38579)

* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoing with form_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* seperate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamTextMLP

* clean up some args'

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, seperated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_casual check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attnetion_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>

* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check

* Make `EfficientLoFTRModelTest` faster (#41000)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typoes in src and tests (#40845)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)

* Fix model cards and modalities in toctree

* fix new models

* RUFF fix on CI scripts (#40805)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix dict like init for ModelOutput (#41002)

* fix dict like init

* style

* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)

* update test (and overwrites)

* better test comment

* 0 as a default for

* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)

* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix `PhimoeIntegrationTest` (#41007)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix Glm4v test (#41011)

fix

* Update after #41007 (#41014)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix benchmark runner argument name (#41012)

* Adding support for Qwen3Omni (#41025)

* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed ?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>

* Making compute_loss_func always take priority in Trainer (#40632)

* logger warn, if-else logic improved

* redundant if condition fix

* Modify Qwen3Omni parameter name since VL changed it (#41045)

Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>

* Fix Qwen video tests (#41049)

fix test

* [testing] Fix `qwen2_audio` (#41018)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typing of tuples (#41028)

* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove optax (#41030)

Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in English/Chinese documentation (#41031)

* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.autocast (#40975)

* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* docs: improved RoPE function Docstrings (#41004)

* docs: improved RoPE functuon docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix condition for emitting warning when generation exceeds max model length (#40775)

correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

* Fix outdated torch version check (#40925)

Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)

* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking

* [testing] Fix `seed_oss` (#41052)

* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Remove repeated import (#40937)

* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify unnecessary Optional typing (#40839)

Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add write token for uploading benchmark results to the Hub (#41047)

* Separate write token for Hub upload

* Address review comments

* Address review comments

* Ci utils (#40978)

* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License

* Fix CI jobs being all red 🔴 (false positive) (#41059)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update quantization CI (#41068)

* fix

* new everything

* fix

* [i18n-bn] Add Bengali language README file (#40935)

* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions

* Improve documentation and errors in Mamba2-based models (#41063)

* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files

* Update team member list for some CI workflows (#41094)

* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix crash when using chat to send 2+ request to gptoss (#40536)

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* Minor addition, no split modules for VideoMAEE (#41051)

* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>

* Switch to `python:3.10-slim` for CircleCI docker images (#41067)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix argument name in benchmarking script (#41086)

* Fix argument name in benchmarking script

* Adjust vars

* Fix typos in documentation (#41087)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing (#40788)

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove unused arguments (#40916)

* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix wrong height and width when read video use torchvision (#41091)

* docs: Fix Tool Use links and remove dead RAG links (#41104)

docs: Fix tool use links. Remove dead RAG links. Fix style

* [tests] gpt2 + `CausalLMModelTester` (#41003)

* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder

* Fix `_get_test_info` for inherited tests (#41106)

* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove bad test skips (#41109)

* remove bad skips

* remove more

* fix inits

* Format empty lines and white space in markdown files. (#41100)

* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Support loading LFM2 GGUF (#41111)

* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [torchao safetensors] integrate torchao safetensors support with transformers  (#40735)

* enable torchao safetensors

* enable torchao safetensors support

* add more version checking

* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)

* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Fix the error where a keyword argument appearing before *args (#41099)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix broken `` expressions in markdown files (#41113)

Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove self-assignment (#41062)

* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Fixed MXFP4 model storage issue (#41118)

* Fixed loading LongT5 from legacy checkpoints (#40724)

* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head

* dummy commit (#41133)

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix loading logic flaw with regards to unexpected and missing keys (#40850)

* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fix: align Qwen2.5-VL inference rope index with training by passing s… (#41153)

Fix: align Qwen2.5-VL inference rope index with training by passing second_per_grid_ts

* Fix single quotes in markdown (#41154)

Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* extend gemma3n integration ut cases on XPU (#41071)

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Add Parakeet (#39062)

* first commit

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update to handle masking for bs>1

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Add tests and docs

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update model ids

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update docs and improve style

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update librosa location

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* import guard torch too

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* ruff code checks fix

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* ruff format check

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* updated to parakeet names

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update script

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Add tokenizer decoding

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Remove other model dependency

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* clean tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* linting

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix ruff lint warnings

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* move to seperate folders

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet ctc model code

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* simplify encoder structure

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update documentation

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet to toctree

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet doc

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Address comments

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Update featurizer to compute lens directly

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix ruff tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix encoding format

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix minor ctc decoding

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* revert modular_model_converter.py changes

* revert check_config_attributes.py changes

* refactor: fastconformer & parakeet_ctc -> parakeet

* modeling update

* test update

* propagate feature extractor updates

* propagate doc changes

* propagate doc changes

* propagate tokenization changes

* propagate conversion changes

* remove fastconformer tests

* remove modular

* update processor

* update processor

* tset update

* diverse fixes

* 100% macthing greedy batched

* Update conversion script.

* Refactor docs.

* Reafactor auto loading.

* Refactor and fix tokenization and processing.

* Update integration test.

* Modeling fixes:
- ensure correct attention mask shape
- ensure layer drop returns valid output
- correct blank token ID when computing CTC loss

* Format and repo consistency.

* Update model doc.

* Fix feature extraction tests.

* Fix (most) tokenizer tests.

* Add pipeline example.

* Fixes

* Use eager_attention_forward from Llama.

* Small tweaks.

* Replace Sequential with ModuleList

* Add check if not all layers copied

* Clean tokenizer.

* Standardize FastSpeech2ConformerConvolutionModule for Parakeet.

* Switch to modular for modeling and processing.

* Add processor tests.

* Fix modeling tests.

* Formating and docstrings.

* Add `return_attention_mask` like other feature extractors.

* clean up after merging main.

* nits on modeling

* configuration update

* nit

* simplification: use PretrainedTokenizerFast, simplify processor

* add dtype arg to mel_filter_bank

* feature extraction: simplify!

* modeling update

* change to ParakeetTokenizerFast

* correct attention mask handling

* auto update

* proc update

* test update

* feature extraction fixes

* modeling update

* conversion script update

* udpate tests feature integration

* update tokenization and tests

* processor tests

* revert audio_utils

* config docstring update

* blank_token -> pad_token

* modeling udpate

* doc update

* fix tests

* fix test

* fix tests

* address review comments

* add comment

* add comment

* explicitly not support flash

* atttention straightforward masking

* fix

* tokenizer update: skipping blank tokens by default

* doc update

* fix max_positions_embeddings handling

* nits

* change atol faeture extraction integration tests

* doc update + fix loss

* doc update

* nit

* update integration test for A10

* repo id name

* nit

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eric B <ebezzam@gmail.com>

* Fix format of compressed_tensors.md (#41155)

* Fix table format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify and improve model loading logic (#41103)

* remove unexpected keys from inputs (they have nothing to do there)

* remove input

* simplify a lot init

* fix

* fix check for non-persistent buffer

* revert because too many old and bad models...

* remove comment

* type hint

* make it a real test

* remove model_to_load -> always use the same model

* typo

* remove legacy offload_folder (we never waste that memory anymore)

* do not change prefix anymore

* change very bad function name

* create adjust method

* remove useless method

* restrict

* BC

* remove unused method

* CI

* remove unused args

* small fix

* fix

* CI

* CI

* avoid too many loops

* fix regex

* cleaner

* typo

* fix

* fix

* Force new vision models addition to include a fast image processor (#40802)

* add test

* fix test and change cutoff date

* Add documentation to test

* Add language specifiers to code blocks of markdown files (#41114)

* Add language specifiers to code blocks of markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/qwen3_omni_moe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update nemotron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update phimoe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix syntax error

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Improve `add_dates` script (#41167)

* utils/add_dates.py

* put lfm2-vl in correct category

* Fix flash-attn for paged_attention when no kernels (#41078)

* Fix non-kernels flash attention paged implementation

* Cover all cases

* Style

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Apply style fixes

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Remove data from examples (#41168)

Remove telemetry

* Enable fa in amd docker (#41069)

* Add FA to docker

* Use caching mechanism for qwen2_5

* Fix a typo in important models list

* Partial fixes for gemma3

* Added a commit ID for FA repo

* Detailled  the expectation storage format

* Rebase fix

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* handle flash slow tests (#41072)

* handle flash slow tests

* update patch mask to 1/0 for flash

* don't skip flash

* flash

* raise tols

* rm flash support :(

* nits

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-7.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-230.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-214.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-147.ec2.internal>

* Modernbert fix (#41056)

* Add FA to docker

* Fixed padding for mdernbert

* Fixed logits and hidden states extraction in ModernBertForMultipleChoice

* Added a test for ModernBertForMultipleChoice

* fixes

* More fixes and GREEN CI

* consistency

* moar consistency

* CI Runners - move amd runners mi355 and 325 to runner group (#41193)

* Update CI workflows to use devmi355 branch

* Add workflow trigger for AMD scheduled CI caller

* Remove unnecessary blank line in workflow YAML

* Add trigger for workflow_run on main branch

* Update workflow references from devmi355 to main

* Change runner_scale_set to runner_group in CI config

* [XPU] Add MXFP4 support for XPU (#41117)

* XPU supports gpt-oss MXFP4

* Complete MXFP4 UT file and comment information

* Complete MXFP4 UT file and comment information

* Fix code style

* Fix code style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [tests] `CausalLMTester` automatically infers other test classes from `base_model_class` 🐛 🔫  (#41066)

* halfway through the models

* update test checks

* refactor all

* another one

* use tuples

* more deletions

* solve bad inheritance patterns

* type

* PR ready?

* automatic model class inference from the base class

* vaultgemma

* make fixup

* make fixup

* rebase with gpt2

* make fixup :'(

* gpt2 is special

* More typing fixes (#41102)

* Fix noqa

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* remove noqa

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix chars

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* enable flex attention ut cases on XPU (#40989)

* enable flex attention ut cases on XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix(trainer): Avoid moving model with device_map (#41032)

* fix(trainer): Avoid moving model with device_map

When a model is loaded with `device_map="auto"` and is too large to fit on a single GPU, `accelerate` will offload some layers to the CPU or disk. The `Trainer` would previously attempt to move the entire model to the specified device, causing a `RuntimeError` because a model dispatched with `accelerate` hooks cannot be moved.

This commit fixes the issue by adding a check in `_move_model_to_device` to see if the model has an `hf_device_map` attribute. If it does, the device placement is assumed to be handled by `accelerate`, and the `model.to(device)` call is skipped.

A regression test is added to ensure the `Trainer` can be initialized with a model that has a `hf_device_map` that simulates offloading without raising an error.

* Added the logger warning for the move model

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

* Fix attention sink implementation in flex attention (#41083)

* Fix attention sink implementation in flex attention

* fix dim

* fix

* Remove print

* raisae error when return_lse is False yet s_aux is providewd

* Clean test files for merge

* Update src/transformers/integrations/flex_attention.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* force return lse

* Add to doc

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Separate docker images for Nvidia and AMD in benchmarking (#41119)

Separate docker images for Nvidia and AMD

* Make quantizers good citizens loading-wise (#41138)

* fix param_needs_quantization

* rewrite most hqq

* clean

* fix

* comment

* remove it from exception of safetensors

* start on bnb 4bits

* post-rebase fix

* make bnb4 bit a good citizen

* remove forgotten print

* make bnb 8bits a good citizen

* better hqq

* fix

* clean

* remove state dict from signature

* switch method

* make torchao a good citizen

* fixes

* fix torchao

* add check

* typo

* [`Kernels Attention`] Change fallback logic to error out on explicit kernels request and include FA3 (#41010)

* fix

* be more strict

* change logic to include fa3

* fix the case where nothing is requested

* modify old tests + add kernels related tests

* style

* Add EdgeTAM (#39800)

* initial comment

* test

* initial conversion for outline

* intermediate commit for configuration

* chore:init files for sam2

* adding arbitary undefined config

* check

* add vision

* make style

* init sam2 base model

* Fix imports

* Linting

* chore:sam to sam2 classes

* Linting

* Add sam2 to models.__init__

* chore:match prompt encoder with sam2 code

* chore:prepare kwargs for mask decoder

* Add image/video predictors

* Add CUDA kernel

* Add output classes

* linting

* Add logging info

* tmp commit

* docs for sam2

* enable image processing

* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize

* enable promptencoder of sam2

* fix promprencoder

* Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference)

* Confirmed that ImageEncoder is exactly same (Be aware the linting of init)

* Confirmed that MaskDecoder is exactly same (TO DO: lint variable name)

* SamModel is now available (Need more chore for name)

* make fix-copies

* make style

* make CI happy

* Refactor VisionEncoder and PostioinEmbedding

* TO DO : fix the image_embeddings and sparse_embeddings part

* pure image inference done

* reusable features fix and make style

* styling

* refactor memoryattention

* tmp

* tmp

* refactor memoryencoder
TO DO : convert and inference the video pipeline

* TO DO : fix the image_encoder shape

* conversion finish
TO DO: need to check video inference

* make style

* remove video model

* lint

* change

* python utils/check_docstringspy --check_all

* python utils/check_config_attributes.py

* remove copies for sam2promptencoder due to configuration

* change __init__.py

* remove tensorflow version

* fix that to not use direct comparison

* make style

* add missing import

* fix image_embedding_size

* refactor Sam2 Attention

* add fully working video inference (refactoring todo)

* clarify _prepare_memory_conditioned_features

* simplify modeling code, remove unused paths

* use one model

* use auto_docstring

* refactor rope embeddings

* nit

* not using multimask when several points given

* add all sam2.1

* add video tmp

* add Sam2VideoSessionState + fast image proc + video proc

* remove init_states from model

* fix batch inference

* add image integration tests

* uniformize modeling code with other sam models and use modular

* pass vision tests an most model tests

* All tests passing

* add offloading inference state and video to cpu

* fix inference from image embedding and existing mask

* fix multi_boxes mask inference

* Fix batch images + batch boxes inference

* improve processing for image inference

* add support for mask generation pipeline

* add support for get_connected_components post processing in mask generation

* add fast image processor sam, image processor tests and use modular for sam2 image processor

* fix mistake in sam after #39120

* fix init weights

* refactor convert

* add integration tests for video + other improvements

* add needed missing docstrings

* Improve docstrings and

* improve inference speed by avoiding cuda sync

* add test

* skip test for vision_model

* minor fix for vision_model

* fix vision_model by adding sam2model and change the torch dependencies

* remove patch_size

* remove image_embedding_size

* fix patch_size

* fix test

* make style

* Separate hieradet and vision encoder in sam2

* fixup

* review changes part 1

* remove MemoryEncoderConfig and MemoryAttentionConfig

* pass q_stride instead of q_pool module

* add inference on streamed videos

* explicitely process streamed frames

* nit

* Improve docstrings in Sam2Model

* update sam2 modeling with better gestion of inference state and cache, and separate Sam2Model and Sam2VideoModel

* improve video inference api

* change inference_state to inference_session

* use modular for Sam2Model

* fix convert sam2 hf

* modular

* Update src/transformers/models/sam2/video_processing_sam2.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix minor config

* fix attention loading error

* update modeling tests to use hub checkpoints

* Use CI A10 runner for integration tests values + higher tolerance for video integration tests

* PR review part 1

* fix doc

* nit improvements

* enforce one input format for points, labels and boxes

* nit

* last few nits from PR review

* fix style

* fix the input type

* fix docs

* add sam2 model as conversion script

* improve sam2 doc

* add rough necessarry changes

* first working edgetam

* fix issue with object pointers

* Use modular as much as possible

* nit fixes + optimization

* refactor spatial perceiver

* cleanup after merge

* add working edgetam

* improve perceiver resampler code

* simplify/unify rope attention logic

* Improve comments in apply_rotary_pos_emb_2d

* add working tests

* fix test timmwrapper

* add docs

* make fixup

* nits

* fix modular

* fix modular

* PR review part 1

* split apply_rotary_pos_emb_2d

* add granularity to _prepare_memory_conditioned_features

* add dates to doc

* add separate mlp for memory attention

* Fix memory on wrong device

* store processed frames in dict

* update checkpoints in tests

* update dates

---------

Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix EXAONE-4.0 dummy id (#41089)

* Fix EXAONE-4.0 dummy id

* Fix exaone4 dummy (#1)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix 8bit bnb loading (#41200)

* Fix 8bit

* oups forgot the case where it is not prequantized

* Fix docker quantization (#41201)

* launch docker

* remove gptq for now

* run tests

* Revert "run tests"

This reverts commit f85718ce3a21d5937bf7405b8925c125c67d1a3e.

* revert

* Embed interactive timeline in docs (#41015)

* embed timeline in docs (test web componentand Iframe)

* test scaling

* test multiple scales

* compensate scale in width

* set correct syle and scale

* remove bottom space created by scale

* add timeline as a separate page

* reformulate docs after review

* [docs] Fix links (#41110)

fix

* Remove unnecessary Optional typing (#41198)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* docs/examples(speech): pin CTC commands to Hub datasets; add Windows notes (#41027)

* examples(speech): load Common Voice from Hub; remove deprecated dataset-script references (Windows-friendly notes)

* docs/examples(speech): pin CTC streaming & other CTC commands to Hub datasets; add Windows notes

* make style

* examples(speech): align DataTrainingArguments help with datasets docs; minor wording fixes

* docs/examples(speech): address review  remove Hub subsection & Whisper tip; align dataset help text

* style: apply ruff/black/usort/codespell on examples/speech-recognition

* Apply style fixes

* Update examples/pytorch/speech-recognition/README.md

* update doc to match load_dataset

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix Qwen3-Omni audio_token_id serialization issue (#41192)

Fix Qwen3-Omni audio_token_id serialization by overriding parent's attribute_map

- Override attribute_map in Qwen3OmniMoeThinkerConfig to prevent inheritance of incorrect mapping
- Parent class maps audio_token_id -> audio_token_index, but implementation uses audio_token_id directly
- Fixes issue where custom audio_token_id values were not preserved during save_pretrained/from_pretrained cycles

Fixes #41191

* Wait for main process in _save_checkpoint to ensure best checkpoint exists (#40923)

* Update trainer.py

* fix

* fix format

* move barrier, delete redundant

* Avoid assumption that model has config attribute in deepspeed (#41207)

Avoid assumption that model has config in deepspeed

* Trainer: Pass `num_items_in_batch` to `compute_loss` in `prediction_step` (#41183)

* Add num_items_in_batch computation to predict_step.

* address comments.

* Fix test cases.

* fixup

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [ESM] add accepts_loss_kwargs=False to EsmPreTrainedModel (#41006)

add accepts_loss_kwargs=False to EsmPreTrainedModel

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Align pull request template to bug report template (#41220)

The only difference is that I don't users to https://discuss.huggingface.co/ for hub issues.

* [generate] cache missing custom generate file (#41216)

* cache missing custom generate file

* make fixup

* Remove old Python code (#41226)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore (#41143)

Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore.

Co-authored-by: frozenleaves <frozen@Mac.local>

* update code owners (#41221)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Unify is_torchvision_v2_available with is_torchvision_available (#41227)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing of train_args (#41142)

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix fsdp typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix sliding window attn mask (#41228)

* Fix sliding window attn mask

* Clearer test

* Apply style fixes

* If Picasso made ascii drawings he would have made this

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults" (#41124)

* Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults (#3…"

This reverts commit df67cd35f0ca1a1cbf7147b2576db31b16200cf4.

* fix

* [docs] Fix tp_plan (#41205)

remove manual

* Fix white space in documentation (#41157)

* Fix white space

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix autodoc

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix qwen text config (#41158)

* fix qwen text config

* fix tests

* fix one more test

* address comments

* Video processor accepts single frames on cuda (#41218)

* fix

* why was is np if input is in torch

* Use math.log2 (#41241)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix TrainerIntegrationDeepSpeed UT failures (#41236)

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* [repo utils] Update `models_to_deprecate.py` (#41231)

* update models_to_deprecate

* exclude this file

* handle typos and aliases

* don't commit files

* PR suggestions; make fixup

* Use removeprefix and removesuffix (#41240)

* Use removeprefix and removesuffix

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix pylint warnings (#41222)

* Remove unused variables

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove reimported packages

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix pylint warnings

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove all instances of `is_safetensors_available` (#41233)

* safetensors is a core dep

* fix

* ok

* simplify branching

* keep it for now

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* FP-Quant NVFP4 and Python 3.9 support (#39876)

* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

* nvfp4

* nvfp4 tests

* FP-Quant version bumped

* nvfp4 default and docs update

* trainable

* cpu if pseudoquant

* proper group size selection

* gsr

* qutlass requirement version bumo

* Upstream docker copy

* docs update

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* [`FA3`] Fix masking and loading logic in same process (#41217)

fix loading and fa3 masking

* [t5gemma] fix `get_text_config` and related fixes (#40939)

* tmp commit

* t5gemma fixes

* Don't convert to `safetensors` on the fly if the call is from testing (#41194)

* don't convert

* disable

* Update src/transformers/modeling_utils.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* fix

* disable

* disable

* disable

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* Resolve remote custom module path warnings (#41243)

* add peft team members to issue/pr template (#41262)

* add

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* docs: update bitsandbytes platform support (#41266)

* add more activation kernels, follow up  (#40944)

* add more activation kernels

* fixing style

* fix version

* fix asr pipeline ut failures (#41275)

* fix asr pipeline ut failures

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* make style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Use regex defailed flags (#41264)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix multi-video timestamp bug in Qwen-3-VL and GLM4V (#41229)

* fix multi-video timestamp bug in qwen3vl,glm4v

* run make fix-copies to sync modular files

* run make fix-copies to sync modular files

---------

Co-authored-by: UBT <daqin.luo@ubtrobot.com>

* Fix binding of video frames to video placeholder in `InternVL` model (#41237)

* Fix binding video frames to video placeholder in prompt

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Add test on binding video frames to prompt

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Fix code style issues

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Fix broken tests on `InternVLProcessor`

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Add `return_tensors` to video processor defaults

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

---------

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Deprecate Trackio environment variables and deploy to Spaces by default (#40950)

* allow prive space id for trackio

* complete docstring

* Deprecate environment variables for Trackio integration; use TrainingArguments instead and deploy by default

* style

* Enhance documentation for Trackio Space ID in TrainingArguments

* Allow private Space id for Trackio (#40948)

* allow prive space id for trackio

* complete docstring

* fix async client for transformers chat (#41255)

* fix-client

* fix

* Unify is_torchvision_v2_available with is_torchvision_available (#41259)

Fix is_torchvision_v2_available

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use max/min (#41280)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Biogptlogits (#41270)

added logits slicing to BioGpt for seq classifier

Signed-off-by: Aviral <aviralkamaljain@gmail.com>

* Fix unnecessary single-item container checks (#41279)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix pylint generator warnings (#41258)

Fix pylint generator warnings

Signed-off-by: cyy <cyyever@outlook.com>

* feat: use `aws-highcpu-32-priv` for amd docker img build (#41285)

* feat: use `aws-highcpu-32-priv` for amd docker img build

* feat: add `workflow_dispatch` event to docker build CI

* Add processor and intergration test for qwen3vl (#41277)

* support aux loss in qwen3vlmoe

* update qwen3vl processor test!

* add integration tests for qwen3vl-30a3

* remove duplicated decorator

* code clean

* fix consistency

* do not inherit from nn.Linear for better quantization

* pass check

* Remove `test_initialization` (#41261)

remove it

* Remove some previous team members from allow list of triggering Github Actions (#41263)

* delete

* delete

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Build doc in 2 jobs: `en` and `other languages` (#41290)

* separate

* separate

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix mxfp4 dequantization (#41292)

fix

* [`Flex Attn`] Fix lse x attention sinks logic   (#41249)

fix

* FIX: Bug in PEFT integration delete_adapter method (#41252)

The main content of this PR is to fix a bug in the delete_adapter method
of the PeftAdapterMixin. Previously, it did not take into account
auxiliary modules from PEFT, e.g. those added by modules_to_save. This
PR fixes this oversight.

Note that the PR uses a new functionality from PEFT that exposes
integration functions like delete_adapter. Those will be contained in
the next PEFT release, 0.18.0 (yet unreleased). Therefore, the bug is
only fixed when users have a PEFT version fullfilling this requirement.
I ensured that with old PEFT versions, the integration still works the
same as previously. The newly added test for this is skipped if the PEFT
version is too low.

(Note: I tested locally with that the test will pass with PEFT 0.18.0)

While working on this, I also cleaned up the following:

- The active_adapter property has been deprecated for more than 2 years
  (#26407). It is safe to remove it now.
- There were numerous small errors or outdated pieces of information in
  the docstrings, which have been addressed.

When PEFT < 0.18.0 is used, although we cannot delete modules_to_save,
we can still detect them and warn about it.

* Italian translation for README.md (#41269)

chore: add Italian translation for README.md

* Fix README.md error when installing from source (#41303)

* download and use HF Hub Cache (#41181)

use hub cache

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix some merge issues

* [test_all]

* [test-all]

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
Signed-off-by: Aviral <aviralkamaljain@gmail.com>
Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Rangehow <88258534+rangehow@users.noreply.github.com>
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Harshal Janjani <75426551+harshaljanjani@users.noreply.github.com>
Co-authored-by: Branden <brandenkmurray@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Ákos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Ayush <ayushtanwar1729@gmail.com>
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Ralph Gleaton <70818603+rjgleaton@users.noreply.github.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: Saidur Rahman Pulok <59414463+saidurpulok@users.noreply.github.com>
Co-authored-by: Nick Doiron <ndoiron@mapmeld.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Duygu Altinok <duygu.altinok12@gmail.com>…
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025
…gface#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check
gante added a commit that referenced this pull request Oct 8, 2025
… text generation (#40837)

* init

* added TopH

* Update TopH logits_process.py

* Update logits_process.py

* Update test_logits_process.py

* Update test_logits_process.py

* added test No. 4

* Resolving __init__.py issues

* Resolving configuration_utils.py Issues

* Resolving logits_process.py Issues

* Resolving utils.py Issues

* Resolving test_logits_process.py Issues

* Resolving __init__.py issues

* Resolving logits_process.py Issues

* Resolving __init__.py issues

* Updated Docs

* Updated Docstring

* style: autoformat with make fixup

* Fixing Docstring

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Using torch.distributions.Categorical

* Improve torch_dtype checks (#40808)

* Improve torch_dtype checks

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Apply suggestions from code review

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add VideoProcessors to auto-backend requirements (#40843)

* add it

* fix existing ones

* add perception to auto_mapping...

* Adds Causal Conv 1D kernel for mamba models (#40765)

* add kernel

* make style

* keep causal-conv1d

* small fix

* small fix

* fix modular converter

* modular fix + lazy loading

* revert changes modular

* nit

* hub kernels update

* update

* small nit

* Update no split modules in T5Gemma model (#40810)

* Update no split modules in T5Gemma model

* Update no_split_modules also for T5Gemma modular

* Remove model_split_percents from test cases

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Replace image classification loss functions to `self.loss_function` (#40764)

* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)

* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fixes for continuous batching (#40828)

* Fix for CB attn mask and refactor

* Tests for CB (not all passing)

* Passing tests and a logger fix

* Fixed the KV metrics that were broken when we moved to hybrid alloc

* Fix circular import and style

* Added tests for FA

* Unfolded test to have device expectations

* Fixes for H100

* more fixes for h100

* H100 are good

* Style

* Adding some comments from #40831

* Rename test

* Avoid 1 letter variables

* Dictonnary is only removed during kwargs

* Test for supported sample

* Fix a unvoluntary slice

* Fixes for non-sliced inputs and small example improvments

* Slice inputs is more understandabe

* Style

* [tests] re-enable aria fast tests (#40846)

* rise from the dead

* test

* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)

* Fix inconsistencies with box input inference with original repo

* remove print

* always pad

* fix modular

* [Sam2Video] Fix video inference with batched boxes and add test (#40797)

fix video inference with batched boxes and add test

* add: differential privacy research model (#40851)

* VaultGemma

* Removing Sequence and Token classification models. Removing integration tests for now

* Remove pass-only modular code. style fixes

* Update vaultgemma.md

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Add links to model doc

* Correct model doc usage examples

* Updating model doc to describe differences from Gemma 2

* Update model_doc links

* Adding integration tests

* style fixes

* repo consistency

* attribute exception

---------

Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)

* ouput_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve

* [tests] move generative tests away from `test_modeling_common.py` (#40854)

move tests

* [generate] Always use decoder config to init cache (#40772)

* mega derp

* fix

* always use the decoder

* Use checkpoint in auto_class_docstring (#40844)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)

Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Redirect MI355 CI results to dummy dataset (#40862)

* [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)

Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>

* [docstrings / type hints] Update outdated annotations for `past_key_values`  (#40803)

* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes

* fix florence kwargs  (#40826)

* fix: XIELU act parameters not being casted to correct dtype (#40812)

* Update model tags and integration references in bug report (#40881)

* [Qwen3 Next] Use numerically stable `rsqrt` (#40848)

use numerically stable inverse

* Adding Support for Qwen3-VL Series (#40795)

* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unecesary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnesesary imports

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* [`VaultGemma`] Update expectations in integration tests (#40855)

* fix tests

* style

* Fix modular consistency (#40883)

* reapply modular

* add missing one

* 🔴 Move variable output controls to `_prepare_generation_config ` (#40715)

* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches

* Clarify passing is_causal in sdpa_attention_paged_forward (#40838)

* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add comment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.expm1 and torch.log1p for better numerical results (#40860)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Fast PromptDepthAnything Processor (#40602)

* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove benchmrk script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refacto

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processoer

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

* Fix deta loading & dataclass (#40878)

* fix

* fix 2

* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)

Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)

* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)

* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [generate] remove docs of a feature that no longer exists (#40895)

* Make debugging failing tests (check and update expect output values) easier 🔥  (#40727)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fixing the call to kernelize (#40628)

* fix

* style

* overload train and eval

* add getter and setter

* Fix getter  regression (#40824)

* test things

* style

* move tests to a sane place

* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [cache] Merge static sliding and static chunked layer (#40893)

* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle

* Harmonize CacheLayer names (#40892)

* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert

* [cache] Only use scalars in `get_mask_sizes` (#40907)

* remove tensor ops

* style

* style

* Set seed for `Glm4vIntegrationTest` (#40905)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add Olmo3 model (#40778)

* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test

* remove dummy EncodingFast (#40864)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve module name handling for local custom code (#40809)

* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Remove `runner_map` (#40880)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* disable `test_fast_is_faster_than_slow` (#40909)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791)

* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase

* [generate] misc fixes (#40906)

misc fixes

* 🔴Make `center_crop` fast equivalent to slow (#40856)

make center_crop fast equivalent to slow

* Fix dtype in Paligemma (#40912)

* fix dtypes

* fix copies

* delete unused attr

* [Docs] Adding documentation of MXFP4 Quantization (#40885)

* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Processor load with multi-processing (#40786)

push

* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832)

* Remove unused arg

* deprecate

* revrt one change

* get set go

* version correction

* fix

* make style

* comment

* Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)

* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* [torchao safetensors] renaming get_state_dict function (#40774)

renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Adding activation kernels (#40890)

* first commit

* add mode

* revert modeling

* add compile

* rm print

* Minor fix for #40727 (#40929)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add support for Florence-2 training (#40914)

* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add LongCat-Flash (#40730)

* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism

* [DOC] Add missing dates in model cards (#40922)

add missing dates

* [models] remove unused `import torch.utils.checkpoint`  (#40934)

* Intel CPU dockerfile (#40806)

* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update label name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941)

* Fix trainer tests (#40823)

* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* Fix `Glm4vMoeIntegrationTest` (#40930)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Raise error instead of warning when using meta device in from_pretrained (#40942)

* raise instead of warning

* add timm

* remove

* Consistent naming for images kwargs (#40834)

* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fox copies

* another fix

* fix some tests

* fix more tests

* fix lasts tests

* fix copies

* better docstring

* delete print

* Remove nested import logic for torchvision (#40940)

* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessarry protected import in modular (and modeling)

* fix wrongly remove protected imports

* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update expected values for some `test_speculative_generation` (#40949)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Standardize audio embedding function name for audio multimodal models (#40919)

* Standardize audio embedding function name for audio multimodal models

* PR review

* Add FlexOlmo model (#40921)

* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`

* Don't list dropout in eager_paged_attention_forward (#40924)

Remove dropout argument

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update expected values for one more `test_speculative_generation` after #40949 (#40967)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)

* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Add new model LFM2-VL (#40624)

* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>

* Fix outdated version checks of accelerator (#40969)

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

use skip_predictor in vjepa2 `get_vision_features`

* [Trainer] Fix DP loss (#40799)

* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)

* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling

* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)

* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>

* [tests] Really use small models in all fast tests (#40945)

* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency

* Add captured actual outputs to CI artifacts (#40965)

* fix

* fix

* Remove `# TODO: ???` as it make me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Revert change in `compile_friendly_resize` (#40645)

fix

* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Using torch.distributions.Categorical

* Remove `set_model_tester_for_less_flaky_tests` (#40982)

remove

* Benchmarking v2 GH workflows (#40716)

* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from worfkflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description

* 🔴[`Attention`] Bert-based Models Attention Refactor (#38301)

* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fixup according tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didnt load with an earlier version, some fixup (needs another look bc cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we dont need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through, might be overdue if not in time possible

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, its the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style

* Remove [[autodoc]] refs to TF/Flax objects (#40996)

* remove refs

* more

* ENH: Enable readline support for transformers chat (#40911)

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, MacOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).

* [testing] test `num_hidden_layers` being small in model tester (#40992)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* blt wip (#38579)

* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoing with form_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* seperate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamTextMLP

* clean up some args'

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, seperated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_casual check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attnetion_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>

* [docs] rm stray tf/flax autodocs references (#40999)

rm tf references

* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check

* Make `EfficientLoFTRModelTest` faster (#41000)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typoes in src and tests (#40845)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)

* Fix model cards and modalities in toctree

* fix new models

* RUFF fix on CI scripts (#40805)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix dict like init for ModelOutput (#41002)

* fix dict like init

* style

* 🚨 [v5] remove generate output retrocompatibility aliases (#40998)

remove old type aliases

* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)

* update test (and overwrites)

* better test comment

* 0 as a default for

* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* 🚨 [v5] remove deprecated entry point (#40997)

* remove old entry point

* update references to transformers-cli

* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)

* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix `PhimoeIntegrationTest` (#41007)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix Glm4v test (#41011)

fix

* Update after #41007 (#41014)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix benchmark runner argument name (#41012)

* Adding support for Qwen3Omni (#41025)

* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed ?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>

* Making compute_loss_func always take priority in Trainer (#40632)

* logger warn, if-else logic improved

* redundant if condition fix

* Modify Qwen3Omni parameter name since VL changed it (#41045)

Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>

* Fix Qwen video tests (#41049)

fix test

* [testing] Fix `qwen2_audio` (#41018)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typing of tuples (#41028)

* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove optax (#41030)

Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in English/Chinese documentation (#41031)

* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.autocast (#40975)

* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* docs: improved RoPE function Docstrings (#41004)

* docs: improved RoPE functuon docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix condition for emitting warning when generation exceeds max model length (#40775)

correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

* Fix outdated torch version check (#40925)

Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove doc of tf and flax (#41029)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)

* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking

* [testing] Fix `seed_oss` (#41052)

* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Remove repeated import (#40937)

* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify unnecessary Optional typing (#40839)

Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add write token for uploading benchmark results to the Hub (#41047)

* Separate write token for Hub upload

* Address review comments

* Address review comments

* Ci utils (#40978)

* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License

* Remove <frameworkcontent> and <pt> tags from documentation (#41055)

* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix CI jobs being all red 🔴 (false positive) (#41059)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update quantization CI (#41068)

* fix

* new everything

* fix

* [i18n-bn] Add Bengali language README file (#40935)

* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions

* Improve documentation and errors in Mamba2-based models (#41063)

* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files

* Update team member list for some CI workflows (#41094)

* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix crash when using chat to send 2+ request to gptoss (#40536)

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* Minor addition, no split modules for VideoMAEE (#41051)

* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>

* Switch to `python:3.10-slim` for CircleCI docker images (#41067)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix argument name in benchmarking script (#41086)

* Fix argument name in benchmarking script

* Adjust vars

* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)

Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in documentation (#41087)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing (#40788)

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove unused arguments (#40916)

* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove tf and flax from Chinese documentation (#41057)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix wrong height and width when read video use torchvision (#41091)

* docs: Fix Tool Use links and remove dead RAG links (#41104)

docs: Fix tool use links. Remove dead RAG links. Fix style

* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)

* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_typed (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma

* [tests] gpt2 + `CausalLMModelTester` (#41003)

* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder

* Fix `_get_test_info` for inherited tests (#41106)

* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove bad test skips (#41109)

* remove bad skips

* remove more

* fix inits

* Format empty lines and white space in markdown files. (#41100)

* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* 🚨 [V5] Remove deprecated training arguments  (#41017)

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Support loading LFM2 GGUF (#41111)

* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [torchao safetensors] integrate torchao safetensors support with transformers  (#40735)

* enable torchao safetensors

* enable torchao safetensors support

* add more version checking

* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)

* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Fix the error where a keyword argument appearing before *args (#41099)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix broken `` expressions in markdown files (#41113)

Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove self-assignment (#41062)

* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)

* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

* docs(text2text_generation): 更新参数注释以反映现代生成实践

将max_length参数注释更新为max_new_tokens,以符合现代生成实践中指定生成新token数量的标准做法

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment

* Fixed MXFP4 model storage issue (#41118)

* Fixed loading LongT5 from legacy checkpoints (#40724)

* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head

* dummy commit (#41133)

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix loading logic flaw with regards to unexpected and missing keys (#40850)

* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Using torch.distributions.Categorical

* Resolving logits_process.py Issues

* style: autoformat with make fixup

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Resolving format error

* Correction of the loop variables in logit processor

* Vectorized the loop in logits_process

* formatted  logits_process

* paper reference and stopping rule comment logits_process

* Trigger CI rerun

* Update logits_process.py

* added test_TopH_example_integration

* added test_TopH_example_integration

* Update README.md

* Restore CI config to match main (remove accidental changes)

* Restore CI config to match upstream main (no diffs)

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: ArminAzizi98 <147081650+ArminAzizi98@users.noreply.github.com>
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Yuchao Zhang <418121364@qq.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Bo Zheng <368586905@qq.com>
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Ákos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: 艾力可 <178652170+thalahors@users.noreply.github.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
Co-authored-by: Samuel Barry <127697809+SamuelBarryCS@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Shane A <shanea@allenai.org>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Yaswanth Gali <82788246+yaswanth19@users.noreply.github.com>
Co-authored-by: Akshay Babbar <priv.akshay@outlook.com>
Co-authored-by: liangel-02 <liangel@meta.com>
Co-authored-by: Duc-Viet Hoang <vietyb00@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: lilin-1 <256404019@qq.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Jack <32371937+jackzhxng@users.noreply.github.com>
Co-authored-by: Rangehow <88258534+rangehow@users.noreply.github.com>
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
Co-authored-by: Harshal Janjani <75426551+harshaljanjani@users.noreply.github.com>
Co-authored-by: Branden <brandenkmurray@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Ayush <ayushtanwar1729@gmail.com>
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Ralph Gleaton <70818603+rjgleaton@users.noreply.github.com>
Co-authored-by: Saidur Rahman Pulok <59414463+saidurpulok@users.noreply.github.com>
Co-authored-by: Nick Doiron <ndoiron@mapmeld.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Duygu Altinok <duygu.altinok12@gmail.com>
Co-authored-by: Jinde.Song <juude.song@gmail.com>
Co-authored-by: hbenoit <60629420+HaroldBenoit@users.noreply.github.com>
Co-authored-by: nnul <107971634+notkisk@users.noreply.github.com>
Co-authored-by: YangKai0616 <kai.yang@intel.com>
Co-authored-by: Karol Szustakowski <61427290+Szustarol@users.noreply.github.com>
Co-authored-by: souvikku <107592858+souvikku@users.noreply.github.com>
omsherikar pushed a commit to omsherikar/transformers that referenced this pull request Oct 8, 2025
… text generation (huggingface#40837)

* init

* added TopH

* Update TopH logits_process.py

* Update logits_process.py

* Update test_logits_process.py

* Update test_logits_process.py

* added test No. 4

* Resolving __init__.py issues

* Resolving configuration_utils.py Issues

* Resolving logits_process.py Issues

* Resolving utils.py Issues

* Resolving test_logits_process.py Issues

* Resolving __init__.py issues

* Resolving logits_process.py Issues

* Resolving __init__.py issues

* Updated Docs

* Updated Docstring

* style: autoformat with make fixup

* Fixing Docstring

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Using torch.distributions.Categorical

* Improve torch_dtype checks (#40808)

* Improve torch_dtype checks

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Apply suggestions from code review

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add VideoProcessors to auto-backend requirements (#40843)

* add it

* fix existing ones

* add perception to auto_mapping...

* Adds Causal Conv 1D kernel for mamba models (#40765)

* add kernel

* make style

* keep causal-conv1d

* small fix

* small fix

* fix modular converter

* modular fix + lazy loading

* revert changes modular

* nit

* hub kernels update

* update

* small nit

* Update no split modules in T5Gemma model (#40810)

* Update no split modules in T5Gemma model

* Update no_split_modules also for T5Gemma modular

* Remove model_split_percents from test cases

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Replace image classification loss functions to `self.loss_function` (#40764)

* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)

* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fixes for continuous batching (#40828)

* Fix for CB attn mask and refactor

* Tests for CB (not all passing)

* Passing tests and a logger fix

* Fixed the KV metrics that were broken when we moved to hybrid alloc

* Fix circular import and style

* Added tests for FA

* Unfolded test to have device expectations

* Fixes for H100

* more fixes for h100

* H100 are good

* Style

* Adding some comments from #40831

* Rename test

* Avoid 1 letter variables

* Dictonnary is only removed during kwargs

* Test for supported sample

* Fix a unvoluntary slice

* Fixes for non-sliced inputs and small example improvments

* Slice inputs is more understandabe

* Style

* [tests] re-enable aria fast tests (#40846)

* rise from the dead

* test

* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)

* Fix inconsistencies with box input inference with original repo

* remove print

* always pad

* fix modular

* [Sam2Video] Fix video inference with batched boxes and add test (#40797)

fix video inference with batched boxes and add test

* add: differential privacy research model (#40851)

* VaultGemma

* Removing Sequence and Token classification models. Removing integration tests for now

* Remove pass-only modular code. style fixes

* Update vaultgemma.md

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Add links to model doc

* Correct model doc usage examples

* Updating model doc to describe differences from Gemma 2

* Update model_doc links

* Adding integration tests

* style fixes

* repo consistency

* attribute exception

---------

Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)

* ouput_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve

* [tests] move generative tests away from `test_modeling_common.py` (#40854)

move tests

* [generate] Always use decoder config to init cache (#40772)

* mega derp

* fix

* always use the decoder

* Use checkpoint in auto_class_docstring (#40844)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)

Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Redirect MI355 CI results to dummy dataset (#40862)

* [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)

Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>

* [docstrings / type hints] Update outdated annotations for `past_key_values`  (#40803)

* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes

* fix florence kwargs  (#40826)

* fix: XIELU act parameters not being casted to correct dtype (#40812)

* Update model tags and integration references in bug report (#40881)

* [Qwen3 Next] Use numerically stable `rsqrt` (#40848)

use numerically stable inverse

* Adding Support for Qwen3-VL Series (#40795)

* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unecesary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnesesary imports

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* [`VaultGemma`] Update expectations in integration tests (#40855)

* fix tests

* style

* Fix modular consistency (#40883)

* reapply modular

* add missing one

* 🔴 Move variable output controls to `_prepare_generation_config ` (#40715)

* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches

* Clarify passing is_causal in sdpa_attention_paged_forward (#40838)

* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add comment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.expm1 and torch.log1p for better numerical results (#40860)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Fast PromptDepthAnything Processor (#40602)

* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove benchmrk script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refacto

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processoer

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

* Fix deta loading & dataclass (#40878)

* fix

* fix 2

* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)

Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)

* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)

* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [generate] remove docs of a feature that no longer exists (#40895)

* Make debugging failing tests (check and update expect output values) easier 🔥  (#40727)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fixing the call to kernelize (#40628)

* fix

* style

* overload train and eval

* add getter and setter

* Fix getter  regression (#40824)

* test things

* style

* move tests to a sane place

* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [cache] Merge static sliding and static chunked layer (#40893)

* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle

* Harmonize CacheLayer names (#40892)

* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert

* [cache] Only use scalars in `get_mask_sizes` (#40907)

* remove tensor ops

* style

* style

* Set seed for `Glm4vIntegrationTest` (#40905)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add Olmo3 model (#40778)

* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test

* remove dummy EncodingFast (#40864)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve module name handling for local custom code (#40809)

* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Remove `runner_map` (#40880)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* disable `test_fast_is_faster_than_slow` (#40909)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791)

* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase

* [generate] misc fixes (#40906)

misc fixes

* 🔴Make `center_crop` fast equivalent to slow (#40856)

make center_crop fast equivalent to slow

* Fix dtype in Paligemma (#40912)

* fix dtypes

* fix copies

* delete unused attr

* [Docs] Adding documentation of MXFP4 Quantization (#40885)

* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Processor load with multi-processing (#40786)

push

* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832)

* Remove unused arg

* deprecate

* revrt one change

* get set go

* version correction

* fix

* make style

* comment

* Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)

* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* [torchao safetensors] renaming get_state_dict function (#40774)

renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Adding activation kernels (#40890)

* first commit

* add mode

* revert modeling

* add compile

* rm print

* Minor fix for #40727 (#40929)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add support for Florence-2 training (#40914)

* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add LongCat-Flash (#40730)

* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism

* [DOC] Add missing dates in model cards (#40922)

add missing dates

* [models] remove unused `import torch.utils.checkpoint`  (#40934)

* Intel CPU dockerfile (#40806)

* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update label name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941)

* Fix trainer tests (#40823)

* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* Fix `Glm4vMoeIntegrationTest` (#40930)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Raise error instead of warning when using meta device in from_pretrained (#40942)

* raise instead of warning

* add timm

* remove

* Consistent naming for images kwargs (#40834)

* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fox copies

* another fix

* fix some tests

* fix more tests

* fix lasts tests

* fix copies

* better docstring

* delete print

* Remove nested import logic for torchvision (#40940)

* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessarry protected import in modular (and modeling)

* fix wrongly remove protected imports

* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update expected values for some `test_speculative_generation` (#40949)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Standardize audio embedding function name for audio multimodal models (#40919)

* Standardize audio embedding function name for audio multimodal models

* PR review

* Add FlexOlmo model (#40921)

* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`

* Don't list dropout in eager_paged_attention_forward (#40924)

Remove dropout argument

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update expected values for one more `test_speculative_generation` after #40949 (#40967)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)

* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Add new model LFM2-VL (#40624)

* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>

* Fix outdated version checks of accelerator (#40969)

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

use skip_predictor in vjepa2 `get_vision_features`

* [Trainer] Fix DP loss (#40799)

* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)

* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling

* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)

* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>

* [tests] Really use small models in all fast tests (#40945)

* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency

* Add captured actual outputs to CI artifacts (#40965)

* fix

* fix

* Remove `# TODO: ???` as it make me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Revert change in `compile_friendly_resize` (#40645)

fix

* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Using torch.distributions.Categorical

* Remove `set_model_tester_for_less_flaky_tests` (#40982)

remove

* Benchmarking v2 GH workflows (#40716)

* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from worfkflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description

* 🔴[`Attention`] Bert-based Models Attention Refactor (#38301)

* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fixup according tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didnt load with an earlier version, some fixup (needs another look bc cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we dont need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through, might be overdue if not in time possible

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, its the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style

* Remove [[autodoc]] refs to TF/Flax objects (#40996)

* remove refs

* more

* ENH: Enable readline support for transformers chat (#40911)

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, MacOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).

* [testing] test `num_hidden_layers` being small in model tester (#40992)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* blt wip (#38579)

* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoing with form_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* seperate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamTextMLP

* clean up some args'

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, seperated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_casual check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attnetion_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>

* [docs] rm stray tf/flax autodocs references (#40999)

rm tf references

* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check

* Make `EfficientLoFTRModelTest` faster (#41000)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typoes in src and tests (#40845)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)

* Fix model cards and modalities in toctree

* fix new models

* RUFF fix on CI scripts (#40805)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix dict like init for ModelOutput (#41002)

* fix dict like init

* style

* 🚨 [v5] remove generate output retrocompatibility aliases (#40998)

remove old type aliases

* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)

* update test (and overwrites)

* better test comment

* 0 as a default for

* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* 🚨 [v5] remove deprecated entry point (#40997)

* remove old entry point

* update references to transformers-cli

* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)

* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix `PhimoeIntegrationTest` (#41007)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix Glm4v test (#41011)

fix

* Update after #41007 (#41014)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix benchmark runner argument name (#41012)

* Adding support for Qwen3Omni (#41025)

* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed ?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>

* Making compute_loss_func always take priority in Trainer (#40632)

* logger warn, if-else logic improved

* redundant if condition fix

* Modify Qwen3Omni parameter name since VL changed it (#41045)

Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>

* Fix Qwen video tests (#41049)

fix test

* [testing] Fix `qwen2_audio` (#41018)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typing of tuples (#41028)

* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove optax (#41030)

Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in English/Chinese documentation (#41031)

* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.autocast (#40975)

* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* docs: improved RoPE function Docstrings (#41004)

* docs: improved RoPE functuon docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix condition for emitting warning when generation exceeds max model length (#40775)

correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

* Fix outdated torch version check (#40925)

Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove doc of tf and flax (#41029)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)

* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking

* [testing] Fix `seed_oss` (#41052)

* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Remove repeated import (#40937)

* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify unnecessary Optional typing (#40839)

Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add write token for uploading benchmark results to the Hub (#41047)

* Separate write token for Hub upload

* Address review comments

* Address review comments

* Ci utils (#40978)

* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License

* Remove <frameworkcontent> and <pt> tags from documentation (#41055)

* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix CI jobs being all red 🔴 (false positive) (#41059)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update quantization CI (#41068)

* fix

* new everything

* fix

* [i18n-bn] Add Bengali language README file (#40935)

* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions

* Improve documentation and errors in Mamba2-based models (#41063)

* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files

* Update team member list for some CI workflows (#41094)

* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix crash when using chat to send 2+ request to gptoss (#40536)

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* Minor addition, no split modules for VideoMAEE (#41051)

* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>

* Switch to `python:3.10-slim` for CircleCI docker images (#41067)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix argument name in benchmarking script (#41086)

* Fix argument name in benchmarking script

* Adjust vars

* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)

Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in documentation (#41087)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing (#40788)

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove unused arguments (#40916)

* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove tf and flax from Chinese documentation (#41057)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix wrong height and width when read video use torchvision (#41091)

* docs: Fix Tool Use links and remove dead RAG links (#41104)

docs: Fix tool use links. Remove dead RAG links. Fix style

* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)

* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_typed (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma

* [tests] gpt2 + `CausalLMModelTester` (#41003)

* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder

* Fix `_get_test_info` for inherited tests (#41106)

* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove bad test skips (#41109)

* remove bad skips

* remove more

* fix inits

* Format empty lines and white space in markdown files. (#41100)

* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* 🚨 [V5] Remove deprecated training arguments  (#41017)

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Support loading LFM2 GGUF (#41111)

* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [torchao safetensors] integrate torchao safetensors support with transformers  (#40735)

* enable torchao safetensors

* enable torchao safetensors support

* add more version checking

* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)

* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Fix the error where a keyword argument appearing before *args (#41099)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix broken `` expressions in markdown files (#41113)

Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove self-assignment (#41062)

* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)

* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

* docs(text2text_generation): 更新参数注释以反映现代生成实践

将max_length参数注释更新为max_new_tokens,以符合现代生成实践中指定生成新token数量的标准做法

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment

* Fixed MXFP4 model storage issue (#41118)

* Fixed loading LongT5 from legacy checkpoints (#40724)

* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head

* dummy commit (#41133)

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix loading logic flaw with regards to unexpected and missing keys (#40850)

* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Using torch.distributions.Categorical

* Resolving logits_process.py Issues

* style: autoformat with make fixup

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Resolving format error

* Correction of the loop variables in logit processor

* Vectorized the loop in logits_process

* formatted  logits_process

* paper reference and stopping rule comment logits_process

* Trigger CI rerun

* Update logits_process.py

* added test_TopH_example_integration

* added test_TopH_example_integration

* Update README.md

* Restore CI config to match main (remove accidental changes)

* Restore CI config to match upstream main (no diffs)

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: ArminAzizi98 <147081650+ArminAzizi98@users.noreply.github.com>
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Yuchao Zhang <418121364@qq.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Bo Zheng <368586905@qq.com>
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Ákos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: 艾力可 <178652170+thalahors@users.noreply.github.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
Co-authored-by: Samuel Barry <127697809+SamuelBarryCS@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Shane A <shanea@allenai.org>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Yaswanth Gali <82788246+yaswanth19@users.noreply.github.com>
Co-authored-by: Akshay Babbar <priv.akshay@outlook.com>
Co-authored-by: liangel-02 <liangel@meta.com>
Co-authored-by: Duc-Viet Hoang <vietyb00@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: lilin-1 <256404019@qq.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Jack <32371937+jackzhxng@users.noreply.github.com>
Co-authored-by: Rangehow <88258534+rangehow@users.noreply.github.com>
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
Co-authored-by: Harshal Janjani <75426551+harshaljanjani@users.noreply.github.com>
Co-authored-by: Branden <brandenkmurray@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Ayush <ayushtanwar1729@gmail.com>
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Ralph Gleaton <70818603+rjgleaton@users.noreply.github.com>
Co-authored-by: Saidur Rahman Pulok <59414463+saidurpulok@users.noreply.github.com>
Co-authored-by: Nick Doiron <ndoiron@mapmeld.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Duygu Altinok <duygu.altinok12@gmail.com>
Co-authored-by: Jinde.Song <juude.song@gmail.com>
Co-authored-by: hbenoit <60629420+HaroldBenoit@users.noreply.github.com>
Co-authored-by: nnul <107971634+notkisk@users.noreply.github.com>
Co-authored-by: YangKai0616 <kai.yang@intel.com>
Co-authored-by: Karol Szustakowski <61427290+Szustarol@users.noreply.github.com>
Co-authored-by: souvikku <107592858+souvikku@users.noreply.github.com>
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
… text generation (huggingface#40837)

* init

* added TopH

* Update TopH logits_process.py

* Update logits_process.py

* Update test_logits_process.py

* Update test_logits_process.py

* added test No. 4

* Resolving __init__.py issues

* Resolving configuration_utils.py Issues

* Resolving logits_process.py Issues

* Resolving utils.py Issues

* Resolving test_logits_process.py Issues

* Resolving __init__.py issues

* Resolving logits_process.py Issues

* Resolving __init__.py issues

* Updated Docs

* Updated Docstring

* style: autoformat with make fixup

* Fixing Docstring

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Using torch.distributions.Categorical

* Improve torch_dtype checks (#40808)

* Improve torch_dtype checks

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Apply suggestions from code review

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add VideoProcessors to auto-backend requirements (#40843)

* add it

* fix existing ones

* add perception to auto_mapping...

* Adds Causal Conv 1D kernel for mamba models (#40765)

* add kernel

* make style

* keep causal-conv1d

* small fix

* small fix

* fix modular converter

* modular fix + lazy loading

* revert changes modular

* nit

* hub kernels update

* update

* small nit

* Update no split modules in T5Gemma model (#40810)

* Update no split modules in T5Gemma model

* Update no_split_modules also for T5Gemma modular

* Remove model_split_percents from test cases

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Replace image classification loss functions to `self.loss_function` (#40764)

* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)

* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fixes for continuous batching (#40828)

* Fix for CB attn mask and refactor

* Tests for CB (not all passing)

* Passing tests and a logger fix

* Fixed the KV metrics that were broken when we moved to hybrid alloc

* Fix circular import and style

* Added tests for FA

* Unfolded test to have device expectations

* Fixes for H100

* more fixes for h100

* H100 are good

* Style

* Adding some comments from #40831

* Rename test

* Avoid 1 letter variables

* Dictonnary is only removed during kwargs

* Test for supported sample

* Fix a unvoluntary slice

* Fixes for non-sliced inputs and small example improvments

* Slice inputs is more understandabe

* Style

* [tests] re-enable aria fast tests (#40846)

* rise from the dead

* test

* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)

* Fix inconsistencies with box input inference with original repo

* remove print

* always pad

* fix modular

* [Sam2Video] Fix video inference with batched boxes and add test (#40797)

fix video inference with batched boxes and add test

* add: differential privacy research model (#40851)

* VaultGemma

* Removing Sequence and Token classification models. Removing integration tests for now

* Remove pass-only modular code. style fixes

* Update vaultgemma.md

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Add links to model doc

* Correct model doc usage examples

* Updating model doc to describe differences from Gemma 2

* Update model_doc links

* Adding integration tests

* style fixes

* repo consistency

* attribute exception

---------

Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)

* ouput_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve

* [tests] move generative tests away from `test_modeling_common.py` (#40854)

move tests

* [generate] Always use decoder config to init cache (#40772)

* mega derp

* fix

* always use the decoder

* Use checkpoint in auto_class_docstring (#40844)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)

Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Redirect MI355 CI results to dummy dataset (#40862)

* [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)

Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>

* [docstrings / type hints] Update outdated annotations for `past_key_values`  (#40803)

* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes

* fix florence kwargs  (#40826)

* fix: XIELU act parameters not being casted to correct dtype (#40812)

* Update model tags and integration references in bug report (#40881)

* [Qwen3 Next] Use numerically stable `rsqrt` (#40848)

use numerically stable inverse

* Adding Support for Qwen3-VL Series (#40795)

* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unecesary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnesesary imports

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* [`VaultGemma`] Update expectations in integration tests (#40855)

* fix tests

* style

* Fix modular consistency (#40883)

* reapply modular

* add missing one

* 🔴 Move variable output controls to `_prepare_generation_config ` (#40715)

* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches

* Clarify passing is_causal in sdpa_attention_paged_forward (#40838)

* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add comment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.expm1 and torch.log1p for better numerical results (#40860)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Fast PromptDepthAnything Processor (#40602)

* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove benchmrk script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refacto

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processoer

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

* Fix deta loading & dataclass (#40878)

* fix

* fix 2

* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)

Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)

* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)

* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [generate] remove docs of a feature that no longer exists (#40895)

* Make debugging failing tests (check and update expect output values) easier 🔥  (#40727)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fixing the call to kernelize (#40628)

* fix

* style

* overload train and eval

* add getter and setter

* Fix getter  regression (#40824)

* test things

* style

* move tests to a sane place

* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [cache] Merge static sliding and static chunked layer (#40893)

* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle

* Harmonize CacheLayer names (#40892)

* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert

* [cache] Only use scalars in `get_mask_sizes` (#40907)

* remove tensor ops

* style

* style

* Set seed for `Glm4vIntegrationTest` (#40905)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add Olmo3 model (#40778)

* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test

* remove dummy EncodingFast (#40864)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve module name handling for local custom code (#40809)

* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Remove `runner_map` (#40880)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* disable `test_fast_is_faster_than_slow` (#40909)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791)

* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase

* [generate] misc fixes (#40906)

misc fixes

* 🔴Make `center_crop` fast equivalent to slow (#40856)

make center_crop fast equivalent to slow

* Fix dtype in Paligemma (#40912)

* fix dtypes

* fix copies

* delete unused attr

* [Docs] Adding documentation of MXFP4 Quantization (#40885)

* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Processor load with multi-processing (#40786)

push

* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832)

* Remove unused arg

* deprecate

* revrt one change

* get set go

* version correction

* fix

* make style

* comment

* Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)

* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* [torchao safetensors] renaming get_state_dict function (#40774)

renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Adding activation kernels (#40890)

* first commit

* add mode

* revert modeling

* add compile

* rm print

* Minor fix for #40727 (#40929)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add support for Florence-2 training (#40914)

* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add LongCat-Flash (#40730)

* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism

* [DOC] Add missing dates in model cards (#40922)

add missing dates

* [models] remove unused `import torch.utils.checkpoint`  (#40934)

* Intel CPU dockerfile (#40806)

* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update label name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941)

* Fix trainer tests (#40823)

* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* Fix `Glm4vMoeIntegrationTest` (#40930)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Raise error instead of warning when using meta device in from_pretrained (#40942)

* raise instead of warning

* add timm

* remove

* Consistent naming for images kwargs (#40834)

* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fox copies

* another fix

* fix some tests

* fix more tests

* fix lasts tests

* fix copies

* better docstring

* delete print

* Remove nested import logic for torchvision (#40940)

* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessarry protected import in modular (and modeling)

* fix wrongly remove protected imports

* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update expected values for some `test_speculative_generation` (#40949)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Standardize audio embedding function name for audio multimodal models (#40919)

* Standardize audio embedding function name for audio multimodal models

* PR review

* Add FlexOlmo model (#40921)

* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`

* Don't list dropout in eager_paged_attention_forward (#40924)

Remove dropout argument

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update expected values for one more `test_speculative_generation` after #40949 (#40967)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)

* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Add new model LFM2-VL (#40624)

* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>

* Fix outdated version checks of accelerator (#40969)

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

use skip_predictor in vjepa2 `get_vision_features`

* [Trainer] Fix DP loss (#40799)

* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)

* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling

* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)

* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>

* [tests] Really use small models in all fast tests (#40945)

* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency

* Add captured actual outputs to CI artifacts (#40965)

* fix

* fix

* Remove `# TODO: ???` as it make me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Revert change in `compile_friendly_resize` (#40645)

fix

* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Using torch.distributions.Categorical

* Remove `set_model_tester_for_less_flaky_tests` (#40982)

remove

* Benchmarking v2 GH workflows (#40716)

* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from worfkflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description

* 🔴[`Attention`] Bert-based Models Attention Refactor (#38301)

* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fixup according tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didnt load with an earlier version, some fixup (needs another look bc cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we dont need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through, might be overdue if not in time possible

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, its the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style

* Remove [[autodoc]] refs to TF/Flax objects (#40996)

* remove refs

* more

* ENH: Enable readline support for transformers chat (#40911)

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, MacOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).

* [testing] test `num_hidden_layers` being small in model tester (#40992)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* blt wip (#38579)

* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoing with form_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* seperate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamTextMLP

* clean up some args'

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, seperated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_casual check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attnetion_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>

* [docs] rm stray tf/flax autodocs references (#40999)

rm tf references

* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check

* Make `EfficientLoFTRModelTest` faster (#41000)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typoes in src and tests (#40845)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)

* Fix model cards and modalities in toctree

* fix new models

* RUFF fix on CI scripts (#40805)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix dict like init for ModelOutput (#41002)

* fix dict like init

* style

* 🚨 [v5] remove generate output retrocompatibility aliases (#40998)

remove old type aliases

* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)

* update test (and overwrites)

* better test comment

* 0 as a default for

* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* 🚨 [v5] remove deprecated entry point (#40997)

* remove old entry point

* update references to transformers-cli

* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)

* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix `PhimoeIntegrationTest` (#41007)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix Glm4v test (#41011)

fix

* Update after #41007 (#41014)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix benchmark runner argument name (#41012)

* Adding support for Qwen3Omni (#41025)

* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed ?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>

* Making compute_loss_func always take priority in Trainer (#40632)

* logger warn, if-else logic improved

* redundant if condition fix

* Modify Qwen3Omni parameter name since VL changed it (#41045)

Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>

* Fix Qwen video tests (#41049)

fix test

* [testing] Fix `qwen2_audio` (#41018)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typing of tuples (#41028)

* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove optax (#41030)

Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in English/Chinese documentation (#41031)

* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.autocast (#40975)

* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* docs: improved RoPE function Docstrings (#41004)

* docs: improved RoPE functuon docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix condition for emitting warning when generation exceeds max model length (#40775)

correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

* Fix outdated torch version check (#40925)

Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove doc of tf and flax (#41029)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)

* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking

* [testing] Fix `seed_oss` (#41052)

* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Remove repeated import (#40937)

* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify unnecessary Optional typing (#40839)

Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add write token for uploading benchmark results to the Hub (#41047)

* Separate write token for Hub upload

* Address review comments

* Address review comments

* Ci utils (#40978)

* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License

* Remove <frameworkcontent> and <pt> tags from documentation (#41055)

* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix CI jobs being all red 🔴 (false positive) (#41059)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update quantization CI (#41068)

* fix

* new everything

* fix

* [i18n-bn] Add Bengali language README file (#40935)

* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions

* Improve documentation and errors in Mamba2-based models (#41063)

* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files

* Update team member list for some CI workflows (#41094)

* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix crash when using chat to send 2+ request to gptoss (#40536)

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* Minor addition, no split modules for VideoMAEE (#41051)

* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>

* Switch to `python:3.10-slim` for CircleCI docker images (#41067)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix argument name in benchmarking script (#41086)

* Fix argument name in benchmarking script

* Adjust vars

* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)

Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in documentation (#41087)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing (#40788)

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove unused arguments (#40916)

* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove tf and flax from Chinese documentation (#41057)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix wrong height and width when read video use torchvision (#41091)

* docs: Fix Tool Use links and remove dead RAG links (#41104)

docs: Fix tool use links. Remove dead RAG links. Fix style

* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)

* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_typed (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma

* [tests] gpt2 + `CausalLMModelTester` (#41003)

* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder

* Fix `_get_test_info` for inherited tests (#41106)

* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove bad test skips (#41109)

* remove bad skips

* remove more

* fix inits

* Format empty lines and white space in markdown files. (#41100)

* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* 🚨 [V5] Remove deprecated training arguments  (#41017)

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Support loading LFM2 GGUF (#41111)

* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [torchao safetensors] integrate torchao safetensors support with transformers  (#40735)

* enable torchao safetensors

* enable torchao safetensors support

* add more version checking

* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)

* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Fix the error where a keyword argument appearing before *args (#41099)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix broken `` expressions in markdown files (#41113)

Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove self-assignment (#41062)

* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)

* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

* docs(text2text_generation): 更新参数注释以反映现代生成实践

将max_length参数注释更新为max_new_tokens,以符合现代生成实践中指定生成新token数量的标准做法

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment

* Fixed MXFP4 model storage issue (#41118)

* Fixed loading LongT5 from legacy checkpoints (#40724)

* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head

* dummy commit (#41133)

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix loading logic flaw with regards to unexpected and missing keys (#40850)

* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Using torch.distributions.Categorical

* Resolving logits_process.py Issues

* style: autoformat with make fixup

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Resolving format error

* Correction of the loop variables in logit processor

* Vectorized the loop in logits_process

* formatted  logits_process

* paper reference and stopping rule comment logits_process

* Trigger CI rerun

* Update logits_process.py

* added test_TopH_example_integration

* added test_TopH_example_integration

* Update README.md

* Restore CI config to match main (remove accidental changes)

* Restore CI config to match upstream main (no diffs)

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: ArminAzizi98 <147081650+ArminAzizi98@users.noreply.github.com>
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Yuchao Zhang <418121364@qq.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Bo Zheng <368586905@qq.com>
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Ákos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: 艾力可 <178652170+thalahors@users.noreply.github.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
Co-authored-by: Samuel Barry <127697809+SamuelBarryCS@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Shane A <shanea@allenai.org>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Yaswanth Gali <82788246+yaswanth19@users.noreply.github.com>
Co-authored-by: Akshay Babbar <priv.akshay@outlook.com>
Co-authored-by: liangel-02 <liangel@meta.com>
Co-authored-by: Duc-Viet Hoang <vietyb00@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: lilin-1 <256404019@qq.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Jack <32371937+jackzhxng@users.noreply.github.com>
Co-authored-by: Rangehow <88258534+rangehow@users.noreply.github.com>
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
Co-authored-by: Harshal Janjani <75426551+harshaljanjani@users.noreply.github.com>
Co-authored-by: Branden <brandenkmurray@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Ayush <ayushtanwar1729@gmail.com>
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Ralph Gleaton <70818603+rjgleaton@users.noreply.github.com>
Co-authored-by: Saidur Rahman Pulok <59414463+saidurpulok@users.noreply.github.com>
Co-authored-by: Nick Doiron <ndoiron@mapmeld.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Duygu Altinok <duygu.altinok12@gmail.com>
Co-authored-by: Jinde.Song <juude.song@gmail.com>
Co-authored-by: hbenoit <60629420+HaroldBenoit@users.noreply.github.com>
Co-authored-by: nnul <107971634+notkisk@users.noreply.github.com>
Co-authored-by: YangKai0616 <kai.yang@intel.com>
Co-authored-by: Karol Szustakowski <61427290+Szustarol@users.noreply.github.com>
Co-authored-by: souvikku <107592858+souvikku@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants