v4.57.0 Branch #41310

ArthurZucker · 2025-10-03T10:05:26Z

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

#40949 (#40967) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

…#40347) * fix(trainer): ensure final checkpoint is saved when resuming training * add test * make style && slight fix of test * make style again * move test code to test_trainer * remove outdated test file * Apply style fixes --------- Co-authored-by: rangehow <rangehow@foxmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Add LFM2-VL support * add tests * linting, formatting, misc review changes * add siglip2 to auto config and instantiate it in lfm2-vl configuration * decouple image processor from processor * remove torch import from configuration * replace | with Optional * remove layer truncation from modeling file * fix copies * update everything * fix test case to use tiny model * update the test cases * fix finally the image processor and add slow tests * fixup * typo in docs * fix tests * the doc name uses underscore * address comments from Yoni * delete tests and unsuffling * relative import * do we really handle imports better now? * fix test * slow tests * found a bug in ordering + slow tests * fix copies * dont run compile test --------- Co-authored-by: Anna <anna@liquid.ai> Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>

* Fix outdated version checks of accelerator Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * Fix outdated version checks of accelerator Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> --------- Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

use skip_predictor in vjepa2 `get_vision_features`

* fix * style * Fix fp16 * style --------- Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

…40951) * fix(timm): Add exception handling for unknown Gemma3n model * nit: Let’s cater to this specific issue * nit: Simplify error handling

…ken (#40956) * fix merge conflicts * change token typing --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>

* start * xcodec * chameleon * start * layoutlm2 * layoutlm * remove skip * oups * timm_wrapper * add default * doc * consistency

* fix * fix * Remove `# TODO: ???` as it make me `???` * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

fix

…ss being killed etc.) (#40981) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

remove

* WIP benchmark v2 workflow * Container was missing * Change to sandbox branch name * Wrong place for image name * Variable declarations * Remove references to file logging * Remove unnecessary step * Fix deps install * Syntax * Add workdir * Add upload feature * typo * No need for hf_transfer * Pass in runner * Runner config * Runner config * Runner config * Runner config * Runner config * mi325 caller * Name workflow runs properly * Copy-paste error * Add final repo IDs and schedule * Review comments * Remove wf params * Remove parametrization from workflow files * Fix callers * Change push trigger to pull_request + label * Add back schedule event * Push to the same dataset * Simplify parameter description

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat command. This includes, among others:

- advanced navigation and editing: ctrl + a, ctrl + e, alt + b, alt + f, ctrl + k, alt + d, etc.
- navigating and searching history: arrow up/down, ctrl + p, ctrl + n, ctrl + r
- undo: ctrl + _
- clearing the screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to enable it in Python, see: https://docs.python.org/3/library/functions.html#input

As readline is not available on some platforms (https://docs.python.org/3/library/readline.html), the import is guarded.

Readline should work on Linux, macOS, and with WSL; I'm not sure about Windows, though. Ideally, someone can give it a try. It's possible that Windows users would have to install pyreadline3 (https://pypi.org/project/pyreadline3/).
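A minimal sketch of the guarded import described above. Per the linked Python docs, once the `readline` module has been loaded, `input()` gains line editing and history automatically. The `READLINE_AVAILABLE` flag and the `prompt_user` helper are hypothetical names used for illustration, not the chat command's actual code.

```python
# Guarded import: on platforms without GNU readline (e.g. native Windows),
# the import fails and we silently fall back to plain input().
try:
    import readline  # noqa: F401  (importing it is enough to enable editing in input())

    READLINE_AVAILABLE = True
except ImportError:
    READLINE_AVAILABLE = False


def prompt_user(prompt: str = ">>> ") -> str:
    """Read one line of user input; editing/history keys work if readline loaded."""
    return input(prompt)
```

Guarding the import rather than requiring readline keeps the chat command usable everywhere, with richer editing as a best-effort extra.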

fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* blt wip * cpu version * cpu friendly with full entropy model (real time patching) * adding config file instead of args file * enable MPS * refactoring unused code * single config class in config file * inherit from PreTrainedModel * refactor LMTransformer --> BLTPatcher * add conversion script * load from new checkpoint with from_pretrained * fixed demo from_pretrained * clean up * clean a few comments * cleanup folder * clean up dir * cleaned up modeling further * rename classes * adding transformers Attention class and RotaryEmbedding class * exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc * separate out patcher config, update modeling and conversion script * rename vars to be more transformers-like * rm unused functions * adding cross attention from transformers * pass arg * rename weights * updated conversion script * overwritten commit! fixing PR * apply feedback * adding BLTRMSNorm like Llama * add repeat_kv and eager_attention_forward copied from * BLTMLP identical to MllamaTextMLP * clean up some args' * more like mllama, but busier inits * BLTTransformerLayer config * decoder, encoder, global configs * wip working on modular file * cleaning up patch and configs * clean up patcher helpers * clean up patcher helpers further * clean up * some config renaming * clean up unused configs * clean up configs * clean up configs * update modular * clean * update demo * config more like mllama, separated subconfigs from subdicts * read from config instead of self args * update demo file * model weights to causal lm weights * missed file * added tied weights keys * BLTForCausalLM * adding files after add-new-model-like * update demo * working on tests * first running integration tests * added integration tests * adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff * tokenizer clean up * modular file * fixing rebase * ruff * adding correct basemodel output and updating config with checkpoint vals (for testing) * BLTModelTests git status * enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic * fix sdpa == causal tests * fix small model test and some gradient checkpointing * skip training GC tests * fix test * updated modular * update modular * ruff * adding modular + modeling * modular * more modern is_causal check * cleaning up modular * more modular reduction * ruff * modular fix * fix styling * return 2 * return 2 * fix some tests * fix bltcrossattention after modular break * some fixes / feedback * try cache generate fix * try cache generate fix * fix generate tests * attn_impl workaround * refactoring to use recent TransformersKwargs changes * fix hidden_states shape test * refactor to new outputs * simplify outputs a bit * rm unneeded decoderlayer overwriting * rename blt * forgot tokenizer test renamed * Reorder * Reorder * working on modular * updates from modular * new modular * ruff and such * update pretrainedmodel modular * using cohere2 apply_rotary_pos_emb * small changes * apply feedback r2 * fix cross_attention * apply more feedback * update modeling fix * load submodules from pretrainedmodel * set initializer_range to subconfigs * rm cross_attention_states pass when not needed * add 7b projection layer support * check repo * make copies * lost cohere2 rotate_half * ruff * copies?
* don't tie weights for submodules * tie weights setting * check docstrings * apply feedback * rebase * rebased modeling * update docs * applying feedback * few more fixes * fix can_record_outputs * fast tokenizer * no more modulelist * tok auto * rm tokenizersss * fix docs * ruff * fix after rebase * fix test, configs are not subscriptable --------- Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal> Co-authored-by: Lysandre <hi@lysand.re> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal> 
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>

* fix * fixup inits * oops * fixup gemma * fixup modular order * how does this keep happen lol * vaultgemma is new i forgot * remove init check

* fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

…40955) * Fix model cards and modalities in toctree * fix new models

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix dict like init * style

…ites) (#40980) * update test (and overwrites) * better test comment * 0 as a default for

fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

…ices (#40859) * fix: bug that made early stop change order of matches * fix: applied code suggestion Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix: applied code suggestion to modular * fix: integration tests --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

fix

* fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* allow private Space id for Trackio * complete docstring

* fix-client * fix

Fix is_torchvision_v2_available Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

added logits slicing to BioGpt for seq classifier Signed-off-by: Aviral <aviralkamaljain@gmail.com>

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

Fix pylint generator warnings Signed-off-by: cyy <cyyever@outlook.com>

* feat: use `aws-highcpu-32-priv` for amd docker img build * feat: add `workflow_dispatch` event to docker build CI

* support aux loss in qwen3vlmoe * update qwen3vl processor test! * add integration tests for qwen3vl-30a3 * remove duplicated decorator * code clean * fix consistency * do not inherit from nn.Linear for better quantization * pass check
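The commit above adds an auxiliary (load-balancing) loss to the Qwen3-VL MoE model but does not spell out the formula. As a hedged sketch, the widely used Switch-Transformer formulation multiplies, per expert, the fraction of tokens routed to that expert by its mean router probability, then scales by the number of experts. This top-1, pure-Python version and the `load_balancing_loss` name are illustrative assumptions, not the Qwen3-VL MoE code, which may use top-k routing and padding masks.

```python
def load_balancing_loss(router_probs: list[list[float]]) -> float:
    """Switch-Transformer-style auxiliary loss with top-1 routing.

    router_probs[t][e] is the softmax probability that token t goes to expert e.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])

    # f[e]: fraction of tokens whose top-1 choice is expert e
    counts = [0] * num_experts
    for probs in router_probs:
        counts[max(range(num_experts), key=probs.__getitem__)] += 1
    f = [c / num_tokens for c in counts]

    # p[e]: mean router probability assigned to expert e
    p = [sum(probs[e] for probs in router_probs) / num_tokens for e in range(num_experts)]

    # Scaled by num_experts so that perfectly balanced routing gives 1.0
    return num_experts * sum(fe * pe for fe, pe in zip(f, p))
```

Balanced routing such as `[[0.6, 0.4], [0.4, 0.6]]` yields 1.0, while sending every token to the same expert pushes the loss above 1, which is what the penalty is meant to discourage.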

remove it

…b Actions (#41263) * delete * delete --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* separate * separate --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

fix

The main content of this PR is to fix a bug in the delete_adapter method of the PeftAdapterMixin. Previously, it did not take into account auxiliary modules from PEFT, e.g. those added by modules_to_save. This PR fixes this oversight.

Note that the PR uses new functionality from PEFT that exposes integration functions like delete_adapter. These will be included in the next PEFT release, 0.18.0 (not yet released). Therefore, the bug is only fixed when users have a PEFT version fulfilling this requirement. I ensured that with older PEFT versions, the integration still works as before. The newly added test for this is skipped if the PEFT version is too low. (Note: I tested locally that the test passes with PEFT 0.18.0.)

While working on this, I also cleaned up the following:

- The active_adapter property has been deprecated for more than 2 years (#26407). It is safe to remove it now.
- There were numerous small errors or outdated pieces of information in the docstrings, which have been addressed.

When PEFT < 0.18.0 is used, although we cannot delete modules_to_save, we can still detect them and warn about it.
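A hedged sketch of the version gating described above, using a plain dictionary as a stand-in for the model's adapter registry. `delete_adapter_sketch`, `parse_release`, and the registry layout are hypothetical. Per the PR text, the real `PeftAdapterMixin.delete_adapter` delegates to PEFT's new integration functions, and production code would compare versions with `packaging.version` rather than this simplified parser.

```python
import warnings

# Per the PR text, deleting auxiliary modules (e.g. modules_to_save) needs PEFT >= 0.18.0.
MIN_PEFT_FOR_AUX_DELETION = (0, 18, 0)


def parse_release(version: str) -> tuple[int, int, int]:
    """Parse a plain 'X.Y.Z' release string (pre-release suffixes are not handled)."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    return major, minor, patch


def delete_adapter_sketch(adapters: dict, name: str, peft_version: str) -> list:
    """Remove `name` from a hypothetical {adapter_name: {"modules_to_save": [...]}} registry.

    Returns the auxiliary modules that could NOT be deleted (left behind).
    """
    entry = adapters.pop(name)
    aux_modules = list(entry.get("modules_to_save", []))
    if parse_release(peft_version) >= MIN_PEFT_FOR_AUX_DELETION:
        # New enough PEFT: auxiliary modules are removed together with the adapter.
        return []
    if aux_modules:
        # Old PEFT: the modules cannot be deleted, but they can be detected and reported.
        warnings.warn(
            f"PEFT {peft_version} cannot delete auxiliary modules {aux_modules} of adapter "
            f"'{name}'; upgrade to PEFT >= 0.18.0.",
            UserWarning,
        )
    return aux_modules
```

The adapter itself is always removed; only the handling of its auxiliary modules depends on the installed PEFT version, matching the "works as before, but warns" behaviour the PR describes.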

chore: add Italian translation for README.md

use hub cache Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

ArthurZucker · 2025-10-03T12:19:12Z

run slow: auto

github-actions · 2025-10-03T12:20:38Z

This comment contains run-slow, running the specified jobs:

models: ['models/auto', 'models/blt']
quantizations: [] ...

ArthurZucker · 2025-10-03T12:57:38Z

run slow: all

github-actions · 2025-10-03T12:59:06Z

This comment contains run-slow, running the specified jobs:

models: ['models/blt']
quantizations: [] ...

ydshieh and others added 30 commits September 18, 2025 11:47

Update expected values for one more test_speculative_generation after

5748352

#40949 (#40967) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Use skip_predictor=True in vjepa2 get_vision_features (#40966)

7cf1f5c

use skip_predictor in vjepa2 `get_vision_features`

[Trainer] Fix DP loss (#40799)

9378f87

* fix * style * Fix fp16 * style --------- Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

[timm_wrapper] better handling of "Unknown model" exception in timm (#…

6e51ac3

…40951) * fix(timm): Add exception handling for unknown Gemma3n model * nit: Let’s cater to this specific issue * nit: Simplify error handling

Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate to…

2ce35a2

…ken (#40956) * fix merge conflicts * change token typing --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>

[tests] Really use small models in all fast tests (#40945)

dd7ac4c

* start * xcodec * chameleon * start * layoutlm2 * layoutlm * remove skip * oups * timm_wrapper * add default * doc * consistency

Add captured actual outputs to CI artifacts (#40965)

738b223

* fix * fix * Remove `# TODO: ???` as it make me `???` * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Revert change in compile_friendly_resize (#40645)

d9d7f6a

fix

Track the CI (model) jobs that don't produce test output files (proce…

5ac3c51

…ss being killed etc.) (#40981) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Remove set_model_tester_for_less_flaky_tests (#40982)

5c2f566

remove

[testing] test num_hidden_layers being small in model tester (#40992)

103fe0d

fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

[RMSNorm] Fix rms norm init for models that center around 1 (#40796)

78f3e08

* fix * fixup inits * oops * fixup gemma * fixup modular order * how does this keep happen lol * vaultgemma is new i forgot * remove init check
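The title above refers to RMSNorm variants whose effective scale is centered around 1. In Gemma-style norms the normalized output is multiplied by `(1 + weight)`, so the neutral initialization for `weight` is 0 rather than the 1 used by the standard formulation. The pure-Python sketch below, with hypothetical function names, checks that both conventions produce the same output at their respective neutral inits; it illustrates the convention and is not the PR's initialization code.

```python
import math


def rms_normalize(x: list[float], eps: float = 1e-6) -> list[float]:
    """Divide x by its root-mean-square (no learned scale)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def rmsnorm_standard(x, weight, eps=1e-6):
    # Llama-style: y = norm(x) * weight, neutral init is weight = 1
    return [n * w for n, w in zip(rms_normalize(x, eps), weight)]


def rmsnorm_offset(x, weight, eps=1e-6):
    # Gemma-style: y = norm(x) * (1 + weight), neutral init is weight = 0
    return [n * (1.0 + w) for n, w in zip(rms_normalize(x, eps), weight)]


x = [1.0, -2.0, 3.0, 0.5]
standard = rmsnorm_standard(x, weight=[1.0] * len(x))
offset = rmsnorm_offset(x, weight=[0.0] * len(x))
# Initializing the offset variant's weight to 1 instead would double every output.
```

This is why a generic "fill RMSNorm weights with ones" initializer is wrong for models that center around 1: it silently applies a scale of 2.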

Make EfficientLoFTRModelTest faster (#41000)

a89ed71

* fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix typoes in src and tests (#40845)

662ea95

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

Fix more dates in model cards and wrong modalities in _toctree.yml (#…

f73f73d

…40955) * Fix model cards and modalities in toctree * fix new models

RUFF fix on CI scripts (#40805)

6e1270d

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

fix dict like init for ModelOutput (#41002)

251825a

* fix dict like init * style

[tests] update test_left_padding_compatibility (and minimize overwr…

f47c651

…ites) (#40980) * update test (and overwrites) * better test comment * 0 as a default for
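For context on what left-padding compatibility means: when a batch is left-padded, position ids are usually derived from the attention mask so that the first real token of every row gets position 0, and the outputs for the real tokens should then match an unpadded run. A minimal sketch of that derivation, assuming the cumulative-sum approach common in generation code (not code from this PR); `position_ids_from_mask` is a hypothetical name.

```python
from itertools import accumulate


def position_ids_from_mask(attention_mask: list[list[int]], pad_value: int = 1) -> list[list[int]]:
    """Cumulative sum of the mask minus 1; padded slots get a dummy `pad_value`."""
    result = []
    for row in attention_mask:
        cumsum = list(accumulate(row))
        result.append([c - 1 if m else pad_value for c, m in zip(cumsum, row)])
    return result


# Two sequences of length 3 and 5, left-padded to length 5
mask = [
    [0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
print(position_ids_from_mask(mask))  # [[1, 1, 0, 1, 2], [0, 1, 2, 3, 4]]
```

The real tokens of the short sequence receive positions 0, 1, 2, exactly as they would without padding; the dummy values at padded slots are ignored because the mask hides those positions.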

Patch more unittest.case.TestCase.assertXXX methods (#41008)

b164209

fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix PhimoeIntegrationTest (#41007)

b2b5044

* fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix Glm4v test (#41011)

e5a9a1d

fix

Update after #41007 (#41014)

9de898e

* fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix benchmark runner argument name (#41012)

c1cf8de

qgallouedec and others added 18 commits October 3, 2025 12:02

Allow private Space id for Trackio (#40948)

37f1f5d

* allow private Space id for Trackio * complete docstring

fix async client for transformers chat (#41255)

247d21a

* fix-client * fix

Unify is_torchvision_v2_available with is_torchvision_available (#41259)

26c57ef

Fix is_torchvision_v2_available Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
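A minimal sketch of a package-availability check of the kind these helpers perform, using `importlib.util.find_spec` so the package is located without being imported. The helper names are hypothetical, and reducing the v2 check to the plain one assumes, as the PR title suggests, that every supported torchvision version already ships the v2 transforms.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """True if top-level package `name` can be imported, checked without importing it."""
    return importlib.util.find_spec(name) is not None


def is_torchvision_v2_available_sketch() -> bool:
    # Once v2 support is guaranteed, the v2 check needs no separate version test.
    return is_package_available("torchvision")
```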

Use max/min (#41280)

91e1bdd

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

Biogptlogits (#41270)

4f1faa0

added logits slicing to BioGpt for seq classifier Signed-off-by: Aviral <aviralkamaljain@gmail.com>

Fix unnecessary single-item container checks (#41279)

9d67585

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

Fix pylint generator warnings (#41258)

89d5349

Fix pylint generator warnings Signed-off-by: cyy <cyyever@outlook.com>
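Pylint's `use-a-generator` check flags list comprehensions passed to `any()`/`all()`, and `consider-using-generator` flags them in calls such as `sum()`. A generator expression avoids building an intermediate list and lets `any()`/`all()` short-circuit. The commit does not list which call sites changed, so the functions below are hypothetical examples of the pattern.

```python
def has_negative_flagged(values):
    # Flagged by pylint (use-a-generator): the list is fully built before any() looks at it
    return any([v < 0 for v in values])


def has_negative(values):
    # Generator expression: no intermediate list, any() stops at the first True
    return any(v < 0 for v in values)


def total_length(words):
    # Instead of sum([len(w) for w in words]), flagged by consider-using-generator
    return sum(len(w) for w in words)


# Demonstrate short-circuiting: record every element any() actually inspects
consumed = []


def is_negative_logged(v):
    consumed.append(v)
    return v < 0


any(is_negative_logged(v) for v in [1, -2, 3, 4])
# consumed is now [1, -2]: evaluation stopped at the first negative value
```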

feat: use aws-highcpu-32-priv for amd docker img build (#41285)

f8ec172

* feat: use `aws-highcpu-32-priv` for amd docker img build * feat: add `workflow_dispatch` event to docker build CI

Remove test_initialization (#41261)

27b9c79

remove it

Remove some previous team members from allow list of triggering Githu…

0995a48

…b Actions (#41263) * delete * delete --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Build doc in 2 jobs: en and other languages (#41290)

41eae7a

* separate * separate --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix mxfp4 dequantization (#41292)

aca2380

fix
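The commit body gives no detail on what was wrong, so for context only: in the OCP Microscaling (MX) MXFP4 format, each block of 32 values stores 4-bit E2M1 codes plus one shared 8-bit power-of-two exponent (E8M0, bias 127). A pure-Python sketch of dequantizing one packed block follows. The low-nibble-first packing order and the function name are assumptions, and the transformers implementation works on tensors rather than Python lists.

```python
# E2M1 code -> value; bit 3 is the sign, codes 0..7 are the positive magnitudes.
FP4_E2M1_VALUES = [
    0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
    -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0,
]


def dequantize_mxfp4_block(packed: bytes, scale_e8m0: int) -> list[float]:
    """Unpack two FP4 codes per byte and apply the shared power-of-two scale.

    A full MXFP4 block is 16 bytes (32 values); this sketch accepts any length.
    """
    scale = 2.0 ** (scale_e8m0 - 127)  # E8M0 exponent with bias 127 (NaN code 255 ignored)
    out = []
    for byte in packed:
        lo, hi = byte & 0x0F, byte >> 4  # assumption: low nibble holds the earlier element
        out.append(FP4_E2M1_VALUES[lo] * scale)
        out.append(FP4_E2M1_VALUES[hi] * scale)
    return out
```

For example, the byte `0x21` with scale code 127 decodes to `[0.5, 1.0]`; with scale code 128 every value doubles.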

[Flex Attn] Fix lse x attention sinks logic (#41249)

531bb75

fix
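The commit body is just "fix", but the identity behind combining a log-sum-exp (LSE) with attention sinks is standard. A sink adds one extra logit `s` to the softmax denominator without contributing a value vector, so the sink-aware output equals the ordinary attention output rescaled by `exp(lse) / (exp(lse) + exp(s))`, i.e. `sigmoid(lse - s)`. The pure-Python check below (single query, scalar values, hypothetical function names) verifies this numerically; it is not the Flex Attention implementation.

```python
import math


def attention_with_sink(scores, values, sink):
    """Direct softmax over [scores..., sink]; the sink has no value vector."""
    m = max(max(scores), sink)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps) + math.exp(sink - m)
    return sum(e * v for e, v in zip(exps, values)) / denom


def attention_then_rescale(scores, values, sink):
    """Ordinary attention plus its LSE, then a sigmoid(lse - sink) correction."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    out = sum(e * v for e, v in zip(exps, values)) / total
    lse = m + math.log(total)  # log-sum-exp of the attention scores
    return out * (1.0 / (1.0 + math.exp(sink - lse)))  # = out * sigmoid(lse - sink)
```

Computing the correction from the LSE lets a kernel that already returns the LSE add sinks as a cheap post-processing step instead of changing its inner softmax.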

Italian translation for README.md (#41269)

40329a8

chore: add Italian translation for README.md

Fix README.md error when installing from source (#41303)

e656e26

download and use HF Hub Cache (#41181)

a6e9ec4

use hub cache Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

ArthurZucker changed the base branch from main to v4 October 3, 2025 10:05

ArthurZucker added 3 commits October 3, 2025 12:18

fix some merge issues

010896e

[test_all]

8270a0f

[test-all]

e6d8087

LysandreJik approved these changes Oct 3, 2025


LysandreJik marked this pull request as ready for review October 3, 2025 16:29

LysandreJik merged commit 2ccc6ca into v4 Oct 3, 2025
17 of 27 checks passed

LysandreJik deleted the v4-backup branch October 3, 2025 16:29

62 participants