Release v4.37 Qwen2, Phi-2, SigLIP, ViP-LLaVA, Fast2SpeechConformer, 4-bit serialization, Whisper longform generation · huggingface/transformers

Model releases

Qwen2

Qwen2 is the new model series of large language models from the Qwen team. Previously, the Qwen series was released, including Qwen-72B, Qwen-1.8B, Qwen-VL, Qwen-Audio, etc.

Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes.

Add qwen2 by @JustinLin610 in #28436

Phi-2

Phi-2 is a transformer language model trained by Microsoft with exceptionally strong performance for its small size of 2.7 billion parameters. It was previously available as a custom code model, but has now been fully integrated into transformers.

[Phi2] Add support for phi2 models by @susnato in #28211
[Phi] Extend implementation to use GQA/MQA. by @gugarosa in #28163
update docs to add the phi-2 example by @susnato in #28392
Fixes default value of softmax_scale in PhiFlashAttention2. by @gugarosa in #28537

SigLIP

The SigLIP model was proposed in Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer. SigLIP proposes to replace the loss function used in CLIP by a simple pairwise sigmoid loss. This results in better performance in terms of zero-shot classification accuracy on ImageNet.

Add SigLIP by @NielsRogge in #26522
[SigLIP] Don't pad by default by @NielsRogge in #28578

ViP-LLaVA

The VipLlava model was proposed in Making Large Multimodal Models Understand Arbitrary Visual Prompts by Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee.

VipLlava enhances the training protocol of Llava by marking images and interact with the model using natural cues like a “red bounding box” or “pointed arrow” during training.

Adds VIP-llava to transformers by @younesbelkada in #27932
Fix Vip-llava docs by @younesbelkada in #28085

FastSpeech2Conformer

The FastSpeech2Conformer model was proposed with the paper Recent Developments On Espnet Toolkit Boosted By Conformer by Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, and Yuekai Zhang.

FastSpeech 2 is a non-autoregressive model for text-to-speech (TTS) synthesis, which develops upon FastSpeech, showing improvements in training speed, inference speed and voice quality. It consists of a variance adapter; duration, energy and pitch predictor and waveform and mel-spectrogram decoder.

Add FastSpeech2Conformer by @connor-henderson in #23439

Wav2Vec2-BERT

The Wav2Vec2-BERT model was proposed in Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team from Meta AI.

This model was pre-trained on 4.5M hours of unlabeled audio data covering more than 143 languages. It requires finetuning to be used for downstream tasks such as Automatic Speech Recognition (ASR), or Audio Classification.

Add new meta w2v2-conformer BERT-like model by @ylacombe in #28165
Add w2v2bert to pipeline by @ylacombe in #28585

4-bit serialization

Enables saving and loading transformers models in 4bit formats - you can now push bitsandbytes 4-bit weights on Hugging Face Hub. To save 4-bit models and push them on the hub, simply install the latest bitsandbytes package from pypi pip install -U bitsandbytes, load your model in 4-bit precision and call save_pretrained / push_to_hub. An example repo here

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

model.push_to_hub("ybelkada/opt-125m-bnb-4bit")

[bnb] Let's make serialization of 4bit models possible by @poedator in #26037
[Docs] Add 4-bit serialization docs by @younesbelkada in #28182

4D Attention mask

Enable passing in 4D attention masks to models that support it. This is useful for reducing memory footprint of certain generation tasks.

4D attention_mask support by @poedator in #27539

Improved quantization support

Ability to customise which modules are quantized and which are not.

[Awq] Enable the possibility to skip quantization for some target modules by @younesbelkada in #27950
add modules_in_block_to_quantize arg in GPTQconfig by @SunMarc in #27956

Added fused modules support

[docs] Fused AWQ modules by @stevhliu in #27896
[Awq] Add llava fused modules support by @younesbelkada in #28239
[Mixtral / Awq] Add mixtral fused modules for Awq by @younesbelkada in #28240

SDPA Support for LLaVa, Mixtral, Mistral

Fix SDPA correctness following torch==2.1.2 regression by @fxmarty in #27973
[Llava / Vip-Llava] Add SDPA into llava by @younesbelkada in #28107
[Mixtral & Mistral] Add support for sdpa by @ArthurZucker in #28133
[SDPA] Make sure attn mask creation is always done on CPU by @patrickvonplaten in #28400
Fix SDPA tests by @fxmarty in #28552

Whisper: Batched state-of-the-art long-form transcription

All decoding strategies (temperature fallback, compression/log-prob/no-speech threshold, ...) of OpenAI's long-form transcription (see: https://github.com/openai/whisper or section 4.5 in paper) have been added. Contrary to https://github.com/openai/whisper, Transformers long-form transcription is fully compatible with pure FP16 and Batching!

For more information see: #27658.

[Whisper] Finalize batched SOTA long-form generation by @patrickvonplaten in #27658

Generation: assisted generation upgrades, speculative decoding, and ngram speculation

Assisted generation was reworked to accept arbitrary sources of candidate sequences. This enabled us to smoothly integrate ngram speculation, and opens the door for new candidate generation methods. Additionally, we've added the speculative decoding strategy on top of assisted generation: when you call assisted generation with an assistant model and do_sample=True, you'll benefit from the faster speculative decoding sampling 🏎️💨

Generate: assisted_decoding now accepts arbitrary candidate generators by @gante in #27751
Generate: assisted decoding now uses generate for the assistant by @gante in #28031
Generate: speculative decoding by @gante in #27979
Generate: fix speculative decoding by @gante in #28166
Adding Prompt lookup decoding by @apoorvumang in #27775
Fix _speculative_sampling implementation by @ofirzaf in #28508

torch.load pickle protection

Adding pickle protection via weights_only=True in the torch.load calls.

make torch.load a bit safer by @julien-c in #27282

Build methods for TensorFlow Models

Unlike PyTorch, TensorFlow models build their weights "lazily" after model initialization, using the shape of their inputs to figure out what their weight shapes should be. We previously needed a full forward pass through TF models to ensure that all layers received an input they could use to build their weights, but with this change we now have proper build() methods that can correctly infer shapes and build model weights. This avoids a whole range of potential issues, as well as significantly accelerating model load times.

Proper build() methods for TF by @Rocketknight1 in #27794
Replace build() with build_in_name_scope() for some TF tests by @Rocketknight1 in #28046
More TF fixes by @Rocketknight1 in #28081
Even more TF test fixes by @Rocketknight1 in #28146

Remove support for torch 1.10

The last version to support PyTorch 1.10 was 4.36.x. As it has been more than 2 years, and we're looking forward to using features available in PyTorch 1.11 and up, we do not support PyTorch 1.10 for v4.37 (i.e. we don't run the tests against torch 1.10).

Byebye torch 1.10 by @ydshieh in #28207

Model tagging

You can now add custom tags into your model before pushing it on the Hub! This enables you to filter models that contain that tag on the Hub with a simple URL filter. For example if you want to filter models that have trl tag you can search: https://huggingface.co/models?other=trl&sort=created

[core/ FEAT] Add the possibility to push custom tags using PreTrainedModel itself by @younesbelkada in #28405 - e.g.

from transformers import AutoModelForCausalLM

model_name = "HuggingFaceM4/tiny-random-LlamaForCausalLM"
model = AutoModelForCausalLM.from_pretrained(model_name)

model.add_model_tags(["tag-test"])
model.push_to_hub("llama-tagged")

Bugfixes and improvements

Fix PatchTSMixer Docstrings by @vijaye12 in #27943
use logger.warning_once to avoid massive outputs by @ranchlai in #27428
Docs for AutoBackbone & Backbone by @merveenoyan in #27456
Fix test for auto_find_batch_size on multi-GPU by @muellerzr in #27947
Update import message by @NielsRogge in #27946
Fix parameter count in readme for mixtral 45b by @CyberTimon in #27945
In PreTrainedTokenizerBase add missing word in error message by @petergtz in #27949
Fix AMD scheduled CI not triggered by @ydshieh in #27951
Add deepspeed test to amd scheduled CI by @echarlaix in #27633
Fix a couple of typos and add an illustrative test by @rjenc29 in #26941
fix bug in mask2former: cost matrix is infeasible by @xuchenhao001 in #27897
Fix for stochastic depth decay rule in the TimeSformer implementation by @atawari in #27875
fix no sequence length models error by @AdamLouly in #27522
[Mixtral] Change mistral op order by @younesbelkada in #27955
Update bounding box format everywhere by @NielsRogge in #27944
Support PeftModel signature inspect by @dancingpipi in #27865
fixed typos (issue 27919) by @asusevski in #27920
Hot-fix-mixstral-loss by @ArthurZucker in #27948
Fix link in README.md of Image Captioning by @saswatmeher in #27969
Better key error for AutoConfig by @Rocketknight1 in #27976
[doc] fix typo by @stas00 in #27981
fix typo in dvclive callback by @dberenbaum in #27983
[Tokenizer Serialization] Fix the broken serialisation by @ArthurZucker in #27099
[Whisper] raise better errors by @ArthurZucker in #27971
Fix PatchTSMixer slow tests by @ajati in #27997
[CI slow] Fix expected values by @ArthurZucker in #27999
Fix bug with rotating checkpoints by @muellerzr in #28009
[Doc] Spanish translation of glossary.md by @aaronjimv in #27958
Add model_docs from cpmant.md to derformable_detr.md by @rajveer43 in #27884
well well well by @ArthurZucker in #28011
[SeamlessM4TTokenizer] Safe import by @ArthurZucker in #28026
[core / modeling] Fix training bug with PEFT + GC by @younesbelkada in #28031
Fix AMD push CI not triggered by @ydshieh in #28029
SeamlessM4T: test_retain_grad_hidden_states_attentions is flaky by @gante in #28035
Fix languages covered by M4Tv2 by @ylacombe in #28019
Fixed spelling error in T5 tokenizer warning message (s/thouroughly/t… by @jeddobson in #28014
Generate: Mistral/Mixtral FA2 cache fix when going beyond the context window by @gante in #28037
[Seamless] Fix links in docs by @sanchit-gandhi in #27905
Remove warning when Annotion enum is created by @amyeroberts in #28048
[FA-2] Fix fa-2 issue when passing config to from_pretrained by @younesbelkada in #28043
[Modeling / Mixtral] Fix GC + PEFT issues with Mixtral by @younesbelkada in #28061
[Flax BERT] Update deprecated 'split' method by @sanchit-gandhi in #28012
[Flax LLaMA] Fix attn dropout by @sanchit-gandhi in #28059
Remove SpeechT5 deprecated argument by @ylacombe in #28062
doc: Correct spelling mistake by @caiyili in #28064
[Mixtral] update conversion script to reflect new changes by @younesbelkada in #28068
Skip M4T test_retain_grad_hidden_states_attentions by @ylacombe in #28060
[LLaVa] Add past_key_values to _skip_keys_device_placement to fix multi-GPU dispatch by @aismlv in #28051
Make GPT2 traceable in meta state by @kwen2501 in #28054
Fix bug for checkpoint saving on multi node training setting by @dumpmemory in #28078
Update fixtures-image-utils by @lhoestq in #28080
Fix low_cpu_mem_usage Flag Conflict with DeepSpeed Zero 3 in from_pretrained for Models with keep_in_fp32_modules" by @kotarotanahashi in #27762
Fix wrong examples in llava usage. by @Lyken17 in #28020
[docs] Trainer by @stevhliu in #27986
[docs] MPS by @stevhliu in #28016
fix resuming from ckpt when using FSDP with FULL_STATE_DICT by @pacman100 in #27891
Fix the deprecation warning of _torch_pytree._register_pytree_node by @cyyever in #27803
Spelling correction by @saeneas in #28110
in peft finetune, only the trainable parameters need to be saved by @sywangyi in #27825
fix ConversationalPipeline docstring by @not-lain in #28091
Disable jitter noise during evaluation in SwitchTransformers by @DaizeDong in #28077
Remove warning if DISABLE_TELEMETRY is used by @Wauplin in #28113
Fix indentation error - semantic_segmentation.md by @rajveer43 in #28117
[docs] General doc fixes by @stevhliu in #28087
Fix a typo in tokenizer documentation by @mssalvatore in #28118
[Doc] Fix token link in What 🤗 Transformers can do by @aaronjimv in #28123
When save a model on TPU, make a copy to be moved to CPU by @qihqi in #27993
Update split string in doctest to reflect #28087 by @amyeroberts in #28135
[Mixtral] Fix loss + nits by @ArthurZucker in #28115
Update modeling_utils.py by @mzelling in #28127
[docs] Fix mistral link in mixtral.md by @aaronjimv in #28143
Remove deprecated CPU dockerfiles by @ashahba in #28149
Fix FA2 integration by @pacman100 in #28142
[gpt-neox] Add attention_bias config to support model trained without attention biases by @dalgarak in #28126
move code to Trainer.evaluate to enable use of that function with multiple datasets by @peter-sk in #27844
Fix weights not properly initialized due to shape mismatch by @ydshieh in #28122
Avoid unnecessary warnings when loading CLIPConfig by @ydshieh in #28108
Update FA2 exception msg to point to hub discussions by @amyeroberts in #28161
Align backbone stage selection with out_indices & out_features by @amyeroberts in #27606
[docs] Trainer docs by @stevhliu in #28145
Fix yolos resizing by @amyeroberts in #27663
disable test_retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest by @dwyatte in #28169
Fix input_embeds docstring in encoder-decoder architectures by @gante in #28168
[Whisper] Use torch for stft if available by @sanchit-gandhi in #26119
Fix slow backbone tests - out_indices must match stage name ordering by @amyeroberts in #28186
Update YOLOS slow test values by @amyeroberts in #28187
Update docs/source/en/perf_infer_gpu_one.md by @ydshieh in #28198
Fix ONNX export for causal LM sequence classifiers by removing reverse indexing by @dwyatte in #28144
Add Swinv2 backbone by @NielsRogge in #27742
Fix: [SeamlessM4T - S2TT] Bug in batch loading of audio in torch.Tensor format in the SeamlessM4TFeatureExtractor class by @nicholasneo78 in #27914
Bug: training_args.py fix missing import with accelerate with version accelerate==0.20.1 by @michaelfeil in #28171
Fix the check of models supporting FA/SDPA not run by @ydshieh in #28202
Drop feature_extractor_type when loading an image processor file by @ydshieh in #28195
[Whisper] Fix word-level timestamps with bs>1 or num_beams>1 by @ylacombe in #28114
Fixing visualization code for object detection to support both types of bounding box. by @Anindyadeep in #27842
update the logger message with accordant weights_file_name by @izyForever in #28181
[Llava] Fix llava index errors by @younesbelkada in #28032
fix FA2 when using quantization by @pacman100 in #28203
small typo by @stas00 in #28229
Update docs around mixing hf scheduler with deepspeed optimizer by @dwyatte in #28223
Fix trainer saving safetensors: metadata is None by @hiyouga in #28219
fix bug:divide by zero in _maybe_log_save_evaluate() by @frankenliu in #28251
[Whisper] Fix errors with MPS backend introduced by new code on word-level timestamps computation by @ercaronte in #28288
Remove fast tokenization warning in Data Collators by @dbuos in #28213
fix documentation for zero_shot_object_detection by @not-lain in #28267
Remove token_type_ids from model_input_names (like #24788) by @Apsod in #28325
Translate contributing.md into Chinese by @Mayfsz in #28243
[docs] Sort es/toctree.yml | Translate performance.md by @aaronjimv in #28262
Fix error in M4T feature extractor by @ylacombe in #28340
README: install transformers from conda-forge channel by @kevherro in #28313
Don't check the device when device_map=auto by @yuanwu2017 in #28351
Fix pos_mask application and update tests accordingly by @ferjorosa in #27892
fix FA2 when using quantization for remaining models by @susnato in #28341
Update VITS modeling to enable ONNX export by @echarlaix in #28141
chore: Fix typo s/exclusivelly/exclusively/ by @hugo-syn in #28361
Enhancing Code Readability and Maintainability with Simplified Activation Function Selection. by @hi-sushanta in #28349
Fix building alibi tensor when num_heads is not a power of 2 by @abuelnasr0 in #28380
remove two deprecated function by @statelesshz in #28220
Bugfix / ffmpeg input device (mic) not working on Windows by @Teapack1 in #27051
[AttentionMaskConverter] fix sdpa unmask unattended by @zspo in #28369
Remove shell=True from subprocess.Popen to Mitigate Security Risk by @avimanyu786 in #28299
Add segmentation map processing to SAM Image Processor by @rwood-97 in #27463
update warning for image processor loading by @ydshieh in #28209
Fix initialization for missing parameters in from_pretrained under ZeRO-3 by @XuehaiPan in #28245
Fix _merge_input_ids_with_image_features for llava model by @VictorSanh in #28333
Use mmap option to load_state_dict by @weimingzha0 in #28331
[BUG] BarkEosPrioritizerLogitsProcessor eos_token_id use list, tensor size mismatch by @inkinworld in #28201
Skip now failing test in the Trainer tests by @muellerzr in #28421
Support DeepSpeed when using auto find batch size by @muellerzr in #28088
Fix number of models in README.md by @prasatee in #28430
CI: limit natten version by @gante in #28432
Fix for checkpoint rename race condition by @tblattner in #28364
Fix load correct tokenizer in Mixtral model documentation by @JuanFKurucz in #28437
[docstring] Fix docstring for ErnieConfig, ErnieMConfig by @Sparty in #27029
[Whisper] Fix slow test by @patrickvonplaten in #28407
Assitant model may on a different device by @jiqing-feng in #27995
Enable multi-label image classification in pipeline by @amyeroberts in #28433
Optimize the speed of the truncate_sequences function. by @ikkvix in #28263
Use python 3.10 for docbuild by @ydshieh in #28399
Fix docker file by @ydshieh in #28452
Set cache_dir for evaluate.load() in example scripts by @aphedges in #28422
Optionally preprocess segmentation maps for MobileViT by @harisankar95 in #28420
Correctly resolve trust_remote_code=None for AutoTokenizer by @Rocketknight1 in #28419
Fix load balancing loss func for mixtral by @liangxuZhang in #28256
Doc by @jiqing-feng in #28431
Fix docstring checker issues with PIL enums by @Rocketknight1 in #28450
Fix broken link on page by @keenranger in #28451
Mark two logger tests as flaky by @amyeroberts in #28458
Update metadata loading for oneformer by @amyeroberts in #28398
Fix torch.ones usage in xlnet by @sungho-ham in #28471
Generate: deprecate old public functions by @gante in #28478
Docs: add model paths by @gante in #28475
Generate: refuse to save bad generation config files by @gante in #28477
TF: purge TFTrainer by @gante in #28483
Fix docstrings and update docstring checker error message by @Rocketknight1 in #28460
Change progress logging to once across all nodes by @siddartha-RE in #28373
Generate: fix candidate device placement by @gante in #28493
Fix paths to AI Sweden Models reference and model loading by @JuanFKurucz in #28423
[chore] Update warning text, a word was missing by @tomaarsen in #28017
Don't set finetuned_from if it is a local path by @ydshieh in #28482
Add the XPU device check for pipeline mode by @yuanwu2017 in #28326
Tokenizer kwargs in textgeneration pipe by @thedamnedrhino in #28362
[GPTQ] Fix test by @SunMarc in #28018
Fixed minor typos by @rishit5 in #28489
Add a use_safetensors arg to TFPreTrainedModel.from_pretrained() by @Rocketknight1 in #28511
Generate: consolidate output classes by @gante in #28494
fix: sampling in flax keeps EOS by @borisdayma in #28378
improve dev setup comments and hints by @4imothy in #28495
SiLU activation wrapper for safe importing by @amyeroberts in #28509
Remove task arg in load_dataset in image-classification example by @regisss in #28408
Improving Training Performance and Scalability Documentation by @HamzaFB in #28497
Fix mismatching loading in from_pretrained with/without accelerate by @fxmarty in #28414
Fix/speecht5 bug by @NimaYaqmuri in #28481
[ TokenizationUtils] Fix add_special_tokens when the token is already there by @ArthurZucker in #28520
[TokenizationRoformerFast] Fix the save and loading by @ArthurZucker in #28527
[SpeechT5Tokenization] Add copied from and fix the convert_tokens_to_string to match the fast decoding scheme by @ArthurZucker in #28522
Clearer error for SDPA when explicitely requested by @fxmarty in #28006
Add is_model_supported for fx by @inisis in #28521
Config: warning when saving generation kwargs in the model config by @gante in #28514
[Makefile] Exclude research projects from format by @patrickvonplaten in #28551
symbolic_trace: add past_key_values, llama, sdpa support by @fxmarty in #28447
Allow to train dinov2 with different dtypes like bf16 by @StarCycle in #28504
Fix Switch Transformers When sparse_step = 1 by @agemagician in #28564
Save Processor by @ydshieh in #27761
Use weights_only only if torch >= 1.13 by @ydshieh in #28506
[Core Tokenization] Support a fix for spm fast models by @ArthurZucker in #26678
Use LoggingLevel context manager in 3 tests by @ydshieh in #28575
Fix the documentation checkpoint for xlm-roberta-xl by @jeremyfowers in #28567
[ASR Pipe] Update init to set model type and subsequently call parent init method by @sanchit-gandhi in #28486
[Whisper Tok] Move token ids to CPU when computing offsets by @sanchit-gandhi in #28485
[Whisper] Fix audio classification with weighted layer sum by @sanchit-gandhi in #28563
Making CTC training example more general by @ylacombe in #28582
Don't save processor_config.json if a processor has no extra attribute by @ydshieh in #28584
Fix wrong xpu device in DistributedType.MULTI_XPU mode by @faaany in #28386
[GPTNeoX] Fix BC issue with 4.36 by @ArthurZucker in #28602

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@aaronjimv
- [Doc] Spanish translation of glossary.md (#27958)
- [Doc] Fix token link in What 🤗 Transformers can do (#28123)
- [docs] Fix mistral link in mixtral.md (#28143)
- [docs] Sort es/toctree.yml | Translate performance.md (#28262)
@rajveer43
- Add model_docs from cpmant.md to derformable_detr.md (#27884)
- Fix indentation error - semantic_segmentation.md (#28117)
@poedator
- 4D attention_mask support (#27539)
- [bnb] Let's make serialization of 4bit models possible (#26037)
@connor-henderson
- Add FastSpeech2Conformer (#23439)
@JustinLin610
- Add qwen2 (#28436)
@SangbumChoi
- enable training mask2former and maskformer for transformers trainer by @SangbumChoi in #28277
- [DETA] Improvement and Sync from DETA especially for training by @SangbumChoi in #27990
- fix auxiliary loss training in DetrSegmentation by @SangbumChoi in #28354

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v4.37 Qwen2, Phi-2, SigLIP, ViP-LLaVA, Fast2SpeechConformer, 4-bit serialization, Whisper longform generation