fix(deps): update dependency transformers to v4.37.0 #1065
Merged
This PR contains the following updates:
- transformers: `4.36.2` -> `4.37.0`
Release Notes
huggingface/transformers (transformers)
v4.37.0: v4.37 Qwen2, Phi-2, SigLIP, ViP-LLaVA, FastSpeech2Conformer, 4-bit serialization, Whisper long-form generation (Compare Source)
Model releases
Qwen2
Qwen2 is the new model series of large language models from the Qwen team. Previously, the team released the Qwen series, including Qwen-72B, Qwen-1.8B, Qwen-VL, Qwen-Audio, etc.
Qwen2 is a language model series that includes decoder language models of different sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, a mixture of sliding-window and full attention, etc. Additionally, we have an improved tokenizer adapted to multiple natural languages and code.
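For illustration, a Qwen2-architecture checkpoint loads through the usual auto classes. This is a minimal sketch, assuming a chat checkpoint such as `Qwen/Qwen1.5-0.5B-Chat` is available on the Hub; substitute any Qwen2-based model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub checkpoint using the Qwen2 architecture; replace with any Qwen2-based model.
model_id = "Qwen/Qwen1.5-0.5B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat prompt with the tokenizer's chat template and generate a reply.
messages = [{"role": "user", "content": "Give me a one-line summary of the transformer architecture."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```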
Phi-2
Phi-2 is a transformer language model trained by Microsoft with exceptionally strong performance for its small size of 2.7 billion parameters. It was previously available as a custom code model, but has now been fully integrated into transformers.
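Since Phi-2 is now a native architecture, it can be loaded without `trust_remote_code`. A minimal sketch, assuming the `microsoft/phi-2` checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Phi-2 is now fully integrated into transformers, so no trust_remote_code is needed.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```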
- `phi-2` example by @susnato in #28392
- `softmax_scale` in `PhiFlashAttention2` by @gugarosa in #28537

SigLIP
The SigLIP model was proposed in Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. SigLIP proposes to replace the loss function used in CLIP with a simple pairwise sigmoid loss, which results in better zero-shot classification accuracy on ImageNet.
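A minimal zero-shot classification sketch, assuming the `google/siglip-base-patch16-224` checkpoint and a sample COCO image; note that, unlike CLIP, each image-text logit goes through a sigmoid rather than a softmax over labels.

```python
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Assumed Hub checkpoint; any SigLIP checkpoint should work the same way.
model_id = "google/siglip-base-patch16-224"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of 2 cats", "a photo of a dog"]

# SigLIP was trained with max-length padding, hence the explicit padding argument.
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pairwise sigmoid: each (image, text) pair gets an independent probability.
probs = torch.sigmoid(outputs.logits_per_image)
print(probs)
```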
ViP-LLaVA
The VipLlava model was proposed in Making Large Multimodal Models Understand Arbitrary Visual Prompts by Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee.
VipLlava enhances the training protocol of Llava by marking images and interacting with the model using natural cues like a “red bounding box” or “pointed arrow” during training.
FastSpeech2Conformer
The FastSpeech2Conformer model was proposed with the paper Recent Developments On Espnet Toolkit Boosted By Conformer by Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, and Yuekai Zhang.
FastSpeech 2 is a non-autoregressive model for text-to-speech (TTS) synthesis, which develops upon FastSpeech, showing improvements in training speed, inference speed and voice quality. It consists of a variance adapter (duration, energy and pitch predictors) and a waveform and mel-spectrogram decoder.
Wav2Vec2-BERT
The Wav2Vec2-BERT model was proposed in Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team from Meta AI.
This model was pre-trained on 4.5M hours of unlabeled audio data covering more than 143 languages. It requires finetuning to be used for downstream tasks such as Automatic Speech Recognition (ASR) or Audio Classification.
4-bit serialization
Enables saving and loading transformers models in 4-bit format: you can now push bitsandbytes 4-bit weights to the Hugging Face Hub. To save 4-bit models and push them to the Hub, simply install the latest `bitsandbytes` package from PyPI (`pip install -U bitsandbytes`), load your model in 4-bit precision, and call `save_pretrained` / `push_to_hub`. An example repo is available here.
- [`Docs`] Add 4-bit serialization docs by @younesbelkada in #28182
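A minimal sketch of the workflow described above, assuming a CUDA GPU with `bitsandbytes` and `accelerate` installed; the Hub repo name below is hypothetical.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize on the fly with bitsandbytes while loading.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=quantization_config,
    device_map="auto",
)

# New in this release: the 4-bit weights themselves can now be serialized and shared.
model.save_pretrained("opt-350m-bnb-4bit")            # local 4-bit checkpoint
model.push_to_hub("your-username/opt-350m-bnb-4bit")  # hypothetical Hub repo name
```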
4D Attention mask
Enable passing 4D attention masks to models that support it. This is useful for reducing the memory footprint of certain generation tasks.
- `attention_mask` support by @poedator in #27539

Improved quantization support
Ability to customise which modules are quantized and which are not.
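A sketch of what these knobs look like, assuming a GPTQ-capable environment (`optimum` + `auto-gptq`) and a Llama-style module layout; the checkpoint and module names are illustrative, and the AWQ argument name is assumed from #27950 rather than confirmed here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig, GPTQConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # illustrative Llama-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ: only quantize an explicit subset of modules inside each transformer block.
# `modules_in_block_to_quantize` is a list of lists; each inner list is quantized together.
gptq_config = GPTQConfig(
    bits=4,
    dataset="c4",
    tokenizer=tokenizer,
    modules_in_block_to_quantize=[
        ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
        ["self_attn.o_proj"],
    ],
)
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=gptq_config, device_map="auto"
)

# AWQ: conversely, skip conversion of some target modules when loading a
# pre-quantized AWQ checkpoint (argument name assumed from #27950).
awq_config = AwqConfig(modules_to_not_convert=["lm_head"])
```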
- [`Awq`] Enable the possibility to skip quantization for some target modules by @younesbelkada in #27950
- `modules_in_block_to_quantize` arg in `GPTQConfig` by @SunMarc in #27956

Added fused modules support
- [`Awq`] Add llava fused modules support by @younesbelkada in #28239
- [`Mixtral` / `Awq`] Add mixtral fused modules for Awq by @younesbelkada in #28240

SDPA Support for LLaVa, Mixtral, Mistral
- [`Llava` / `Vip-Llava`] Add SDPA into llava by @younesbelkada in #28107
- [`Mixtral` & `Mistral`] Add support for sdpa by @ArthurZucker in #28133
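Selecting the SDPA attention implementation happens at load time. A minimal sketch, assuming a recent PyTorch and a Mistral checkpoint you have access to:

```python
import torch
from transformers import AutoModelForCausalLM

# attn_implementation="sdpa" routes attention through
# torch.nn.functional.scaled_dot_product_attention.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
    device_map="auto",
)
```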
Whisper: Batched state-of-the-art long-form transcription
All decoding strategies (temperature fallback, compression/log-prob/no-speech thresholds, ...) of OpenAI's long-form transcription (see https://github.com/openai/whisper or section 4.5 of the paper) have been added. Contrary to https://github.com/openai/whisper, Transformers' long-form transcription is fully compatible with pure FP16 and batching!
For more information see https://github.com/huggingface/transformers/pull/27658.
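A sketch of long-form transcription on audio longer than 30 seconds, assuming the `openai/whisper-tiny` checkpoint and the generation parameter names documented in the linked PR (`condition_on_prev_tokens`, thresholds, temperature fallback tuple); the placeholder audio should be replaced with real speech.

```python
import numpy as np
from transformers import AutoProcessor, WhisperForConditionalGeneration

processor = AutoProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Placeholder: 60 s of silence at 16 kHz — replace with real speech audio.
raw_audio = np.zeros(16_000 * 60, dtype=np.float32)

# Keep the full audio (no truncation to 30 s) so generate() runs the sequential
# long-form algorithm instead of a single-window pass.
inputs = processor(
    raw_audio,
    sampling_rate=16_000,
    return_tensors="pt",
    truncation=False,
    padding="longest",
    return_attention_mask=True,
)

# OpenAI-style long-form heuristics; parameter names assumed from the PR/docs.
generated_ids = model.generate(
    **inputs,
    condition_on_prev_tokens=True,
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    logprob_threshold=-1.0,
    compression_ratio_threshold=1.35,
    no_speech_threshold=0.6,
    return_timestamps=True,
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```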
Generation: assisted generation upgrades, speculative decoding, and ngram speculation
Assisted generation was reworked to accept arbitrary sources of candidate sequences. This enabled us to smoothly integrate ngram speculation, and opens the door for new candidate generation methods. Additionally, we've added the speculative decoding strategy on top of assisted generation: when you call assisted generation with an assistant model and `do_sample=True`, you'll benefit from the faster speculative decoding sampling 🏎️💨
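A sketch of speculative decoding with a smaller assistant checkpoint; the OPT model pair below is illustrative (the two models must share a tokenizer/vocabulary).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Main model and a smaller assistant from the same family.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
assistant = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("The capital of France is", return_tensors="pt")

# With do_sample=True and an assistant model, generation uses the speculative
# decoding sampling scheme introduced in this release.
outputs = model.generate(
    **inputs, assistant_model=assistant, do_sample=True, temperature=0.7, max_new_tokens=40
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```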
- `assisted_decoding` now accepts arbitrary candidate generators by @gante in #27751
- `generate` for the assistant by @gante in #28031

torch.load pickle protection
Adding pickle protection via weights_only=True in the torch.load calls.
Build methods for TensorFlow Models
Unlike PyTorch, TensorFlow models build their weights "lazily" after model initialization, using the shape of their inputs to figure out what their weight shapes should be. We previously needed a full forward pass through TF models to ensure that all layers received an input they could use to build their weights, but with this change we now have proper `build()` methods that can correctly infer shapes and build model weights. This avoids a whole range of potential issues, as well as significantly accelerating model load times.

Remove support for torch 1.10
The last version to support PyTorch 1.10 was 4.36.x. As it has been more than 2 years, and we're looking forward to using features available in PyTorch 1.11 and up, we do not support PyTorch 1.10 for v4.37 (i.e. we don't run the tests against torch 1.10).
Model tagging
You can now add custom tags to your model before pushing it to the Hub! This enables you to filter models that contain that tag on the Hub with a simple URL filter. For example, to filter models that have the `trl` tag you can search: https://huggingface.co/models?other=trl&sort=created
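A sketch of tagging before pushing, assuming #28405 exposes an `add_model_tags` helper on `PreTrainedModel` (see the PR for the exact API) and that the target repo name below, which is hypothetical, is one you can push to.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Attach custom tags to the model instance; they are written to the model card on push,
# so the model then shows up under e.g. https://huggingface.co/models?other=trl
model.add_model_tags(["trl", "my-custom-tag"])  # helper name assumed from #28405
model.push_to_hub("your-username/my-tagged-model")  # hypothetical repo name
```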
- [`core` / FEAT] Add the possibility to push custom tags using `PreTrainedModel` itself by @younesbelkada in #28405

Bugfixes and improvements
- [`Mixtral`] Change mistral op order by @younesbelkada in #27955
- [`Tokenizer Serialization`] Fix the broken serialisation by @ArthurZucker in #27099
- [`Whisper`] raise better errors by @ArthurZucker in #27971
- [`CI slow`] Fix expected values by @ArthurZucker in #27999
- [`SeamlessM4TTokenizer`] Safe import by @ArthurZucker in #28026
- [`core` / `modeling`] Fix training bug with PEFT + GC by @younesbelkada in #28031
- `test_retain_grad_hidden_states_attentions` is flaky by @gante in #28035
- [`FA-2`] Fix fa-2 issue when passing `config` to `from_pretrained` by @younesbelkada in #28043
- [`Modeling` / `Mixtral`] Fix GC + PEFT issues with Mixtral by @younesbelkada in #28061
- [`Mixtral`] update conversion script to reflect new changes by @younesbelkada in #28068
- `test_retain_grad_hidden_states_attentions` by @ylacombe in #28060
- Fix "`low_cpu_mem_usage` Flag Conflict with DeepSpeed Zero 3 in `from_pretrained` for Models with `keep_in_fp32_modules`" by @kotarotanahashi in #27762
- `DISABLE_TELEMETRY` is used by @Wauplin in #28113
- [`Mixtral`] Fix loss + nits by @ArthurZucker in #28115
- `CLIPConfig` by @ydshieh in #28108
- `input_embeds` docstring in encoder-decoder architectures by @gante in #28168
- `docs/source/en/perf_infer_gpu_one.md` by @ydshieh in #28198
- `training_args.py` fix missing import with accelerate with version `accelerate==0.20.1` by @michaelfeil in #28171
- `feature_extractor_type` when loading an image processor file by @ydshieh in #28195
- [`Llava`] Fix llava index errors by @younesbelkada in #28032
- `from_pretrained` under ZeRO-3 by @XuehaiPan in #28245
- `_merge_input_ids_with_image_features` for llava model by @VictorSanh in #28333
- `DeepSpeed` when using auto find batch size by @muellerzr in #28088
- `cache_dir` for `evaluate.load()` in example scripts by @aphedges in #28422
- `TFTrainer` by @gante in #28483
- [`chore`] Update warning text, a word was missing by @tomaarsen in #28017
- `finetuned_from` if it is a local path by @ydshieh in #28482
- `task` arg in `load_dataset` in image-classification example by @regisss in #28408
- [`TokenizationUtils`] Fix `add_special_tokens` when the token is already there by @ArthurZucker in #28520
- [`TokenizationRoformerFast`] Fix the save and loading by @ArthurZucker in #28527
- [`SpeechT5Tokenization`] Add copied from and fix the `convert_tokens_to_string` to match the fast decoding scheme by @ArthurZucker in #28522
- `Processor` by @ydshieh in #27761
- `weights_only` only if torch >= 1.13 by @ydshieh in #28506
- [`Core Tokenization`] Support a fix for spm fast models by @ArthurZucker in #26678
- `LoggingLevel` context manager in 3 tests by @ydshieh in #28575
- `processor_config.json` if a processor has no extra attribute by @ydshieh in #28584

Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @poedator: `attention_mask` support (#27539)

Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Mend Renovate. View repository job log here.