Skip to content

v4.4.0: S2T, M2M100, I-BERT, mBART-50, DeBERTa-v2, XLSR-Wav2Vec2

Compare
Choose a tag to compare
@LysandreJik LysandreJik released this 16 Mar 15:39

v4.4.0: S2T, M2M100, I-BERT, mBART-50, DeBERTa-v2, XLSR-Wav2Vec2

SpeechToText

Two new models are released as part of the S2T implementation: Speech2TextModel and Speech2TextForConditionalGeneration, in PyTorch.

Speech2Text is a speech model that accepts a float tensor of log-mel filter-bank features extracted from the speech signal. It’s a transformer-based seq2seq model, so the transcripts/translations are generated autoregressively.

The Speech2Text model was proposed in fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=speech_to_text

M2M100

Two new models are released as part of the M2M100 implementation: M2M100Model and M2M100ForConditionalGeneration, in PyTorch.

M2M100 is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation tasks.

The M2M100 model was proposed in Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=m2m_100

I-BERT

Six new models are released as part of the I-BERT implementation: IBertModel, IBertForMaskedLM, IBertForSequenceClassification, IBertForMultipleChoice, IBertForTokenClassification and IBertForQuestionAnswering, in PyTorch.

I-BERT is a quantized version of RoBERTa running inference up to four times faster.

The I-BERT framework in PyTorch allows to identify the best parameters for quantization. Once the model is exported in a framework that supports int8 execution (such as TensorRT), a speedup of up to 4x is visible, with no loss in performance thanks to the parameter search.

The I-BERT model was proposed in I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=ibert

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=speech_to_text

mBART-50

MBart-50 is created using the original mbart-large-cc25 checkpoint by extending its embedding layers with randomly initialized vectors for an extra set of 25 language tokens and then pretrained on 50 languages.

The MBart model was presented in Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=mbart-50

DeBERTa-v2

Fixe new models are released as part of the DeBERTa-v2 implementation: DebertaV2Model, DebertaV2ForMaskedLM, DebertaV2ForSequenceClassification, DeberaV2ForTokenClassification and DebertaV2ForQuestionAnswering, in PyTorch.

The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google’s BERT model released in 2018 and Facebook’s RoBERTa model released in 2019.

It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in RoBERTa.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=deberta-v2

Wav2Vec2

XLSR-Wav2Vec2

The XLSR-Wav2Vec2 model was proposed in Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.

The checkpoint corresponding to that model is added to the model hub: facebook/
wav2vec2-large-xlsr-53

Training script

A fine-tuning script showcasing how the Wav2Vec2 model can be trained has been added.

Further improvements

The Wav2Vec2 architecture becomes more stable as several changes are done to its architecture. This introduces feature extractors and feature processors as the pre-processing aspect of multi-modal speech models.

AMP & XLA Support for TensorFlow models

Most of the TensorFlow models are now compatible with automatic mixed precision and have XLA support.

  • Add AMP for TF Albert #10141 (@jplu)
  • Unlock XLA test for TF ConvBert #10207 (@jplu)
  • Making TF BART-like models XLA and AMP compliant #10191 (@jplu)
  • Making TF XLM-like models XLA and AMP compliant #10211 (@jplu)
  • Make TF CTRL compliant with XLA and AMP #10209 (@jplu)
  • Making TF GPT2 compliant with XLA and AMP #10230 (@jplu)
  • Making TF Funnel compliant with AMP #10216 (@jplu)
  • Making TF Lxmert model compliant with AMP #10257 (@jplu)
  • Making TF MobileBert model compliant with AMP #10259 (@jplu)
  • Making TF MPNet model compliant with XLA #10260 (@jplu)
  • Making TF T5 model compliant with AMP and XLA #10262 (@jplu)
  • Making TF TransfoXL model compliant with AMP #10264 (@jplu)
  • Making TF OpenAI GPT model compliant with AMP and XLA #10261 (@jplu)
  • Rework the AMP for TF XLNet #10274 (@jplu)
  • Making TF Longformer-like models compliant with AMP #10233 (@jplu)

SageMaker Trainer for model parallelism

We are rolling out experimental support for model parallelism on SageMaker with a new SageMakerTrainer that can be used in place of the regular Trainer. This is a temporary class that will be removed in a future version, the end goal is to have Trainer support this feature out of the box.

General improvements and bugfixes