v4.2.0: LED from AllenAI, Generation Scores, TensorFlow 2x speedup, faster import

@LysandreJik released this 13 Jan 15:13

v4.2.0: LED from AllenAI, encoder-decoder templates, fast imports

LED from AllenAI (@patrickvonplaten)

Four new PyTorch models are released as part of the LED implementation: LEDModel, LEDForConditionalGeneration, LEDForSequenceClassification and LEDForQuestionAnswering. The first two also have TensorFlow versions.

LED is the encoder-decoder variant of the Longformer model by AllenAI.

The LED model was proposed in Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=led

Generation Scores & other outputs (@patrickvonplaten)

The PyTorch generation function can now return:

  • scores - the logits generated at each step
  • attentions - all attention weights at each generation step
  • hidden_states - all hidden states at each generation step

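by setting return_dict_in_generate=True in the config or passing it as an argument to .generate(), together with the matching output_scores, output_attentions or output_hidden_states flag.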
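
As a minimal sketch of how these flags might be passed, the snippet below collects them in a small helper and forwards them to .generate(). The helper names (generation_output_flags, generate_with_scores), the "gpt2" checkpoint and the prompt are illustrative assumptions, not part of the release; the transformers import is kept inside the function so the helper can be inspected without loading a model.

```python
def generation_output_flags(scores=True, attentions=False, hidden_states=False):
    """Build keyword arguments for `.generate()` that request extra outputs.

    Hypothetical helper for illustration; only the flag names come from
    the transformers generation API.
    """
    return {
        "return_dict_in_generate": True,  # return an output object instead of a bare tensor
        "output_scores": scores,
        "output_attentions": attentions,
        "output_hidden_states": hidden_states,
    }


def generate_with_scores(model_name="gpt2", prompt="Hello", max_length=10):
    """Generate from `prompt` and return the sequences plus per-step scores."""
    # Imported lazily: requires transformers and PyTorch to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")

    outputs = model.generate(
        inputs.input_ids,
        max_length=max_length,
        **generation_output_flags(),
    )
    # `sequences` holds the generated token ids; `scores` is a tuple with
    # one tensor per generated step.
    return outputs.sequences, outputs.scores
```

Calling generate_with_scores() downloads the assumed checkpoint on first use; attentions and hidden states can be requested the same way by flipping the corresponding arguments of the helper.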

PR:

  • Add flags to return scores, hidden states and / or attention weights in GenerationMixin #9150 (@SBrandeis)

TensorFlow improvements

TensorFlow BERT-like model improvements (@jplu)

The TensorFlow versions of the BERT-like models have been updated and are now twice as fast as the previous versions.

  • Improve BERT-like models performance with better self attention #9124 (@jplu)

Better integration in TensorFlow Serving (@jplu)

This version introduces a new API for TensorFlow saved models, which can now be exported with model.save_pretrained("path", saved_model=True) and easily loaded into a TensorFlow Serving environment.
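
A minimal sketch of the export call described above. The wrapper names (export_for_serving, load_and_export), the "bert-base-uncased" checkpoint and the output directory are assumptions chosen for illustration; only save_pretrained(..., saved_model=True) comes from the release notes.

```python
def export_for_serving(model, export_dir):
    """Save `model` to `export_dir`, including a TensorFlow SavedModel export.

    `model` is expected to be a TensorFlow transformers model; the
    `saved_model=True` flag is the one named in the release notes.
    """
    model.save_pretrained(export_dir, saved_model=True)
    return export_dir


def load_and_export(model_name="bert-base-uncased", export_dir="./bert_export"):
    """Load an assumed checkpoint and export it (needs transformers and TensorFlow)."""
    # Imported lazily so the module can be loaded without TensorFlow installed.
    from transformers import TFBertModel

    model = TFBertModel.from_pretrained(model_name)
    return export_for_serving(model, export_dir)
```

The exported directory can then be pointed at from a TensorFlow Serving deployment; the exact layout inside export_dir is not specified here.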

DeepSpeed integration (@stas00)

This release adds initial support for DeepSpeed to accelerate distributed training on several GPUs. This is an experimental feature that has not been fully tested yet, but early results are very encouraging (see this comment). Stay tuned for more details in the coming weeks!

Model templates (@patrickvonplaten)

The encoder-decoder version of the templates is now part of Transformers! This addition makes adding an encoder-decoder model much easier. More information can be found in the README.

Faster import (@sgugger)

The initialization process has been changed to import only what is required. As a result, TensorFlow is not imported when only PyTorch models are used, and vice versa. In the best case, importing a transformers model now takes only a few hundred milliseconds (~200 ms), compared to a few seconds (~3 s) in previous versions.
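
The general idea of importing only what is required can be sketched with a module-like object that defers imports until an attribute is first accessed. This is an illustrative sketch of the technique, not the library's actual implementation: the LazyModule class and the facade object are hypothetical, and standard-library modules stand in for the heavy PyTorch and TensorFlow backends.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Module-like object that imports a backing module on first attribute access."""

    def __init__(self, name, attr_to_module):
        super().__init__(name)
        # Maps a public attribute name to the module that actually defines it.
        self._attr_to_module = dict(attr_to_module)

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. the attribute is not cached yet.
        if attr not in self._attr_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(self._attr_to_module[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Hypothetical facade: `fractions` and `decimal` play the role of two backends.
facade = LazyModule("facade", {"Fraction": "fractions", "Decimal": "decimal"})

# Accessing `Fraction` resolves it through `fractions`; `Decimal` is never
# resolved here, so the facade does not import `decimal` on its own.
half = facade.Fraction(1, 2)
```

The cost of a backend is paid only by code paths that touch it, which is why a PyTorch-only script no longer has to wait for TensorFlow to load.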

Documentation highlights (@Qbiwan, @NielsRogge)

Some models now have improved documentation. The LayoutLM model has seen a general overhaul in its documentation thanks to @NielsRogge.

The tokenizer-only models Bertweet, Herbert and Phobert now have their own documentation pages thanks to @Qbiwan.

Breaking changes

There are no breaking changes between the previous version and this one.
Note, however, that this is the first version to require TensorFlow >= 2.3.

General improvements and bugfixes