
v4.29.0: Transformers Agents, SAM, RWKV, FocalNet, OpenLLaMa

@LysandreJik LysandreJik released this 10 May 21:55
15f260a

Transformers Agents

Transformers Agents is a new API that lets you use the library and Diffusers by prompting an agent (a large language model) in natural language. The agent then writes code that calls a set of predefined tools, which rely on the appropriate state-of-the-art models for the task the user wants to perform. It is fully multimodal and can be extended by the community. Learn more in the docs.
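A toy sketch of the loop described above, not the library's API: a stand-in "LLM" (the hypothetical function fake_llm, which simply returns a fixed string) writes Python code, and that code is executed with only the predefined tools in scope. The tools caption and translate are hypothetical placeholders for the model-backed tools a real agent provides.

```python
from typing import Callable, Dict


# Hypothetical tools; a real agent would back these with actual models.
def caption(image_name: str) -> str:
    return f"a caption for {image_name}"


def translate(text: str, tgt_lang: str) -> str:
    return f"[{tgt_lang}] {text}"


TOOLS: Dict[str, Callable] = {"caption": caption, "translate": translate}


def fake_llm(prompt: str) -> str:
    # Stand-in for the language model: returns code that only calls the listed tools.
    return 'result = translate(caption("cat.png"), tgt_lang="French")'


def run_agent(task: str) -> str:
    prompt = f"Tools: {', '.join(TOOLS)}\nTask: {task}\nWrite Python code."
    code = fake_llm(prompt)
    namespace = dict(TOOLS)  # the generated code sees only the tools
    exec(code, {"__builtins__": {}}, namespace)
    return namespace["result"]


print(run_agent("Caption cat.png and translate the caption to French"))
# [French] a caption for cat.png
```

Running generated code with exec and an empty __builtins__ is only for illustration; it is not a security sandbox, so a real agent needs a properly restricted interpreter.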

SAM

SAM (Segment Anything Model) was proposed in Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.

The model can be used to predict segmentation masks of any object of interest given an input image and, optionally, prompts such as points or bounding boxes.

RWKV

RWKV proposes a tweak to the traditional Transformer attention that makes it linear. This way, the model can also be used as a recurrent network: passing the inputs for time steps 0 and 1 together gives the same result as passing the inputs at time step 0, then the inputs at time step 1 along with the state from time step 0.

This can be more efficient than a regular Transformer, and the model can handle sentences of any length (even though it uses a fixed context length during training), since each new step only needs the fixed-size state rather than the full history.
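To make the equivalence concrete, here is a toy, single-channel recurrence modeled loosely on RWKV's time-mixing (WKV) step. It is a simplified sketch, not the library's RWKV implementation: it omits the numerical-stability tricks a real implementation needs, and the decay w, bonus u, keys and values are made-up numbers.

```python
import math
from typing import List, Optional, Tuple

State = Tuple[float, float]  # (decayed weighted sum of values, decayed sum of weights)


def wkv(keys: List[float], values: List[float], w: float, u: float,
        state: Optional[State] = None) -> Tuple[List[float], State]:
    """Toy single-channel WKV recurrence (no overflow guard)."""
    num, den = state if state is not None else (0.0, 0.0)
    decay = math.exp(-w)
    outputs = []
    for k, v in zip(keys, values):
        bonus = math.exp(u + k)  # extra weight given to the current token
        outputs.append((num + bonus * v) / (den + bonus))
        # Fold the current token into the decayed running sums (the new state).
        num = decay * num + math.exp(k) * v
        den = decay * den + math.exp(k)
    return outputs, (num, den)


keys = [0.1, -0.3, 0.5, 0.2, -0.1]
values = [1.0, 2.0, -1.0, 0.5, 3.0]

# All time steps at once vs. the first two, then the rest with the carried state.
whole, _ = wkv(keys, values, w=0.5, u=0.3)
first, state = wkv(keys[:2], values[:2], w=0.5, u=0.3)
second, _ = wkv(keys[2:], values[2:], w=0.5, u=0.3, state=state)

print(all(math.isclose(a, b) for a, b in zip(whole, first + second)))  # True
```

The state is just two numbers per channel regardless of how many tokens have been processed, which is why generation cost per step does not grow with sequence length.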

FocalNet

The FocalNet model was proposed in Focal Modulation Networks by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. FocalNets completely replace self-attention (used in models like ViT and Swin) with a focal modulation mechanism for modeling token interactions in vision. The authors claim that FocalNets outperform self-attention-based models with similar computational costs on image classification, object detection, and segmentation.
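To give a feel for the mechanism, here is a heavily simplified, single-channel 1D sketch of focal modulation following the structure described in the paper: stacked local context levels with growing receptive fields, a global average-pooled level, a gated sum of all levels into a per-token modulator, and element-wise multiplication with the query. It is not the library's FocalNet code: the depthwise convolutions are replaced by a plain moving average (an assumption), all projections are identities, and the gates are fixed per-level numbers instead of learned, per-token values.

```python
import math
from typing import List, Sequence


def gelu(x: float) -> float:
    # Exact GELU via the Gaussian CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def local_context(z: Sequence[float], window: int) -> List[float]:
    # Stand-in for a depthwise convolution: mean over an odd-sized window,
    # repeating the edge values at the borders.
    half = window // 2
    n = len(z)
    return [
        sum(z[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)) / window
        for i in range(n)
    ]


def focal_modulation(x: Sequence[float], gates: Sequence[float],
                     window: int = 3) -> List[float]:
    """Toy focal modulation: y_i = q_i * sum_l gates[l] * context_l[i]."""
    levels = len(gates) - 1  # the last gate weights the global level
    z = list(x)              # identity projection for the initial context
    contexts = []
    for _ in range(levels):
        # Each level sees a wider neighborhood than the previous one.
        z = [gelu(v) for v in local_context(z, window)]
        contexts.append(z)
    global_ctx = sum(z) / len(z)  # global average pooling of the last level
    contexts.append([global_ctx] * len(z))
    # Gated aggregation of all levels into one modulator per token.
    modulator = [sum(g * ctx[i] for g, ctx in zip(gates, contexts)) for i in range(len(x))]
    # Element-wise modulation of the query (identity projection here).
    return [q * m for q, m in zip(x, modulator)]


y = focal_modulation([0.5, -1.0, 2.0, 0.0, 1.5], gates=[0.6, 0.3, 0.1])
print(len(y))  # 5
```

No token-to-token attention matrix is built here: tokens interact only through the shared context levels, so the cost grows linearly with the number of tokens.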

OpenLLaMa

The Open-Llama model was proposed in the Open-Llama project by community developer s-JoL.

The model is mainly based on LLaMA with some modifications: memory-efficient attention from Xformers, stable embedding from Bloom, and shared input-output embedding from PaLM. The model is also pre-trained on both Chinese and English, which gives it better performance on Chinese-language tasks.
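Of these modifications, the shared input-output embedding is the easiest to show in isolation. Below is a minimal pure-Python sketch of weight tying, assuming the usual meaning of the term: one vocab_size x hidden_size matrix embeds input tokens and, transposed, produces the output logits. The class name TiedEmbedding is hypothetical, and this is not the Open-Llama implementation.

```python
from typing import List

Vector = List[float]


class TiedEmbedding:
    """One matrix used both to embed input ids and to score output tokens."""

    def __init__(self, weight: List[Vector]):
        self.weight = weight  # shape: vocab_size x hidden_size

    def embed(self, token_id: int) -> Vector:
        # Input side: look up the token's row.
        return self.weight[token_id]

    def logits(self, hidden: Vector) -> Vector:
        # Output side: dot product with every row, i.e. hidden @ weight^T.
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.weight]


emb = TiedEmbedding([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy 3-token vocabulary
print(emb.logits(emb.embed(2)))  # [1.0, 1.0, 2.0]
```

Tying removes the separate output matrix, saving vocab_size x hidden_size parameters, and any update to the shared matrix affects both the input and output sides.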

Assisted Generation

Assisted generation is a new technique that speeds up generation with large language models by using a smaller model as an assistant. The assistant model performs the multiple forward passes that propose candidate tokens, while the LLM merely validates those tokens. This can lead to speed-ups of up to 10x!

  • Generate: Add assisted generation by @gante in #22211
  • Generate: assisted generation with sample (take 2) by @gante in #22949
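In the library this is exposed through the assistant_model argument of generate(). The sketch below is a toy, greedy-only version of the idea with deterministic stand-in "models" (the hypothetical functions big_model and small_model, not real LLMs): the assistant drafts up to k tokens, the main model checks them left to right, and the matching prefix plus the main model's correction at the first mismatch is kept. Because every kept token is one the main model would have chosen, the output matches plain greedy decoding.

```python
from typing import Callable, List

# A "model" here is a deterministic function: token prefix -> next token id (greedy).
NextToken = Callable[[List[int]], int]


def greedy(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def assisted_greedy(main: NextToken, assistant: NextToken, prompt: List[int],
                    max_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    target = len(prompt) + max_new
    while len(out) < target:
        # 1) The small assistant drafts up to k tokens autoregressively.
        draft: List[int] = []
        for _ in range(min(k, target - len(out))):
            draft.append(assistant(out + draft))
        # 2) The main model validates the draft: keep its own choice at each
        #    position and stop at the first token where the assistant was wrong.
        accepted: List[int] = []
        for tok in draft:
            expected = main(out + accepted)
            accepted.append(expected)
            if expected != tok:
                break
        out.extend(accepted)
    return out


# Hypothetical toy models: the assistant is wrong after any token divisible by 3.
def big_model(seq: List[int]) -> int:
    return (3 * seq[-1] + len(seq)) % 10


def small_model(seq: List[int]) -> int:
    guess = big_model(seq)
    return (guess + 1) % 10 if seq[-1] % 3 == 0 else guess


print(assisted_greedy(big_model, small_model, [1], max_new=12) == greedy(big_model, [1], 12))  # True
```

With a real LLM, all draft tokens are validated in a single forward pass of the large model, which is where the speed-up comes from; this toy calls the main model once per position, so it demonstrates correctness rather than speed.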

Code on the Hub from another repo

To avoid duplicating model code across multiple repos when using the code on the Hub feature, loading such a model now saves, in its config, the repo that contains the code. This way there is a single source of truth for the code of these models.

Breaking changes

This release has three breaking changes compared to v4.28.0.

The first one fixes training issues for Pix2Struct. This slightly affects the results, but the model should train much better.

  • 🚨🚨🚨 [Pix2Struct] Attempts to fix training issues 🚨🚨🚨 by @younesbelkada in #23004

The second one aligns the ignore index in the LUKE model with the other models in the library. This breaks the convention that models should stick to their original implementation, but it was necessary for consistency with the other models in the library.

Finally, the third breaking change aims to harmonize the training procedure for most recent additions to Transformers. It is now the user's responsibility to mask the padding tokens of the labels with the correct value (the ignore index, typically -100). This change addresses the issue that was raised for other architectures such as LUKE and Pix2Struct.
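A minimal pure-Python sketch of that masking step, assuming labels arrive as lists of token ids and that the loss ignores -100 (the default ignore_index of PyTorch's CrossEntropyLoss). The function name mask_label_padding is hypothetical.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_label_padding(labels: List[List[int]], pad_token_id: int) -> List[List[int]]:
    """Replace padding positions in a batch of label ids with IGNORE_INDEX."""
    return [[IGNORE_INDEX if tok == pad_token_id else tok for tok in row] for row in labels]


batch = [[42, 7, 0, 0], [13, 99, 5, 0]]  # hypothetical ids, pad_token_id = 0
print(mask_label_padding(batch, pad_token_id=0))
# [[42, 7, -100, -100], [13, 99, 5, -100]]
```

With PyTorch tensors the same step is usually written as labels[labels == pad_token_id] = -100. If the pad token doubles as a real token (for example when it is set to the EOS token), this also masks those positions, so check the tokenizer's configuration first.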

Bugfixes and improvements

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @gabrielwithappy
    • 🌐 [i18n-KO] Translated training.mdx to Korean (#22670)
    • 🌐 [i18n-KO] Fix anchor links for docs auto_tutorial, training (#22796)
    • 🌐 [i18n-KO] translate create_a_model doc to Korean (#22754)
    • 🌐 [i18n-KO] docs: ko: Translate multiple_choice.mdx (#23064)
  • @0525hhgus
    • 🌐 [i18n-KO] Translated sequence_classification.mdx to Korean (#22655)
    • [i18n-KO] Translated accelerate.mdx to Korean (#22830)
    • 🌐 [i18n-KO] Translated token_classification.mdx to Korean (#22945)
    • 🌐 [i18n-KO] Translated model_sharing.mdx to Korean (#22991)
    • 🌐 [i18n-KO] Translated tasks/image_classification.mdx to Korean (#23048)
  • @sim-so
    • [WIP]🌐 [i18n-KO] Translated tutorial/proprecssing.mdx to Korean (#22578)
    • 🌐 [i18n-KO] Translated tasks/summarization.mdx to Korean (#22783)
    • 🌐 [i18n-KO] Translated tasks/image_captioning.mdx to Korean (#22943)
    • 🌐 [i18n-KO] Translated torchscript.mdx to Korean (#23060)
  • @HanNayeoniee
    • 🌐 [i18n-KO] Translated custom_models.mdx to Korean (#22534)
    • 🌐 [i18n-KO] Translated tasks/masked_language_modeling.mdx to Korean (#22838)
    • 🌐 [i18n-KO] Translated run_scripts.mdx to Korean (#22793)
    • 🌐 [i18n-KO] Fixed tasks/masked_language_modeling.mdx (#22965)
    • 🌐 [i18n-KO] Translated multilingual.mdx to Korean (#23008)
    • 🌐 [i18n-KO] Translated tasks/zero_shot_image_classification.mdx to Korean (#23065)
    • docs: ko: update _toctree.yml (#23112)
  • @wonhyeongseo
    • 🌐 [i18n-KO] Translated tasks/translation.mdx to Korean (#22805)
    • 🌐 [i18n-KO] Translated serialization.mdx to Korean (#22806)
  • @peter-sk
    • added GPTNeoXForTokenClassification (#23002)
    • added GPTNeoForTokenClassification (#22908)
    • GPT2ForQuestionAnswering (#23030)
    • GPTNeoForQuestionAnswering (#23057)
    • gpt2 multi-gpu fix (#23149)
    • GPTNeoXForQuestionAnswering (#23059)
  • @s-JoL
    • add open-llama model with ckpt (#22795)
  • @awinml
    • Add BioGPTForSequenceClassification (#22253)
    • Add no_trainer scripts to pre-train Vision Transformers (#23156)
    • Update LLaMA docs with arxiv link (#23191)
  • @raghavanone
    • Add FlaxWhisperForAudioClassification model (#22883)
    • Add FlaxWhisperForAudioClassification model (#23173)