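🤗 Transformers provides thousands of pretrained models for text classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages. Its aim is to make cutting-edge NLP easy for everyone to use.
🤗 Transformers provides APIs to quickly download and use pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on the model hub. At the same time, each Python module defining an architecture is fully standalone, so it can be modified easily for quick research experiments.
🤗 Transformers supports the three most popular deep learning libraries, Jax, PyTorch, and TensorFlow, and integrates seamlessly with each of them. You can train a model with one framework and then load it for inference with another.
You can test most of the models on the model hub directly on their model pages. We also offer private model hosting, model versioning, and an inference API.
Here are a few examples:
- Masked word completion with BERT
- Named entity recognition with Electra
- Text generation with GPT-2
- Natural language inference with RoBERTa
- Summarization with BART
- Question answering with DistilBERT
- Translation with T5
Write With Transformer, built by the Hugging Face team, is the official demo of text generation.
To use a model quickly, we provide the pipeline API. A pipeline groups together a pretrained model and its corresponding text preprocessing. Here is a quick example of using a pipeline to classify positive versus negative sentiment: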
>>> from transformers import pipeline
# Allocate a pipeline for sentiment analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
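The second line of code downloads and caches the pretrained model used by the pipeline, while the third line evaluates it on the given text. Here the answer is "positive" with a confidence of about 99.97%.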
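The classifier returns a list with one dictionary per input, each holding a label and a score. As a small illustration of reading that structure, the sketch below reuses the output shown above as a literal, so it runs without downloading a model. The helper name `describe` is hypothetical and not part of the library.

```python
# Output copied from the example above; no model is loaded here.
result = [{'label': 'POSITIVE', 'score': 0.9996980428695679}]


def describe(prediction):
    """Format one prediction as 'LABEL (xx.xx%)'. Hypothetical helper."""
    return f"{prediction['label']} ({prediction['score']:.2%})"


for prediction in result:
    print(describe(prediction))  # prints "POSITIVE (99.97%)"
```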
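Many NLP tasks have a pretrained pipeline ready to use out of the box. For example, we can easily extract the answer to a question from a given text: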
>>> from transformers import pipeline
# Allocate a pipeline for question answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
... 'question': 'What is the name of the repository ?',
... 'context': 'Pipeline has been included in the huggingface/transformers repository'
... })
{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}
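In addition to the answer, the pretrained model also returns its confidence score, along with the start and end positions of the answer within the context. You can learn more about the tasks supported by the pipeline API in the accompanying tutorial.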
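In this particular result, the start and end values line up with character offsets into the context string, which can be checked with plain slicing. The sketch below reuses the output shown above as a literal rather than calling the pipeline, so it illustrates this one result only and makes no claim about every model or library version.

```python
context = 'Pipeline has been included in the huggingface/transformers repository'
# Output copied from the example above; no model is loaded here.
result = {'score': 0.30970096588134766, 'start': 34, 'end': 58,
          'answer': 'huggingface/transformers'}

# Slice the context with the reported positions and compare with the answer.
span = context[result['start']:result['end']]
print(span)
print(span == result['answer'])
```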
Downloading and using any pretrained model on your own task is just as simple and takes only three lines of code. Here is the PyTorch version:
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = AutoModel.from_pretrained("google-bert/bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
Here is the equivalent TensorFlow code:
>>> from transformers import AutoTokenizer, TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)
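The tokenizer handles the preprocessing for every pretrained model and can be called directly on a single string (as in the examples above) or on a list. It outputs a dictionary (dict) that you can use in downstream code or pass straight to the model with the ** unpacking operator.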
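The ** unpacking mentioned above is ordinary Python: each key of the dictionary becomes a keyword argument. Below is a minimal sketch with a hypothetical stand-in function (`toy_model`, not part of 🤗 Transformers) and a hand-written dictionary that only imitates the shape of tokenizer output; the ID values are made up for illustration.

```python
def toy_model(input_ids, attention_mask):
    """Hypothetical stand-in for a model call; it only counts unmasked tokens."""
    return sum(mask for row in attention_mask for mask in row)


# Hand-written placeholder for tokenizer output; the IDs are illustrative only.
inputs = {
    "input_ids": [[1, 2, 3, 4]],
    "attention_mask": [[1, 1, 1, 1]],
}

# toy_model(**inputs) is equivalent to
# toy_model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
print(toy_model(**inputs))
```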
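The model itself is a regular PyTorch nn.Module or a TensorFlow tf.keras.Model (depending on your backend), which you can use as usual. This tutorial explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our Trainer API to quickly fine-tune on a new dataset.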
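- Easy-to-use state-of-the-art models:
  - Strong performance on NLU and NLG tasks
  - Friendly to educators and practitioners, with a low barrier to entry
  - High-level abstractions, with only three classes to learn
  - A unified API for all models
- Lower compute costs and a smaller carbon footprint:
  - Researchers can share trained models instead of training from scratch every time
  - Engineers can reduce compute time and production costs
  - Dozens of architectures, more than 2,000 pretrained models, and support for over 100 languages
- Coverage of every part of a model's lifecycle:
  - Train state-of-the-art models in just 3 lines of code
  - Move a model between deep learning frameworks at will
  - Seamlessly pick the most suitable framework for training, evaluation, and production
- Easy customization of models and examples to your needs:
  - We provide multiple examples for each architecture to reproduce the results of the original paper
  - Model internals are kept transparent and consistent
  - Model files can be used on their own, which makes them easy to modify for quick experiments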
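- This library is not a modular toolbox of building blocks for neural networks. The code in the model files is deliberately left unrefactored, without additional abstractions, so that researchers can iterate on each model quickly without getting lost in abstractions and file hopping.
- The Trainer API is not meant to work with any model; it is optimized only for the models in this library. If you are looking for a training loop for general-purpose machine learning, you should use another library.
- Although we do our best, the scripts in the examples directory are just that: examples. They will not necessarily work out of the box on your specific problem, and you may need to change a few lines of code to adapt them.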
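This repository is tested on Python 3.8+, Flax 0.4.1+, PyTorch 1.11+, and TensorFlow 2.6+.
You can install 🤗 Transformers in a virtual environment. If you are not yet familiar with Python virtual environments, check out the user guide.
First, create a virtual environment with the version of Python you plan to use and activate it.
Then, you need to install one of Flax, PyTorch, or TensorFlow. For instructions on installing these frameworks on your platform, refer to the TensorFlow installation page, the PyTorch installation page, or the Flax installation page.
Once one of these backends has been installed, 🤗 Transformers can be installed as follows: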
pip install transformers
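If you want to try the examples, or need the latest development code before an official release, you have to install the library from source.
🤗 Transformers can be installed with conda as follows: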
conda install conda-forge::transformers
Note: Installing transformers from the huggingface channel is deprecated.
To install one of Flax, PyTorch, or TensorFlow with conda, refer to the instructions on their respective installation pages.
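After installing, a quick sanity check can confirm what the current environment provides. The sketch below uses only the Python standard library. The module names `torch`, `tensorflow`, and `flax` are assumed to be the import names of the three backends, and both helper names are hypothetical. It reports what is installed rather than enforcing the minimum versions listed above.

```python
import sys
from importlib import metadata, util


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def available_backends(candidates=("torch", "tensorflow", "flax")):
    """Return the candidate backend modules that can be found in this environment."""
    return [name for name in candidates if util.find_spec(name) is not None]


print("Python 3.8+:", sys.version_info >= (3, 8))
print("transformers:", installed_version("transformers"))
print("backends:", available_backends())
```

If the backend list comes back empty, none of the three frameworks is visible to the active interpreter, which usually means the virtual environment is not activated.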
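All the model checkpoints supported by 🤗 Transformers are uploaded by users and organizations, and are seamlessly integrated with the huggingface.co model hub.
🤗 Transformers currently supports the following architectures (see the model overview for a summary of each):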
- ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
- ALIGN (from Google Research) released with the paper Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
- AltCLIP (from BAAI) released with the paper AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
- Audio Spectrogram Transformer (from MIT) released with the paper AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass.
- Autoformer (from Tsinghua University) released with the paper Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
- Bark (from Suno) released in the repository suno-ai/bark by Suno AI team.
- BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
- BARThez (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
- BARTpho (from VinAI Research) released with the paper BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
- BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
- BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
- BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
- BERTweet (from VinAI Research) released with the paper BERTweet: A pre-trained language model for English Tweets by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
- BigBird-Pegasus (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
- BigBird-RoBERTa (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
- BioGpt (from Microsoft Research AI4Science) released with the paper BioGPT: generative pre-trained transformer for biomedical text generation and mining by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.
- BiT (from Google AI) released with the paper Big Transfer (BiT) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby.
- Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
- BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
- BLIP (from Salesforce) released with the paper BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.
- BLIP-2 (from Salesforce) released with the paper BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi.
- BLOOM (from BigScience workshop) released by the BigScience Workshop.
- BORT (from Alexa) released with the paper Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
- BridgeTower (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
- BROS (from NAVER CLOVA) released with the paper BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents by Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park.
- ByT5 (from Google Research) released with the paper ByT5: Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
- CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
- CANINE (from Google Research) released with the paper CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
- Chinese-CLIP (from OFA-Sys) released with the paper Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
- CLAP (from LAION-AI) released with the paper Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
- CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
- CLIPSeg (from University of Göttingen) released with the paper Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker.
- CLVP released with the paper Better speech synthesis through scaling by James Betker.
- CodeGen (from Salesforce) released with the paper A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
- CodeLlama (from MetaAI) released with the paper Code Llama: Open Foundation Models for Code by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
- Cohere (from Cohere) released with the paper Command-R: Retrieval Augmented Generation at Production Scale by Cohere.
- Conditional DETR (from Microsoft Research Asia) released with the paper Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
- ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
- ConvNeXT (from Facebook AI) released with the paper A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
- ConvNeXTV2 (from Facebook AI) released with the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
- CPM (from Tsinghua University) released with the paper CPM: A Large-scale Generative Chinese Pre-trained Language Model by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
- CPM-Ant (from OpenBMB) released by the OpenBMB.
- CTRL (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
- CvT (from Microsoft) released with the paper CvT: Introducing Convolutions to Vision Transformers by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
- Data2Vec (from Facebook) released with the paper Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
- DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
- DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
- Decision Transformer (from Berkeley/Facebook/Google) released with the paper Decision Transformer: Reinforcement Learning via Sequence Modeling by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
- Deformable DETR (from SenseTime Research) released with the paper Deformable DETR: Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
- DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
- DePlot (from Google AI) released with the paper DePlot: One-shot visual language reasoning by plot-to-table translation by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun.
- Depth Anything (from University of Hong Kong and TikTok) released with the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
- DETA (from The University of Texas at Austin) released with the paper NMS Strikes Back by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
- DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
- DialoGPT (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
- DiNAT (from SHI Labs) released with the paper Dilated Neighborhood Attention Transformer by Ali Hassani and Humphrey Shi.
- DINOv2 (from Meta AI) released with the paper DINOv2: Learning Robust Visual Features without Supervision by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.
- DistilBERT (from HuggingFace) released with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has also been applied to compress GPT-2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT, and to produce a German version of DistilBERT.
- DiT (from Microsoft Research) released with the paper DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
- Donut (from NAVER) released with the paper OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
- DPR (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
- DPT (from Intel Labs) released with the paper Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
- EfficientFormer (from Snap Research) released with the paper EfficientFormer: Vision Transformers at MobileNet Speed by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
- EfficientNet (from Google Brain) released with the paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks by Mingxing Tan, Quoc V. Le.
- ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
- EnCodec (from Meta AI) released with the paper High Fidelity Neural Audio Compression by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
- EncoderDecoder (from Google Research) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
- ERNIE (from Baidu) released with the paper ERNIE: Enhanced Representation through Knowledge Integration by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
- ErnieM (from Baidu) released with the paper ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
- ESM (from Meta AI) are transformer protein language models. ESM-1b was released with the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. ESM-1v was released with the paper Language models enable zero-shot prediction of the effects of mutations on protein function by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. ESM-2 was released with the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
- Falcon (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.
- FastSpeech2Conformer (from ESPnet and Microsoft Research) released with the paper Fastspeech 2: Fast And High-quality End-to-End Text To Speech by Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, and Yuekai Zhang.
- FLAN-T5 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
- FLAN-UL2 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
- FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
- FLAVA (from Facebook AI) released with the paper FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
- FNet (from Google Research) released with the paper FNet: Mixing Tokens with Fourier Transforms by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
- FocalNet (from Microsoft Research) released with the paper Focal Modulation Networks by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao.
- Funnel Transformer (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
- Fuyu (from ADEPT) released with a blog post by Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, Sağnak Taşırlar.
- Gemma (from Google) released with the paper Gemma: Open Models Based on Gemini Technology and Research by the Gemma Google team.
- GIT (from Microsoft Research) released with the paper GIT: A Generative Image-to-text Transformer for Vision and Language by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
- GLPN (from KAIST) released with the paper Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
- GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
- GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
- GPT NeoX (from EleutherAI) released with the paper GPT-NeoX-20B: An Open-Source Autoregressive Language Model by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
- GPT NeoX Japanese (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, Kyo Hattori.
- GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.
- GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
- GPT-Sw3 (from AI-Sweden) released with the paper Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
- GPTBigCode (from BigCode) released with the paper SantaCoder: don't reach for the stars! by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
- GPTSAN-japanese released in the repository tanreinama/GPTSAN by 坂本俊之(tanreinama).
- Graphormer (from Microsoft) released with the paper Do Transformers Really Perform Bad for Graph Representation? by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.
- GroupViT (from UCSD, NVIDIA) released with the paper GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
- HerBERT (from Allegro.pl, AGH University of Science and Technology) released with the paper KLEJ: Comprehensive Benchmark for Polish Language Understanding by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
- Hubert (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
- I-BERT (from Berkeley) released with the paper I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
- IDEFICS (from HuggingFace) released with the paper OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh.
- ImageGPT (from OpenAI) released with the paper Generative Pretraining from Pixels by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
- Informer (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
- InstructBLIP (from Salesforce) released with the paper InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.
- Jukebox (from OpenAI) released with the paper Jukebox: A Generative Model for Music by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
- KOSMOS-2 (from Microsoft Research Asia) released with the paper Kosmos-2: Grounding Multimodal Large Language Models to the World by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
- LayoutLM (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
- LayoutLMv2 (from Microsoft Research Asia) released with the paper LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
- LayoutLMv3 (from Microsoft Research Asia) released with the paper LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
- LayoutXLM (from Microsoft Research Asia) released with the paper LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
- LED (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
- LeViT (from Meta AI) released with the paper LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
- LiLT (from South China University of Technology) released with the paper LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.
- LLaMA (from the FAIR team of Meta AI) released with the paper LLaMA: Open and Efficient Foundation Language Models by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
- Llama2 (from the FAIR team of Meta AI) released with the paper Llama2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
- LLaVa (from Microsoft Research & University of Wisconsin-Madison) released with the paper Visual Instruction Tuning by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.
- Longformer (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
- LongT5 (from Google AI) released with the paper LongT5: Efficient Text-To-Text Transformer for Long Sequences by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
- LUKE (from Studio Ousia) released with the paper LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
- LXMERT (from UNC Chapel Hill) released with the paper LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
- M-CTC-T (from Facebook) released with the paper Pseudo-Labeling For Massively Multilingual Speech Recognition by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
- M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
- MADLAD-400 (from Google) released with the paper MADLAD-400: A Multilingual And Document-Level Large Audited Dataset by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat.
- Mamba (from Albert Gu and Tri Dao) released with the paper Mamba: Linear-Time Sequence Modeling with Selective State Spaces by Albert Gu and Tri Dao.
- MarianMT: machine translation models trained on OPUS data, released by Jörg Tiedemann. The Marian Framework is developed by the Microsoft Translator team.
- MarkupLM (from Microsoft Research Asia) released with the paper MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
- Mask2Former (from FAIR and UIUC) released with the paper Masked-attention Mask Transformer for Universal Image Segmentation by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
- MaskFormer (from Meta and UIUC) released with the paper Per-Pixel Classification is Not All You Need for Semantic Segmentation by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov
- MatCha (from Google AI) released with the paper MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos.
- mBART (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
- mBART-50 (from Facebook) released with the paper Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
- MEGA (from Facebook) released with the paper Mega: Moving Average Equipped Gated Attention by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.
- Megatron-BERT (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
- Megatron-GPT2 (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
- MGP-STR (from Alibaba Research) released with the paper Multi-Granularity Prediction for Scene Text Recognition by Peng Wang, Cheng Da, and Cong Yao.
- Mistral (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed..
- Mixtral (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
- mLUKE (from Studio Ousia) released with the paper mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
- MMS (from Facebook) released with the paper Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
- MobileBERT (from CMU/Google Brain) released with the paper MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
- MobileNetV1 (from Google Inc.) released with the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
- MobileNetV2 (from Google Inc.) released with the paper MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
- MobileViT (from Apple) released with the paper MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari.
- MobileViTV2 (from Apple) released with the paper Separable Self-attention for Mobile Vision Transformers by Sachin Mehta and Mohammad Rastegari.
- MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
- MPT (from MosaicML) released with the repository llm-foundry by the MosaicML NLP Team.
- MRA (from the University of Wisconsin - Madison) released with the paper Multi Resolution Analysis (MRA) by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh.
- MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
- MusicGen (from Meta) released with the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
- MusicGen Melody (from Meta) released with the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
- MVP (from Renmin University of China AI Box) released with the paper MVP: Multi-task Supervised Pre-training for Natural Language Generation by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
- NAT (from SHI Labs) released with the paper Neighborhood Attention Transformer by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.
- Nezha (from Huawei Noah's Ark Lab) released with the paper NEZHA: Neural Contextualized Representation for Chinese Language Understanding by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
- NLLB (from Meta) released with the paper No Language Left Behind: Scaling Human-Centered Machine Translation by the NLLB team.
- NLLB-MOE (from Meta) released with the paper No Language Left Behind: Scaling Human-Centered Machine Translation by the NLLB team.
- Nougat (from Meta AI) released with the paper Nougat: Neural Optical Understanding for Academic Documents by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic.
- Nyströmformer (from the University of Wisconsin - Madison) released with the paper Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
- OneFormer (from SHI Labs) released with the paper OneFormer: One Transformer to Rule Universal Image Segmentation by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.
- OpenLlama (from s-JoL) released on GitHub (now removed).
- OPT (from Meta AI) released with the paper OPT: Open Pre-trained Transformer Language Models by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
- OWL-ViT (from Google AI) released with the paper Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
- OWLv2 (from Google AI) released with the paper Scaling Open-Vocabulary Object Detection by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
- PatchTSMixer (from IBM Research) released with the paper TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting by Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.
- PatchTST (from IBM) released with the paper A Time Series is Worth 64 Words: Long-term Forecasting with Transformers by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.
- Pegasus (from Google) released with the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
- PEGASUS-X (from Google) released with the paper Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao, Peter J. Liu.
- Perceiver IO (from Deepmind) released with the paper Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
- Persimmon (from ADEPT) released with a blog post by Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.
- Phi (from Microsoft) released with the papers - Textbooks Are All You Need by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, Textbooks Are All You Need II: phi-1.5 technical report by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
- PhoBERT (from VinAI Research) released with the paper PhoBERT: Pre-trained language models for Vietnamese by Dat Quoc Nguyen and Anh Tuan Nguyen.
- Pix2Struct (from Google) released with the paper Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.
- PLBart (from UCLA NLP) released with the paper Unified Pre-training for Program Understanding and Generation by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
- PoolFormer (from Sea AI Labs) released with the paper MetaFormer is Actually What You Need for Vision by Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan.
- Pop2Piano released with the paper Pop2Piano: Pop Audio-based Piano Cover Generation by Jongho Choi, Kyogu Lee.
- ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
- PVT (from Nanjing University, The University of Hong Kong etc.) released with the paper Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
- PVTv2 (from Shanghai AI Laboratory, Nanjing University, The University of Hong Kong etc.) released with the paper PVT v2: Improved Baselines with Pyramid Vision Transformer by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
- QDQBert (from NVIDIA) released with the paper Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
- Qwen2 (from the Qwen team, Alibaba Group) released with the paper Qwen Technical Report by Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou and Tianhang Zhu.
- RAG (from Facebook) released with the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
- REALM (from Google Research) released with the paper REALM: Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
- Reformer (from Google Research) released with the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
- RegNet (from META Research) released with the paper Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
- RemBERT (from Google Research) released with the paper Rethinking embedding coupling in pre-trained language models by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
- ResNet (from Microsoft Research) released with the paper Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
- RoBERTa (from Facebook) released with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
- RoBERTa-PreLayerNorm (from Facebook) released with the paper fairseq: A Fast, Extensible Toolkit for Sequence Modeling by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli.
- RoCBert (from WeChatAI) released with the paper RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou.
- RoFormer (from ZhuiyiTechnology) released with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
- RWKV (from Bo Peng) released on this repo by Bo Peng.
- SeamlessM4T (from Meta AI) released with the paper SeamlessM4T — Massively Multilingual & Multimodal Machine Translation by the Seamless Communication team.
- SeamlessM4Tv2 (from Meta AI) released with the paper Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team.
- SegFormer (from NVIDIA) released with the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
- SegGPT (from Beijing Academy of Artificial Intelligence (BAAI)) released with the paper SegGPT: Segmenting Everything In Context by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang.
- Segment Anything (from Meta AI) released with the paper Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
- SEW (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
- SEW-D (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
- SigLIP (from Google AI) released with the paper Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer.
- SpeechT5 (from Microsoft Research) released with the paper SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
- SpeechToTextTransformer (from Facebook) released with the paper fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
- SpeechToTextTransformer2 (from Facebook) released with the paper Large-Scale Self- and Semi-Supervised Learning for Speech Translation by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
- Splinter (from Tel Aviv University) released with the paper Few-Shot Question Answering by Pretraining Span Selection by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
- SqueezeBERT (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
- StableLm (from Stability AI) released with the paper StableLM 3B 4E1T (Technical Report) by Jonathan Tow, Marco Bellagente, Dakota Mahan, Carlos Riquelme Ruiz, Duy Phung, Maksym Zhuravinskyi, Nathan Cooper, Nikhil Pinnaparaju, Reshinth Adithyan, and James Baicoianu.
- Starcoder2 (from BigCode team) released with the paper StarCoder 2 and The Stack v2: The Next Generation by Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de Vries.
- SwiftFormer (from MBZUAI) released with the paper SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan.
- Swin Transformer (from Microsoft) released with the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
- Swin Transformer V2 (from Microsoft) released with the paper Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
- Swin2SR (from University of Würzburg) released with the paper Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
- SwitchTransformers (from Google) released with the paper Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by William Fedus, Barret Zoph, Noam Shazeer.
- T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
- T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
- Table Transformer (from Microsoft Research) released with the paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents by Brandon Smock, Rohith Pesala, Robin Abraham.
- TAPAS (from Google AI) released with the paper TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
- TAPEX (from Microsoft Research) released with the paper TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
- Time Series Transformer (from HuggingFace).
- TimeSformer (from Facebook) released with the paper Is Space-Time Attention All You Need for Video Understanding? by Gedas Bertasius, Heng Wang, Lorenzo Torresani.
- Trajectory Transformer (from the University of California at Berkeley) released with the paper Offline Reinforcement Learning as One Big Sequence Modeling Problem by Michael Janner, Qiyang Li, Sergey Levine.
- Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
- TrOCR (from Microsoft) released with the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
- TVLT (from UNC Chapel Hill) released with the paper TVLT: Textless Vision-Language Transformer by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
- TVP (from Intel) released with the paper Text-Visual Prompting for Efficient 2D Temporal Video Grounding by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding.
- UDOP (from Microsoft Research) released with the paper Unifying Vision, Text, and Layout for Universal Document Processing by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.
- UL2 (from Google Research) released with the paper Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler.
- UMT5 (from Google Research) released with the paper UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
- UniSpeech (from Microsoft Research) released with the paper UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
- UniSpeechSat (from Microsoft Research) released with the paper UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
- UnivNet (from Kakao Corporation) released with the paper UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation by Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim.
- UPerNet (from Peking University) released with the paper Unified Perceptual Parsing for Scene Understanding by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun.
- VAN (from Tsinghua University and Nankai University) released with the paper Visual Attention Network by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
- VideoMAE (from Multimedia Computing Group, Nanjing University) released with the paper VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
- ViLT (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision by Wonjae Kim, Bokyung Son, Ildoo Kim.
- VipLlava (from University of Wisconsin–Madison) released with the paper Making Large Multimodal Models Understand Arbitrary Visual Prompts by Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee.
- Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
- VisualBERT (from UCLA NLP) released with the paper VisualBERT: A Simple and Performant Baseline for Vision and Language by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
- ViT Hybrid (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
- VitDet (from Meta AI) released with the paper Exploring Plain Vision Transformer Backbones for Object Detection by Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He.
- ViTMAE (from Meta AI) released with the paper Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
- ViTMatte (from HUST-VL) released with the paper ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang.
- ViTMSN (from Meta AI) released with the paper Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.
- VITS (from Kakao Enterprise) released with the paper Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech by Jaehyeon Kim, Jungil Kong, Juhee Son.
- ViViT (from Google Research) released with the paper ViViT: A Video Vision Transformer by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid.
- Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
- Wav2Vec2-BERT (from Meta AI) released with the paper Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team.
- Wav2Vec2-Conformer (from Facebook AI) released with the paper FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
- Wav2Vec2Phoneme (from Facebook AI) released with the paper Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli.
- WavLM (from Microsoft Research) released with the paper WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
- Whisper (from OpenAI) released with the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.
- X-CLIP (from Microsoft Research) released with the paper Expanding Language-Image Pretrained Models for General Video Recognition by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.
- X-MOD (from Meta AI) released with the paper Lifting the Curse of Multilinguality by Pre-training Modular Transformers by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe.
- XGLM (from Facebook AI) released with the paper Few-shot Learning with Multilingual Language Models by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
- XLM (from Facebook) released with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
- XLM-ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
- XLM-RoBERTa (from Facebook AI) released with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
- XLM-RoBERTa-XL (from Facebook AI) released with the paper Larger-Scale Transformers for Multilingual Masked Language Modeling by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
- XLM-V (from Meta AI) released with the paper XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa.
- XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
- XLS-R (from Facebook AI) released with the paper XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
- XLSR-Wav2Vec2 (from Facebook AI) released with the paper Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
- YOLOS (from Huazhong University of Science & Technology) released with the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
- YOSO (from the University of Wisconsin - Madison) released with the paper You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
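- Want to contribute a new model? We have a detailed guide and templates to walk you through adding one. You can find them in the templates directory of the repository. Be sure to check the contributing guidelines, and contact the maintainers or open an issue to collect feedback before starting your PR.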
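To check whether a model has an implementation in Flax, PyTorch, or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to this table.
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the documentation.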
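The table remains the authoritative reference, but the same question can be approximated from Python. The sketch below is a minimal illustration, not part of the library: the helper names `implemented_frameworks` and `check_installed_library` are hypothetical, and it assumes the class-naming convention in which a PyTorch class such as `BertModel` has TensorFlow and Flax counterparts named `TFBertModel` and `FlaxBertModel`. Class-name prefixes do not always match the display names in the list above (for example, MobileBERT uses the prefix `MobileBert`), and the result depends on the installed version.

```python
import importlib
from types import SimpleNamespace

# Assumed naming convention: "BertModel" (PyTorch), "TFBertModel" (TensorFlow),
# "FlaxBertModel" (Flax). This mapping is an illustration, not a library API.
FRAMEWORK_PREFIXES = {"PyTorch": "", "TensorFlow": "TF", "Flax": "Flax"}


def implemented_frameworks(model_name, namespace):
    """List the frameworks for which `namespace` exposes `<prefix><model_name>Model`."""
    found = []
    for framework, prefix in FRAMEWORK_PREFIXES.items():
        if hasattr(namespace, f"{prefix}{model_name}Model"):
            found.append(framework)
    return found


def check_installed_library(model_name):
    """Run the same check against the installed `transformers` package, if present."""
    try:
        namespace = importlib.import_module("transformers")
    except ImportError:
        return None  # library not installed, nothing to inspect
    return implemented_frameworks(model_name, namespace)


# Stand-in namespace so the sketch runs even without the library installed.
demo = SimpleNamespace(BertModel=object, TFBertModel=object, MambaModel=object)
print(implemented_frameworks("Bert", demo))   # ['PyTorch', 'TensorFlow']
print(implemented_frameworks("Mamba", demo))  # ['PyTorch']
```

With the library installed, `check_installed_library("Bert")` applies the same lookup to the real package. It reports only that a class of that name is exposed, not that the corresponding backend is installed.
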
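Section | Description |
---|---|
Documentation | Full API documentation and tutorials |
Task summary | Tasks supported by 🤗 Transformers |
Preprocessing tutorial | Using the Tokenizer class to prepare data for the models |
Training and fine-tuning | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop or with the Trainer API |
Quick tour: Fine-tuning/usage scripts | Example scripts for a wide range of tasks |
Model sharing and uploading | Upload and share your fine-tuned models with the community |
Migration | Migrate to 🤗 Transformers from pytorch-transformers or pytorch-pretrained-bert |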
We have formally published a paper on this library. If you use the 🤗 Transformers library, please cite it:
@inproceedings{wolf-etal-2020-transformers,
title = "Transformers: State-of-the-Art Natural Language Processing",
author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = oct,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
pages = "38--45"
}