Update training #4

Merged: 76 commits, Apr 2, 2024

Commits (76)
e68ff30
[`quality`] update quality check to make sure we check imports 😈 (#2…
ArthurZucker Mar 22, 2024
3479161
Fix type hint for train_dataset param of Trainer.__init__() to allow …
stevemadere Mar 22, 2024
aa17cf9
Enable AMD docker build CI (#29803)
IlyasMoutawwakil Mar 22, 2024
13b2370
Correct llava mask & fix missing setter for `vocab_size` (#29389)
fxmarty Mar 22, 2024
e85654f
rm input dtype change in CPU (#28631)
jiqing-feng Mar 22, 2024
34e07f4
Generate: remove unused attributes in `AssistedCandidateGenerator` (#…
gante Mar 22, 2024
884b221
replaced concatenation to f-strings to improve readability and unify …
igeni Mar 22, 2024
2e7cb46
[`cleanup`] vestiges of causal mask (#29806)
ArthurZucker Mar 22, 2024
7e1413d
Complete security policy with mentions of remote code (#29707)
LysandreJik Mar 22, 2024
c5f0288
[`SuperPoint`] Fix doc example (#29816)
amyeroberts Mar 22, 2024
dafe370
[DOCS] Fix typo for llava next docs (#29829)
aliencaocao Mar 23, 2024
76a33a1
model_summary.md - Restore link to Harvard's Annotated Transformer. (…
gamepad-coder Mar 24, 2024
39114c0
Remove static pretrained maps from the library's internals (#29112)
LysandreJik Mar 25, 2024
afe73ae
Fix the behavior of collecting 'num_input_tokens_seen' (#29099)
YouliangHUANG Mar 25, 2024
8e9a220
Populate torch_dtype from model to pipeline (#28940)
B-Step62 Mar 25, 2024
00a09ed
fix 😭
ArthurZucker Mar 25, 2024
e3e16dd
[`revert commit`] revert 00a09ed448082da3d6d35fb23a37b7d04f7b4dcd
ArthurZucker Mar 25, 2024
7eb3ba8
remove quotes in code example (#29812)
johko Mar 25, 2024
b5a6d6e
Add warnings if training args differ from checkpoint trainer state (#…
jonflynng Mar 26, 2024
b32bf85
Replace 'decord' with 'av' in VideoClassificationPipeline (#29747)
Tyx-main Mar 26, 2024
de81a67
Fix header in IFE task guide (#29859)
merveenoyan Mar 26, 2024
b9ceb03
[docs] Indent ordered list in add_new_model.md (#29796)
windsonsea Mar 26, 2024
998b5bb
Allow `bos_token_id is None` during the generation with `inputs_embed…
LZHgrla Mar 26, 2024
ef60995
Add `cosine_with_min_lr` scheduler in Trainer (#29341)
liuyanyi Mar 26, 2024
07d7952
Disable AMD memory benchmarks (#29871)
IlyasMoutawwakil Mar 26, 2024
f01e160
Set custom_container in build docs workflows (#29855)
Wauplin Mar 26, 2024
8e08aca
Support `num_attention_heads` != `num_key_value_heads` in Flax Llama …
bminixhofer Mar 27, 2024
1c39974
Add Qwen2MoE (#29377)
bozheng-hit Mar 27, 2024
cefb819
Mamba `slow_forward` gradient fix (#29563)
vasqu Mar 27, 2024
a81cf9e
Fix 29807, sinusoidal positional encodings overwritten by post_init()…
hovnatan Mar 27, 2024
4d8427f
Reimplement "Automatic safetensors conversion when lacking these file…
LysandreJik Mar 27, 2024
31c575b
fix fuyu device_map compatibility (#29880)
SunMarc Mar 27, 2024
0efcf32
Move `eos_token_id` to stopping criteria (#29459)
zucchini-nlp Mar 27, 2024
7576974
add Cambricon MLUs support (#29627)
huismiling Mar 27, 2024
a25037b
MixtralSparseMoeBlock: add gate jitter (#29865)
lorenzoverardo Mar 27, 2024
d9dc993
Fix typo in T5Block error message (#29881)
Mingosnake Mar 28, 2024
b256516
[`make fix-copies`] update and help (#29924)
ArthurZucker Mar 28, 2024
543889f
[`GptNeox`] don't gather on pkv when using the trainer (#29892)
ArthurZucker Mar 28, 2024
3a7e683
[`pipeline`]. Zero shot add doc warning (#29845)
ArthurZucker Mar 28, 2024
22d159d
Adding Flash Attention 2 Support for GPT2 (#29226)
EduardoPach Mar 28, 2024
7c19faf
[doc] fix some typos and add `xpu` to the testing documentation (#29894)
faaany Mar 28, 2024
248d5d2
Tests: replace `torch.testing.assert_allclose` by `torch.testing.asse…
gante Mar 28, 2024
c9d2e85
Add beam search visualizer to the doc (#29876)
aymeric-roucher Mar 28, 2024
855b95c
Safe import of LRScheduler (#29919)
amyeroberts Mar 28, 2024
aac7099
add functions to inspect model and optimizer status to trainer.py (#2…
CKeibel Mar 28, 2024
441de62
RoPE models: add numerical sanity-check test for RoPE scaling (#29808)
gante Mar 28, 2024
e677479
[`Mamba`] from pretrained issue with `self.embeddings` (#29851)
ArthurZucker Mar 28, 2024
a2a7f71
[ `TokenizationLlama`] fix the way we convert tokens to strings to ke…
ArthurZucker Mar 28, 2024
4df5b9b
Allow GradientAccumulationPlugin to be configured from AcceleratorCon…
fabianlim Mar 28, 2024
2bbbf1b
[`BC`] Fix BC for other libraries (#29934)
ArthurZucker Mar 28, 2024
e203646
Fix doc issue #29758 in DebertaV2Config class (#29842)
vinayakkgarg Mar 28, 2024
536ea2a
[`LlamaSlowConverter`] Slow to Fast better support (#29797)
ArthurZucker Mar 28, 2024
ba56ed0
Update installs in image classification doc (#29947)
MariaHei Mar 28, 2024
43d17c1
Mark `test_eager_matches_sdpa_generate` flaky for some models (#29479)
ydshieh Mar 29, 2024
5ad7f17
Super tiny fix 12 typos about "with with" (#29926)
fzyzcjy Mar 29, 2024
6fd93fe
Fix rope theta for OpenLlama (#29893)
jla524 Mar 30, 2024
156d30d
Add warning message for `run_qa.py` (#29867)
jla524 Mar 30, 2024
e644b60
fix: get mlflow version from mlflow-skinny (#29918)
Mar 30, 2024
f6701bc
Reset alarm signal when the function is ended (#29706)
coldnight Mar 30, 2024
46d6368
Update model card and link of blog post. (#29928)
bozheng-hit Mar 30, 2024
6e58407
[`BC`] Fix BC for AWQ quant (#29965)
TechxGenus Mar 30, 2024
3b8e293
Rework tests to compare trainer checkpoint args (#29883)
muellerzr Mar 31, 2024
569f6c7
Fix FA2 tests (#29909)
ylacombe Apr 1, 2024
fa2c49b
Fix copies main ci (#29979)
ArthurZucker Apr 1, 2024
e4f5b57
[tests] fix the wrong output in `ImageToTextPipelineTests.test_condit…
faaany Apr 1, 2024
c9f6e5e
Generate: move misplaced test (#29902)
gante Apr 1, 2024
096f304
[docs] Big model loading (#29920)
stevhliu Apr 2, 2024
83b26dd
[`generate`] fix breaking change for patch (#29976)
ArthurZucker Apr 2, 2024
416711c
Fix 29807 sinusoidal positional encodings in Flaubert, Informer and X…
hovnatan Apr 2, 2024
33288ff
[bnb] Fix bug in `_replace_with_bnb_linear` (#29958)
SunMarc Apr 2, 2024
fed27ff
Adding FlaxNoRepeatNGramLogitsProcessor (#29677)
giganttheo Apr 2, 2024
0d04b1e
Add Flash Attention 2 support to Musicgen and Musicgen Melody (#29939)
ylacombe Apr 2, 2024
cb5927c
[Docs] Make an ordered list prettier in add_tensorflow_model.md (#29949)
windsonsea Apr 2, 2024
15cd687
Fix `skip_special_tokens` for `Wav2Vec2CTCTokenizer._decode` (#29311)
msublee Apr 2, 2024
9b0a8ea
Hard error when ignoring tensors. (#27484) (#29906)
Narsil Apr 2, 2024
5080ab1
Generate: fix logits processors doctests (#29718)
gante Apr 2, 2024
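
One of the commits above, "Add `cosine_with_min_lr` scheduler in Trainer (#29341)", exposes a new learning-rate schedule through the usual `TrainingArguments` fields. Below is a minimal sketch of how it would be selected; the kwarg name `min_lr_rate` is an assumption based on the commit title, so check the Trainer docs before relying on it:

```python
from transformers import TrainingArguments

# Hedged sketch: select the cosine-with-floor schedule added in #29341.
# The schedule decays from `learning_rate` toward a floor instead of 0;
# the exact kwarg name ("min_lr_rate" vs "min_lr") is an assumption here.
args = TrainingArguments(
    output_dir="out",
    learning_rate=5e-5,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},  # floor at 10% of the peak LR
)
```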
1 change: 1 addition & 0 deletions .circleci/config.yml
@@ -157,6 +157,7 @@ jobs:
command: pip freeze | tee installed.txt
- store_artifacts:
path: ~/transformers/installed.txt
- run: python -c "from transformers import *" || (echo '🚨 import failed, this means you introduced unprotected imports! 🚨'; exit 1)
- run: ruff check examples tests src utils
- run: ruff format tests src utils --check
- run: python utils/custom_init_isort.py --check_only
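The new step fails the job as soon as `from transformers import *` raises, which is exactly what happens when a module imports an optional backend at the top level instead of guarding it. Here is a minimal sketch of the guarded-import pattern the check enforces, using a hypothetical mini-module rather than transformers' actual lazy-import machinery:

```python
# Hypothetical module illustrating a "protected" optional import.
# If torch is absent, `from mymodule import *` still succeeds;
# only the torch-backed symbol is left out of the public namespace.
import importlib.util

def is_torch_available() -> bool:
    # find_spec returns None when the package cannot be located
    return importlib.util.find_spec("torch") is not None

__all__ = ["is_torch_available"]

if is_torch_available():
    import torch

    def detach_to_cpu(t: "torch.Tensor") -> "torch.Tensor":
        # Only defined (and exported) when torch is installed
        return t.detach().cpu()

    __all__.append("detach_to_cpu")
```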
146 changes: 76 additions & 70 deletions .github/workflows/build-docker-images.yml
@@ -198,41 +198,44 @@ jobs:
push: true
tags: huggingface/transformers-pytorch-gpu

# Need to be fixed with the help from Guillaume.
# latest-pytorch-amd:
# name: "Latest PyTorch (AMD) [dev]"
# runs-on: [self-hosted, docker-gpu, amd-gpu, single-gpu, mi210]
# steps:
# - name: Set up Docker Buildx
# uses: docker/setup-buildx-action@v3
# - name: Check out code
# uses: actions/checkout@v3
# - name: Login to DockerHub
# uses: docker/login-action@v3
# with:
# username: ${{ secrets.DOCKERHUB_USERNAME }}
# password: ${{ secrets.DOCKERHUB_PASSWORD }}
# - name: Build and push
# uses: docker/build-push-action@v5
# with:
# context: ./docker/transformers-pytorch-amd-gpu
# build-args: |
# REF=main
# push: true
# tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }}
# # Push CI images still need to be re-built daily
# -
# name: Build and push (for Push CI) on a daily basis
# # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# # The latter case is useful for manual image building for debugging purposes. Use another tag in this case!
# if: inputs.image_postfix != '-push-ci'
# uses: docker/build-push-action@v5
# with:
# context: ./docker/transformers-pytorch-amd-gpu
# build-args: |
# REF=main
# push: true
# tags: huggingface/transformers-pytorch-amd-gpu-push-ci
latest-pytorch-amd:
name: "Latest PyTorch (AMD) [dev]"
runs-on: [intel-cpu, 8-cpu, ci]
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) on a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The latter case is useful for manual image building for debugging purposes. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu-push-ci

latest-tensorflow:
name: "Latest TensorFlow [dev]"
@@ -262,41 +265,44 @@ jobs:
push: true
tags: huggingface/transformers-tensorflow-gpu

# latest-pytorch-deepspeed-amd:
# name: "PyTorch + DeepSpeed (AMD) [dev]"

# runs-on: [self-hosted, docker-gpu, amd-gpu, single-gpu, mi210]
# steps:
# - name: Set up Docker Buildx
# uses: docker/setup-buildx-action@v3
# - name: Check out code
# uses: actions/checkout@v3
# - name: Login to DockerHub
# uses: docker/login-action@v3
# with:
# username: ${{ secrets.DOCKERHUB_USERNAME }}
# password: ${{ secrets.DOCKERHUB_PASSWORD }}
# - name: Build and push
# uses: docker/build-push-action@v5
# with:
# context: ./docker/transformers-pytorch-deepspeed-amd-gpu
# build-args: |
# REF=main
# push: true
# tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }}
# # Push CI images still need to be re-built daily
# -
# name: Build and push (for Push CI) on a daily basis
# # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# # The latter case is useful for manual image building for debugging purposes. Use another tag in this case!
# if: inputs.image_postfix != '-push-ci'
# uses: docker/build-push-action@v5
# with:
# context: ./docker/transformers-pytorch-deepspeed-amd-gpu
# build-args: |
# REF=main
# push: true
# tags: huggingface/transformers-pytorch-deepspeed-amd-gpu-push-ci
latest-pytorch-deepspeed-amd:
name: "PyTorch + DeepSpeed (AMD) [dev]"
runs-on: [intel-cpu, 8-cpu, ci]
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) on a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The latter case is useful for manual image building for debugging purposes. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-amd-gpu-push-ci

latest-quantization-torch-docker:
name: "Latest Pytorch + Quantization [dev]"
1 change: 1 addition & 0 deletions .github/workflows/build_documentation.yml
@@ -16,6 +16,7 @@ jobs:
package: transformers
notebook_folder: transformers_doc
languages: de en es fr hi it ko pt tr zh ja te
custom_container: huggingface/transformers-doc-builder
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
1 change: 1 addition & 0 deletions .github/workflows/build_pr_documentation.yml
@@ -15,3 +15,4 @@ jobs:
pr_number: ${{ github.event.number }}
package: transformers
languages: de en es fr hi it ko pt tr zh ja te
custom_container: huggingface/transformers-doc-builder
2 changes: 2 additions & 0 deletions Makefile
@@ -51,12 +51,14 @@ repo-consistency:
# this target runs checks on all files

quality:
@python -c "from transformers import *" || (echo '🚨 import failed, this means you introduced unprotected imports! 🚨'; exit 1)
ruff check $(check_dirs) setup.py conftest.py
ruff format --check $(check_dirs) setup.py conftest.py
python utils/custom_init_isort.py --check_only
python utils/sort_auto_mappings.py --check_only
python utils/check_doc_toc.py


# Format source code automatically and check if there are any problems left that need manual fixing

extra_style_checks:
1 change: 1 addition & 0 deletions README.md
@@ -473,6 +473,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[PVTv2](https://huggingface.co/docs/transformers/model_doc/pvt_v2)** (from Shanghai AI Laboratory, Nanjing University, The University of Hong Kong etc.) released with the paper [PVT v2: Improved Baselines with Pyramid Vision Transformer](https://arxiv.org/abs/2106.13797) by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2)** (from the Qwen team, Alibaba Group) released with the paper [Qwen Technical Report](https://arxiv.org/abs/2309.16609) by Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou and Tianhang Zhu.
1. **[Qwen2MoE](https://huggingface.co/docs/transformers/main/model_doc/qwen2_moe)** (from the Qwen team, Alibaba Group) released with [blog post](https://qwenlm.github.io/blog/qwen-moe/) by Bo Zheng, Dayiheng Liu, Rui Men, Junyang Lin, Zhou San, Bowen Yu, An Yang, Mingfeng Xue, Fei Huang, Binyuan Hui, Mei Li, Tianyu Liu, Xingzhang Ren, Xuancheng Ren, Kexin Yang, Chang Zhou, Jingren Zhou.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1 change: 1 addition & 0 deletions README_de.md
@@ -469,6 +469,7 @@ Aktuelle Anzahl der Checkpoints: ![](https://img.shields.io/endpoint?url=https:/
1. **[PVTv2](https://huggingface.co/docs/transformers/model_doc/pvt_v2)** (from Shanghai AI Laboratory, Nanjing University, The University of Hong Kong etc.) released with the paper [PVT v2: Improved Baselines with Pyramid Vision Transformer](https://arxiv.org/abs/2106.13797) by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2)** (from the Qwen team, Alibaba Group) released with the paper [Qwen Technical Report](https://arxiv.org/abs/2309.16609) by Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou and Tianhang Zhu.
1. **[Qwen2MoE](https://huggingface.co/docs/transformers/main/model_doc/qwen2_moe)** (from the Qwen team, Alibaba Group) released with the [blog post](https://qwenlm.github.io/blog/qwen-moe/) by Bo Zheng, Dayiheng Liu, Rui Men, Junyang Lin, Zhou San, Bowen Yu, An Yang, Mingfeng Xue, Fei Huang, Binyuan Hui, Mei Li, Tianyu Liu, Xingzhang Ren, Xuancheng Ren, Kexin Yang, Chang Zhou, Jingren Zhou.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1 change: 1 addition & 0 deletions README_es.md
@@ -446,6 +446,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
1. **[PVTv2](https://huggingface.co/docs/transformers/model_doc/pvt_v2)** (from Shanghai AI Laboratory, Nanjing University, The University of Hong Kong etc.) released with the paper [PVT v2: Improved Baselines with Pyramid Vision Transformer](https://arxiv.org/abs/2106.13797) by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2)** (from the Qwen team, Alibaba Group) released with the paper [Qwen Technical Report](https://arxiv.org/abs/2309.16609) by Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou and Tianhang Zhu.
1. **[Qwen2MoE](https://huggingface.co/docs/transformers/main/model_doc/qwen2_moe)** (from the Qwen team, Alibaba Group) released with the [blog post](https://qwenlm.github.io/blog/qwen-moe/) by Bo Zheng, Dayiheng Liu, Rui Men, Junyang Lin, Zhou San, Bowen Yu, An Yang, Mingfeng Xue, Fei Huang, Binyuan Hui, Mei Li, Tianyu Liu, Xingzhang Ren, Xuancheng Ren, Kexin Yang, Chang Zhou, Jingren Zhou.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1 change: 1 addition & 0 deletions README_fr.md
@@ -467,6 +467,7 @@ Nombre actuel de points de contrôle : ![](https://img.shields.io/endpoint?url=h
1. **[PVTv2](https://huggingface.co/docs/transformers/model_doc/pvt_v2)** (de Shanghai AI Laboratory, Nanjing University, The University of Hong Kong etc.) publié dans l'article [PVT v2: Improved Baselines with Pyramid Vision Transformer](https://arxiv.org/abs/2106.13797) parWenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (de NVIDIA) a été publié dans l'article [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) par Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev et Paulius Micikevicius.
1. **[Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2)** (de l'équipe Qwen, Alibaba Group) a été publié avec le rapport technique [Qwen Technical Report](https://arxiv.org/abs/2309.16609) par Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou et Tianhang Zhu.
1. **[Qwen2MoE](https://huggingface.co/docs/transformers/main/model_doc/qwen2_moe)** (de l'équipe Qwen, Alibaba Group) a été publié avec le [blog post](https://qwenlm.github.io/blog/qwen-moe/) par Bo Zheng, Dayiheng Liu, Rui Men, Junyang Lin, Zhou San, Bowen Yu, An Yang, Mingfeng Xue, Fei Huang, Binyuan Hui, Mei Li, Tianyu Liu, Xingzhang Ren, Xuancheng Ren, Kexin Yang, Chang Zhou, Jingren Zhou.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (de Facebook) a été publié dans l'article [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) par Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (de Google Research) a été publié dans l'article [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) par Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat et Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (de Google Research) a été publié dans l'article [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) par Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.