Examples reorg #11350

Merged 11 commits on Apr 21, 2021. Changes shown from 8 commits.
4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -306,12 +306,12 @@ jobs:
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,sentencepiece,testing]
- run: pip install -r examples/_tests_requirements.txt
- run: pip install -r examples/pytorch/_tests_requirements.txt
- save_cache:
key: v0.4-torch_examples-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: TRANSFORMERS_IS_CI=1 python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_torch ./examples/ | tee examples_output.txt
- run: TRANSFORMERS_IS_CI=1 python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_torch ./examples/pytorch/ | tee examples_output.txt
- store_artifacts:
path: ~/transformers/examples_output.txt
- store_artifacts:
2 changes: 1 addition & 1 deletion .github/workflows/self-scheduled.yml
@@ -59,7 +59,7 @@ jobs:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
run: |
pip install -r examples/_tests_requirements.txt
pip install -r examples/pytorch/_tests_requirements.txt
python -m pytest -n 1 --dist=loadfile --make-reports=examples_torch_gpu examples

- name: Failure short reports
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -285,7 +285,7 @@ $ python -m pytest -n auto --dist=loadfile -s -v ./tests/
and for the examples:

```bash
$ pip install -r examples/requirements.txt # only needed the first time
$ pip install -r examples/xxx/requirements.txt # only needed the first time
$ python -m pytest -n auto --dist=loadfile -s -v ./examples/
```
In fact, that's how `make test` and `make test-examples` are implemented (sans the `pip install` line)!
2 changes: 1 addition & 1 deletion Makefile
@@ -73,7 +73,7 @@ test:
# Run tests for examples

test-examples:
python -m pytest -n auto --dist=loadfile -s -v ./examples/
python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/
Member:
Is this necessary? Testing the entire examples folder should work as well and should pave the way for some TF example tests.

Collaborator (Author):
We will just add examples/tensorflow/ here. We can't run the tests from the root of the examples folder, because that makes Python treat the tensorflow folder as the tensorflow library, which in turn makes everything fail.
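For readers unfamiliar with that failure mode, here is a minimal, self-contained sketch of the shadowing effect; the temporary directory and the explicit `__init__.py` are only there to make it reproducible in isolation and are not part of the repository layout:

```python
# Sketch only: show how a local folder named "tensorflow" can shadow the
# installed TensorFlow package once its parent directory lands on sys.path
# (which is effectively what happens when pytest is started from the root of examples/).
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp()) / "examples"   # stand-in for the repo's examples/ folder
pkg = root / "tensorflow"
pkg.mkdir(parents=True)
(pkg / "__init__.py").write_text("")           # force a regular package just for this demo

sys.path.insert(0, str(root))                  # roughly what pytest does during collection
import tensorflow                              # resolves to examples/tensorflow, not the TF library

print(tensorflow.__file__)                     # .../examples/tensorflow/__init__.py
```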

Contributor:
Does anybody ever use these targets?

I never do, because `-n auto` makes no sense here: on my machine it starts 12 workers (12 cores), and the test suite never completes with that many workers competing for GPUs and consuming too much CPU RAM.
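For reference, a sketch of pinning the worker count instead of using `-n auto`; it assumes `pytest-xdist` is installed, and the count of 2 is arbitrary:

```python
# Sketch: run the examples test suite with a fixed number of xdist workers
# instead of -n auto, so the run does not oversubscribe GPUs or CPU RAM.
import sys
import pytest

# Equivalent to: python -m pytest -n 2 --dist=loadfile -s -v ./examples/pytorch/
exit_code = pytest.main(["-n", "2", "--dist=loadfile", "-s", "-v", "./examples/pytorch/"])
sys.exit(exit_code)
```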


# Run tests for SageMaker DLC release

2 changes: 1 addition & 1 deletion docker/transformers-pytorch-tpu/Dockerfile
@@ -53,7 +53,7 @@ RUN git clone https://github.com/huggingface/transformers.git && \
git checkout CI && \
cd .. && \
pip install ./transformers && \
pip install -r ./transformers/examples/requirements.txt && \
pip install -r ./transformers/examples/pytorch/_test_requirements.txt && \
pip install pytest

RUN python -c "import torch_xla; print(torch_xla.__version__)"
2 changes: 1 addition & 1 deletion docker/transformers-pytorch-tpu/bert-base-cased.jsonnet
@@ -27,7 +27,7 @@ local bertBaseCased = base.BaseTest {
},
command: utils.scriptCommand(
|||
python -m pytest -s transformers/examples/test_xla_examples.py -v
python -m pytest -s transformers/examples/pytorch/test_xla_examples.py -v
test_exit_code=$?
echo "\nFinished running commands.\n"
test $test_exit_code -eq 0
4 changes: 2 additions & 2 deletions docs/source/benchmarks.rst
@@ -65,10 +65,10 @@ respectively.
.. code-block:: bash

## PYTORCH CODE
python examples/benchmarking/run_benchmark.py --help
python examples/pytorch/benchmarking/run_benchmark.py --help

## TENSORFLOW CODE
python examples/benchmarking/run_benchmark_tf.py --help
python examples/tensorflow/benchmarking/run_benchmark_tf.py --help


An instantiated benchmark object can then simply be run by calling ``benchmark.run()``.
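For context, the `benchmark.run()` call mentioned above looks roughly like this through the Python API; this is a sketch based on the benchmark utilities of that release, and the model name, batch sizes, and sequence lengths are arbitrary:

```python
# Sketch: the PyTorch benchmark utilities referenced by docs/source/benchmarks.rst.
from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

args = PyTorchBenchmarkArguments(
    models=["bert-base-uncased"],   # any model identifier on the Hub
    batch_sizes=[8],
    sequence_lengths=[32, 128],
)
benchmark = PyTorchBenchmark(args)
results = benchmark.run()           # returns inference time / memory results
print(results)
```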
2 changes: 1 addition & 1 deletion docs/source/converting_tensorflow_models.rst
@@ -34,7 +34,7 @@ This CLI takes as input a TensorFlow checkpoint (three files starting with ``ber
configuration file (\ ``bert_config.json``\ ), and creates a PyTorch model for this configuration, loads the weights
from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that
can be imported using ``from_pretrained()`` (see example in :doc:`quicktour` , `run_glue.py
<https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_glue.py>`_\ ).
<https://github.com/huggingface/transformers/blob/master/examples/pytorch/text-classification/run_glue.py>`_\ ).

You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
checkpoint (the three files starting with ``bert_model.ckpt``\ ) but be sure to keep the configuration file (\
4 changes: 2 additions & 2 deletions docs/source/installation.md
@@ -168,13 +168,13 @@ Here is an example of how this can be used on a filesystem that is shared betwee
On the instance with the normal network run your program which will download and cache models (and optionally datasets if you use 🤗 Datasets). For example:

```
python examples/seq2seq/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

and then with the same filesystem you can now run the same program on a firewalled instance:
```
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/seq2seq/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
and it should succeed without any hanging waiting to timeout.
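The same cache-only behaviour can also be exercised from Python with `local_files_only=True`; a sketch, assuming `t5-small` was already cached by the run on the networked instance:

```python
# Sketch: load from the shared cache without any network access, mirroring
# HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 for the model/tokenizer part.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small", local_files_only=True)
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small", local_files_only=True)
```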

8 changes: 5 additions & 3 deletions docs/source/main_classes/processors.rst
@@ -69,7 +69,8 @@ Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An example using these processors is given in the `run_glue.py
<https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_glue.py>`__ script.
<https://github.com/huggingface/pytorch-transformers/blob/master/examples/legacy/text-classification/run_glue.py>`__
script.


XNLI
@@ -90,7 +91,8 @@ This library hosts the processor to load the XNLI data:
Please note that since the gold labels are available on the test set, evaluation is performed on the test set.

An example using these processors is given in the `run_xnli.py
<https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_xnli.py>`__ script.
<https://github.com/huggingface/pytorch-transformers/blob/master/examples/legacy/text-classification/run_xnli.py>`__
script.


SQuAD
@@ -169,4 +171,4 @@ Using `tensorflow_datasets` is as easy as using a data file:


Another example using these processors is given in the :prefix_link:`run_squad.py
<examples/question-answering/run_squad.py>` script.
<examples/legacy/question-answering/run_squad.py>` script.
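For context, a sketch of the `tensorflow_datasets` usage this documentation refers to; it assumes `tensorflow_datasets` is installed, and the MRPC task is just an example:

```python
# Sketch: feed a tensorflow_datasets split through the GLUE processor helpers.
import tensorflow_datasets as tfds
from transformers import BertTokenizer, glue_convert_examples_to_features

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
data = tfds.load("glue/mrpc")
# Convert the tfds examples into features the model can consume.
train_dataset = glue_convert_examples_to_features(
    data["train"], tokenizer, max_length=128, task="mrpc"
)
```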
14 changes: 7 additions & 7 deletions docs/source/main_classes/trainer.rst
@@ -338,7 +338,7 @@ For example here is how you could use it for ``run_translation.py`` with 2 GPUs:

.. code-block:: bash

python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_translation.py \
python -m torch.distributed.launch --nproc_per_node=2 examples/pytorch/translation/run_translation.py \
--model_name_or_path t5-small --per_device_train_batch_size 1 \
--output_dir output_dir --overwrite_output_dir \
--do_train --max_train_samples 500 --num_train_epochs 1 \
@@ -363,7 +363,7 @@ For example here is how you could use it for ``run_translation.py`` with 2 GPUs:

.. code-block:: bash

python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_translation.py \
python -m torch.distributed.launch --nproc_per_node=2 examples/pytorch/translation/run_translation.py \
--model_name_or_path t5-small --per_device_train_batch_size 1 \
--output_dir output_dir --overwrite_output_dir \
--do_train --max_train_samples 500 --num_train_epochs 1 \
@@ -540,7 +540,7 @@ Here is an example of running ``run_translation.py`` under DeepSpeed deploying a

.. code-block:: bash

deepspeed examples/seq2seq/run_translation.py \
deepspeed examples/pytorch/translation/run_translation.py \
--deepspeed tests/deepspeed/ds_config.json \
--model_name_or_path t5-small --per_device_train_batch_size 1 \
--output_dir output_dir --overwrite_output_dir --fp16 \
@@ -565,7 +565,7 @@ To deploy DeepSpeed with one GPU adjust the :class:`~transformers.Trainer` comma

.. code-block:: bash

deepspeed --num_gpus=1 examples/seq2seq/run_translation.py \
deepspeed --num_gpus=1 examples/pytorch/translation/run_translation.py \
--deepspeed tests/deepspeed/ds_config.json \
--model_name_or_path t5-small --per_device_train_batch_size 1 \
--output_dir output_dir --overwrite_output_dir --fp16 \
@@ -617,7 +617,7 @@ Notes:

.. code-block:: bash

deepspeed --include localhost:1 examples/seq2seq/run_translation.py ...
deepspeed --include localhost:1 examples/pytorch/translation/run_translation.py ...

In this example, we tell DeepSpeed to use GPU 1 (second gpu).

@@ -711,7 +711,7 @@ shell from a cell. For example, to use ``run_translation.py`` you would launch i
.. code-block::

!git clone https://github.com/huggingface/transformers
!cd transformers; deepspeed examples/seq2seq/run_translation.py ...
!cd transformers; deepspeed examples/pytorch/translation/run_translation.py ...

or with ``%%bash`` magic, where you can write a multi-line code for the shell program to run:

@@ -721,7 +721,7 @@ or with ``%%bash`` magic, where you can write a multi-line code for the shell pr

git clone https://github.com/huggingface/transformers
cd transformers
deepspeed examples/seq2seq/run_translation.py ...
deepspeed examples/pytorch/translation/run_translation.py ...

In such case you don't need any of the code presented at the beginning of this section.

2 changes: 1 addition & 1 deletion docs/source/model_doc/bart.rst
@@ -42,7 +42,7 @@ Examples
_______________________________________________________________________________________________________________________

- Examples and scripts for fine-tuning BART and other models for sequence to sequence tasks can be found in
:prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
:prefix_link:`examples/pytorch/summarization/ <examples/pytorch/summarization/README.md>`.
- An example of how to train :class:`~transformers.BartForConditionalGeneration` with a Hugging Face :obj:`datasets`
object can be found in this `forum discussion
<https://discuss.huggingface.co/t/train-bart-for-conditional-generation-e-g-summarization/1904>`__.
2 changes: 1 addition & 1 deletion docs/source/model_doc/barthez.rst
@@ -42,7 +42,7 @@ Examples
_______________________________________________________________________________________________________________________

- BARThez can be fine-tuned on sequence-to-sequence tasks in a similar way as BART, check:
:prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
:prefix_link:`examples/pytorch/summarization/ <examples/pytorch/summarization/README.md>`.


BarthezTokenizer
2 changes: 1 addition & 1 deletion docs/source/model_doc/distilbert.rst
@@ -45,7 +45,7 @@ Tips:
necessary though, just let us know if you need this option.

The original code can be found `here
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
<https://github.com/huggingface/transformers/tree/master/research-project/distillation>`__.


DistilBertConfig
3 changes: 2 additions & 1 deletion docs/source/model_doc/pegasus.rst
@@ -52,7 +52,8 @@ Examples
_______________________________________________________________________________________________________________________

- :prefix_link:`Script <examples/research_projects/seq2seq-distillation/finetune_pegasus_xsum.sh>` to fine-tune pegasus
on the XSUM dataset. Data download instructions at :prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
on the XSUM dataset. Data download instructions at :prefix_link:`examples/pytorch/summarization/
<examples/pytorch/summarization/README.md>`.
- FP16 is not supported (help/ideas on this appreciated!).
- The adafactor optimizer is recommended for pegasus fine-tuning.

2 changes: 1 addition & 1 deletion docs/source/model_doc/retribert.rst
@@ -21,7 +21,7 @@ Question Answering <https://yjernite.github.io/lfqa.html>`__. RetriBERT is a sma
pair of BERT encoders with lower-dimension projection for dense semantic indexing of text.

Code to train and use the model can be found `here
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
<https://github.com/huggingface/transformers/tree/master/examples/research-projects/distillation>`__.


RetriBertConfig
2 changes: 1 addition & 1 deletion docs/source/model_doc/xlnet.rst
@@ -41,7 +41,7 @@ Tips:
using only a sub-set of the output tokens as target which are selected with the :obj:`target_mapping` input.
- To use XLNet for sequential decoding (i.e. not in fully bi-directional setting), use the :obj:`perm_mask` and
:obj:`target_mapping` inputs to control the attention span and outputs (see examples in
`examples/text-generation/run_generation.py`)
`examples/pytorch/text-generation/run_generation.py`)
- XLNet is one of the few models that has no sequence length limit.

The original code can be found `here <https://github.com/zihangdai/xlnet/>`__.
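To make the `perm_mask`/`target_mapping` tip above concrete, here is a sketch that predicts a single position at the end of the sequence; the model checkpoint and input sentence are only illustrative:

```python
# Sketch: sequential decoding with XLNet by masking attention to the target
# position (perm_mask) and selecting which position to predict (target_mapping).
import torch
from transformers import XLNetLMHeadModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")

input_ids = torch.tensor(
    tokenizer.encode("Hello, my dog is very <mask>", add_special_tokens=False)
).unsqueeze(0)

seq_len = input_ids.shape[1]
perm_mask = torch.zeros((1, seq_len, seq_len), dtype=torch.float)
perm_mask[:, :, -1] = 1.0                      # no token may attend to the last position
target_mapping = torch.zeros((1, 1, seq_len), dtype=torch.float)
target_mapping[0, 0, -1] = 1.0                 # predict only the last position

outputs = model(input_ids, perm_mask=perm_mask, target_mapping=target_mapping)
next_token_logits = outputs[0]                 # shape: (1, 1, vocab_size)
print(next_token_logits.shape)
```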
3 changes: 2 additions & 1 deletion docs/source/model_summary.rst
@@ -682,7 +682,8 @@ The `mbart-large-en-ro checkpoint <https://huggingface.co/facebook/mbart-large-e
romanian translation.

The `mbart-large-cc25 <https://huggingface.co/facebook/mbart-large-cc25>`_ checkpoint can be finetuned for other
translation and summarization tasks, using code in ```examples/seq2seq/``` , but is not very useful without finetuning.
translation and summarization tasks, using code in ```examples/pytorch/translation/``` , but is not very useful without
finetuning.


ProphetNet
4 changes: 2 additions & 2 deletions docs/source/multilingual.rst
@@ -90,8 +90,8 @@ You can then feed it all as input to your model:
>>> outputs = model(input_ids, langs=langs)


The example :prefix_link:`run_generation.py <examples/text-generation/run_generation.py>` can generate text using the
CLM checkpoints from XLM, using the language embeddings.
The example :prefix_link:`run_generation.py <examples/pytorch/text-generation/run_generation.py>` can generate text
using the CLM checkpoints from XLM, using the language embeddings.

XLM without Language Embeddings
-----------------------------------------------------------------------------------------------------------------------
4 changes: 2 additions & 2 deletions docs/source/sagemaker.md
@@ -325,7 +325,7 @@ When you create a `HuggingFace` Estimator, you can specify a [training script th

If you are using `git_config` to run the [🤗 Transformers examples scripts](https://github.com/huggingface/transformers/tree/master/examples) keep in mind that you need to configure the right `'branch'` for you `transformers_version`, e.g. if you use `transformers_version='4.4.2` you have to use `'branch':'v4.4.2'`.

As an example to use `git_config` with an [example script from the transformers repository](https://github.com/huggingface/transformers/tree/master/examples/text-classification).
As an example to use `git_config` with an [example script from the transformers repository](https://github.com/huggingface/transformers/tree/master/pytorch/text-classification).

_Tip: define `output_dir` as `/opt/ml/model` in the hyperparameter for the script to save your model to S3 after training._

@@ -338,7 +338,7 @@ git_config = {'repo': 'https://github.com/huggingface/transformers.git','branch'
# create the Estimator
huggingface_estimator = HuggingFace(
entry_point='run_glue.py',
source_dir='./examples/text-classification',
source_dir='./examples/pytorch/text-classification',
git_config=git_config,
instance_type='ml.p3.2xlarge',
instance_count=1,