Refactor example scripts to leverage config strings #486

Merged · 8 commits · Feb 15, 2023
7 changes: 7 additions & 0 deletions adapter_docs/classes/adapter_training.rst
@@ -0,0 +1,7 @@
Adapter Training
====================

Classes and methods related to training adapters.

.. automodule:: transformers.adapters.training
:members:
1 change: 1 addition & 0 deletions adapter_docs/index.rst
@@ -82,6 +82,7 @@ Currently, we support the PyTorch versions of all models as listed on the `Model
classes/adapter_modules
classes/adapter_layer
classes/model_mixins
classes/adapter_training
classes/adapter_utils

.. toctree::
72 changes: 51 additions & 21 deletions adapter_docs/training.md
@@ -1,22 +1,38 @@
# Adapter Training

This section describes some examples on training different types of adapter modules in Transformer models.
The presented training scripts are only slightly modified from the original [examples by Huggingface](https://huggingface.co/transformers/examples.html).
This section describes some examples of training adapter methods for different scenarios. We focus on integrating adapter methods into existing training scripts for Transformer models.
All presented scripts are only slightly modified from the original [examples from HuggingFace Transformers](https://huggingface.co/transformers/examples.html).
To run the scripts, make sure you have the latest version of the repository and have installed some additional requirements:

```
git clone https://github.com/adapter-hub/adapter-transformers
cd transformers
cd adapter-transformers
pip install .
pip install -r ./examples/<your_examples_folder>/requirements.txt
pip install -r ./examples/pytorch/<your_examples_folder>/requirements.txt
```

## Train a Task Adapter

Training a task adapter module on a dataset only requires minor modifications compared to training the full model.
Suppose we have an existing script for training a Transformer model, here we will use HuggingFace's [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/text-classification/run_glue.py) example script for training on the GLUE dataset.
Suppose we have an existing script for training a Transformer model.
In the following, we will use HuggingFace's [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/text-classification/run_glue.py) example script for training on the GLUE benchmark.
We go through all required changes step by step:

In our example, we replaced the built-in `AutoModelForSequenceClassification` class with the `AutoAdapterModel` class introduced by `adapter-transformers` (learn more about prediction heads [here](prediction_heads.md)).
### Step A - Parse `AdapterArguments`

The [`AdapterArguments`](transformers.adapters.training.AdapterArguments) class integrated into adapter-transformers provides a set of command-line options useful for training adapters.
These include options such as `--train_adapter` for activating adapter training and `--load_adapter` for loading adapters from checkpoints.
Thus, the first step of integrating adapters is to add these arguments to the line where `HfArgumentParser` is instantiated:

```python
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments, AdapterArguments))
# ...
model_args, data_args, training_args, adapter_args = parser.parse_args_into_dataclasses()
```

### Step B - Switch model class (optional)

In our example, we replace the built-in `AutoModelForSequenceClassification` class with the `AutoAdapterModel` class introduced by `adapter-transformers`.
Therefore, the model instantiation changes to:

```python
@@ -27,22 +43,25 @@ model = AutoAdapterModel.from_pretrained(
model.add_classification_head(data_args.task_name, num_labels=num_labels)
```
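
Since most of the instantiation call is collapsed in the diff above, here is a minimal sketch of the full model setup, assuming the standard argument names used in `run_glue.py` (`model_args`, `config`, `data_args`, `num_labels`); the collapsed lines may differ in detail:

```python
# Sketch only: argument names are assumed from the standard run_glue.py script.
model = AutoAdapterModel.from_pretrained(
    model_args.model_name_or_path,
    config=config,
    cache_dir=model_args.cache_dir,
)
# Add a prediction head matching the GLUE task's label set.
model.add_classification_head(data_args.task_name, num_labels=num_labels)
```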

Compared to fine-tuning the full model, there is only one significant adaptation we have to make: adding a new adapter module and activating it.
Note that this change is entirely optional and training will also work with the original model class.
Learn more about the benefits of AdapterModel classes [here](prediction_heads.md).

### Step C - Setup adapter methods

```{eval-rst}
.. tip::
In the following, we show how to set up adapters manually. In most cases, you can use the built-in ``setup_adapter_training()`` method to perform this setup automatically. Just add a statement similar to this anywhere between model instantiation and the start of training in your script: ``setup_adapter_training(model, adapter_args, task_name)``
```

Compared to fine-tuning the full model, there is only this one significant adaptation we have to make: adding an adapter setup and activating it.

```python
# task adapter - only add if not existing
if task_name not in model.config.adapters:
# resolve the adapter config
adapter_config = AdapterConfig.load(
adapter_args.adapter_config,
non_linearity=adapter_args.adapter_non_linearity,
reduction_factor=adapter_args.adapter_reduction_factor,
)
adapter_config = AdapterConfig.load(adapter_args.adapter_config)
# add a new adapter
model.add_adapter(
task_name,
config=adapter_config
)
model.add_adapter(task_name, config=adapter_config)
# Enable adapter training
model.train_adapter(task_name)
```
@@ -63,10 +82,20 @@ on complex setups checkout the [Composition Blocks](https://docs.adapterhub.ml/a
model.set_active_adapters(task_name)
```
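
As an alternative to the manual setup shown above, the `setup_adapter_training()` helper mentioned in the tip performs the same steps in a single call. A minimal sketch, with the import path and return values following the usage introduced for `run_udp.py` in this PR:

```python
from transformers.adapters import setup_adapter_training

# Adds or loads the task adapter (and optionally a language adapter),
# freezes the pre-trained model weights, and activates the adapter setup.
adapter_name, lang_adapter_name = setup_adapter_training(model, adapter_args, task_name)
```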

### Step D - Switch to `AdapterTrainer` class

Finally, we replace the `Trainer` class built into Transformers with adapter-transformers' [`AdapterTrainer`](transformers.adapters.AdapterTrainer) class, which is optimized for training adapter methods.
See [below for more information](#adaptertrainer).

Technically, this change is not required, as training adapters does not require any changes to the training loop.
However, `AdapterTrainer` provides, for example, better support for checkpointing and reloading adapter weights.

### Step E - Start training

The rest of the training procedure does not require any further changes in code.

You can find the full version of the modified training script for GLUE at [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/text-classification/run_glue.py) in the `examples` folder of our repository.
We also adapted [various other example scripts](https://github.com/Adapter-Hub/adapter-transformers/tree/master/examples) (e.g. `run_glue.py`, `run_multiple_choice.py`, `run_squad.py`, ...) to support adapter training.
You can find the full version of the modified training script for GLUE at [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/text-classification/run_glue.py) in the `examples` folder of our repository.
We also adapted [various other example scripts](https://github.com/Adapter-Hub/adapter-transformers/tree/master/examples/pytorch) (e.g. `run_glue.py`, `run_multiple_choice.py`, `run_squad.py`, ...) to support adapter training.

To start adapter training on a GLUE task, you can run something similar to:

@@ -103,7 +132,7 @@ The important flag here is `--train_adapter` which switches from fine-tuning the
## Train a Language Adapter

Training a language adapter is just as straightforward as training a task adapter. Similar to the steps for task adapters
described above, we add a language adapter module to an existing model training script. Here, we modified HuggingFace's [run_mlm.py](https://github.com/Adapter-Hub/adapter-transformers/blob/v2/examples/language-modeling/run_mlm.py) script for masked language modeling with BERT-based models.
described above, we add a language adapter module to an existing model training script. Here, we modified HuggingFace's [run_mlm.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/language-modeling/run_mlm.py) script for masked language modeling with BERT-based models.

Training a language adapter on BERT using this script may look like the following:

@@ -126,7 +155,7 @@ python run_mlm.py \

## Train AdapterFusion

We provide an example for training _AdapterFusion_ ([Pfeiffer et al., 2020](https://arxiv.org/pdf/2005.00247)) on the GLUE dataset: [run_fusion_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/adapterfusion/run_fusion_glue.py).
We provide an example for training _AdapterFusion_ ([Pfeiffer et al., 2020](https://arxiv.org/pdf/2005.00247)) on the GLUE dataset: [run_fusion_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/adapterfusion/run_fusion_glue.py).
You can adapt this script to train AdapterFusion with different pre-trained adapters on your own dataset.

```{eval-rst}
@@ -158,7 +187,8 @@ python run_fusion_glue.py \


## AdapterTrainer
Similar to the `Trainer` class provided by huggingface, adapter-transformers provides an `AdapterTrainer` class. This class is only

Similar to the `Trainer` class provided by HuggingFace, adapter-transformers provides an `AdapterTrainer` class. This class is only
intended for training adapters. The `Trainer` class should still be used to fully fine-tune models. To train adapters with the `AdapterTrainer`
class, simply initialize it the same way you would initialize the `Trainer` class, e.g.:

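The actual snippet is collapsed in this diff; a minimal sketch of the instantiation, assuming the datasets, tokenizer, data collator, and metrics function prepared earlier in a `run_glue.py`-style script:

```python
from transformers.adapters import AdapterTrainer

# Same constructor arguments as the regular Trainer; only the class changes.
trainer = AdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
trainer.train()
```
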
124 changes: 36 additions & 88 deletions examples/pytorch/dependency-parsing/run_udp.py
@@ -13,15 +13,8 @@

import transformers.adapters.composition as ac
from preprocessing import preprocess_dataset
from transformers import (
AdapterConfig,
AutoAdapterModel,
AutoConfig,
AutoTokenizer,
HfArgumentParser,
MultiLingAdapterArguments,
set_seed,
)
from transformers import AutoConfig, AutoTokenizer, HfArgumentParser, set_seed
from transformers.adapters import AdapterArguments, AdapterConfigBase, AutoAdapterModel, setup_adapter_training
from utils_udp import UD_HEAD_LABELS, DependencyParsingAdapterTrainer, DependencyParsingTrainer, UDTrainingArguments


@@ -94,7 +87,7 @@ def main():
# See all possible arguments in src/transformers/training_args.py
# or by passing the --help flag to this script.
# We now keep distinct sets of args, for a cleaner separation of concerns.
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, UDTrainingArguments, MultiLingAdapterArguments))
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, UDTrainingArguments, AdapterArguments))
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
# If we pass only one argument to the script and it's the path to a json file,
# let's parse it to get our arguments.
@@ -170,7 +163,6 @@ def main():

# The task name (with prefix)
task_name = "ud_" + data_args.task_name
language = adapter_args.language

model = AutoAdapterModel.from_pretrained(
model_args.model_name_or_path,
@@ -183,65 +175,6 @@
id2label=label_map,
)

if model_args.leave_out_twelvth:
logger.info("Leaving out 12")
leave_out = [11]
else:
leave_out = []

# Setup adapters
if adapter_args.train_adapter:
# check if adapter already exists, otherwise add it
if task_name not in model.config.adapters:
# resolve the adapter config
adapter_config = AdapterConfig.load(
adapter_args.adapter_config,
non_linearity=adapter_args.adapter_non_linearity,
reduction_factor=adapter_args.adapter_reduction_factor,
leave_out=leave_out,
)
# load a pre-trained from Hub if specified
if adapter_args.load_adapter:
model.load_adapter(
adapter_args.load_adapter,
config=adapter_config,
load_as=task_name,
leave_out=leave_out,
)
# otherwise, add a fresh adapter
else:
model.add_adapter(task_name, config=adapter_config)
# optionally load a pre-trained language adapter
if adapter_args.load_lang_adapter:
# resolve the language adapter config
lang_adapter_config = AdapterConfig.load(
adapter_args.lang_adapter_config,
non_linearity=adapter_args.lang_adapter_non_linearity,
reduction_factor=adapter_args.lang_adapter_reduction_factor,
leave_out=leave_out,
)
# load the language adapter from Hub
lang_adapter_name = model.load_adapter(
adapter_args.load_lang_adapter,
config=lang_adapter_config,
load_as=adapter_args.language,
leave_out=leave_out,
)
else:
lang_adapter_name = None
# Freeze all model weights except of those of this adapter
model.train_adapter([task_name])
# Set the adapters to be used in every forward pass
if lang_adapter_name:
model.set_active_adapters(ac.Stack(lang_adapter_name, task_name))
else:
model.set_active_adapters(task_name)
else:
if adapter_args.load_adapter or adapter_args.load_lang_adapter:
raise ValueError(
"Adapters can only be loaded in adapters training mode.Use --train_adapter to enable adapter training"
)

# Load and preprocess dataset
if data_args.use_mock_data:
from datasets import Version, load_dataset_builder
@@ -255,6 +188,21 @@ def main():
dataset = load_dataset("universal_dependencies", data_args.task_name)
dataset = preprocess_dataset(dataset, tokenizer, labels, data_args, pad_token_id=-1)

# Setup adapters
if model_args.leave_out_twelvth:
logger.info("Leaving out 12")
adapter_config_kwargs = {"leave_out": [11]}
adapter_load_kwargs = {"leave_out": [11]}
else:
adapter_config_kwargs = {}
adapter_load_kwargs = {}
adapter_name, lang_adapter_name = setup_adapter_training(
model,
adapter_args,
task_name,
adapter_config_kwargs=adapter_config_kwargs,
adapter_load_kwargs=adapter_load_kwargs,
)
# Initialize our Trainer
# HACK: Set this attribute to False to prevent label columns from being deleted
training_args.remove_unused_columns = False
@@ -300,30 +248,30 @@ def main():
logger.info("Loading best model for predictions.")

if adapter_args.train_adapter:
if language:
lang_adapter_config = AdapterConfig.load(
config="pfeiffer", non_linearity="gelu", reduction_factor=2, leave_out=leave_out
)
model.load_adapter(
os.path.join(training_args.output_dir, "best_model", language)
if training_args.do_train
else adapter_args.load_lang_adapter,
config=lang_adapter_config,
load_as=language,
leave_out=leave_out,
)
task_adapter_config = AdapterConfig.load(
config="pfeiffer", non_linearity="gelu", reduction_factor=16, leave_out=leave_out
)
adapter_config = AdapterConfigBase.load(adapter_args.adapter_config, **adapter_config_kwargs)
model.load_adapter(
os.path.join(training_args.output_dir, "best_model", task_name)
if training_args.do_train
else adapter_args.load_adapter,
config=task_adapter_config,
config=adapter_config,
load_as=task_name,
leave_out=leave_out,
**adapter_load_kwargs,
)
if language:
if adapter_args.load_lang_adapter:
lang_adapter_config = AdapterConfigBase.load(
adapter_args.lang_adapter_config, **adapter_config_kwargs
)
lang_adapter_name = model.load_adapter(
os.path.join(training_args.output_dir, "best_model", lang_adapter_name)
if training_args.do_train
else adapter_args.load_lang_adapter,
config=lang_adapter_config,
load_as=lang_adapter_name,
**adapter_load_kwargs,
)
else:
lang_adapter_name = None
if lang_adapter_name:
model.set_active_adapters(ac.Stack(lang_adapter_name, task_name))
else:
model.set_active_adapters(task_name)