Refactor example scripts to leverage config strings #486

Merged · 8 commits · Feb 15, 2023
7 changes: 7 additions & 0 deletions adapter_docs/classes/adapter_training.rst
@@ -0,0 +1,7 @@
Adapter Training
====================

Classes and methods related to training adapters.

.. automodule:: transformers.adapters.training
:members:
1 change: 1 addition & 0 deletions adapter_docs/index.rst
@@ -82,6 +82,7 @@ Currently, we support the PyTorch versions of all models as listed on the `Model
classes/adapter_modules
classes/adapter_layer
classes/model_mixins
classes/adapter_training
classes/adapter_utils

.. toctree::
72 changes: 51 additions & 21 deletions adapter_docs/training.md
@@ -1,22 +1,38 @@
# Adapter Training

This section describes some examples on training different types of adapter modules in Transformer models.
The presented training scripts are only slightly modified from the original [examples by Huggingface](https://huggingface.co/transformers/examples.html).
This section describes some examples of training adapter methods for different scenarios. We focus on integrating adapter methods into existing training scripts for Transformer models.
All presented scripts are only slightly modified from the original [examples from HuggingFace Transformers](https://huggingface.co/transformers/examples.html).
To run the scripts, make sure you have the latest version of the repository and have installed some additional requirements:

```
git clone https://github.com/adapter-hub/adapter-transformers
cd transformers
cd adapter-transformers
pip install .
pip install -r ./examples/<your_examples_folder>/requirements.txt
pip install -r ./examples/pytorch/<your_examples_folder>/requirements.txt
```

## Train a Task Adapter

Training a task adapter module on a dataset only requires minor modifications compared to training the full model.
Suppose we have an existing script for training a Transformer model, here we will use HuggingFace's [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/text-classification/run_glue.py) example script for training on the GLUE dataset.
Suppose we have an existing script for training a Transformer model.
In the following, we will use HuggingFace's [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/text-classification/run_glue.py) example script for training on the GLUE benchmark.
We go through all required changes step by step:

In our example, we replaced the built-in `AutoModelForSequenceClassification` class with the `AutoAdapterModel` class introduced by `adapter-transformers` (learn more about prediction heads [here](prediction_heads.md)).
### Step A - Parse `AdapterArguments`

The [`AdapterArguments`](transformers.adapters.training.AdapterArguments) class integrated into adapter-transformers provides a set of command-line options useful for training adapters.
These include options such as `--train_adapter` for activating adapter training and `--load_adapter` for loading adapters from checkpoints.
Thus, the first step of integrating adapters is to add these arguments to the line where `HfArgumentParser` is instantiated:

```python
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments, AdapterArguments))
# ...
model_args, data_args, training_args, adapter_args = parser.parse_args_into_dataclasses()
```

### Step B - Switch model class (optional)

In our example, we replace the built-in `AutoModelForSequenceClassification` class with the `AutoAdapterModel` class introduced by `adapter-transformers`.
Therefore, the model instantiation changes to:

```python
@@ -27,22 +43,25 @@ model = AutoAdapterModel.from_pretrained(
model.add_classification_head(data_args.task_name, num_labels=num_labels)
```
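
Since most of the instantiation call is collapsed in the diff above, here is a minimal sketch of the full model setup, assuming the standard argument names used in `run_glue.py` (`model_args`, `config`, `data_args`, `num_labels`); the collapsed lines may differ in detail:

```python
# Sketch only: argument names are assumed from the standard run_glue.py script.
model = AutoAdapterModel.from_pretrained(
    model_args.model_name_or_path,
    config=config,
    cache_dir=model_args.cache_dir,
)
# Add a prediction head matching the GLUE task's label set.
model.add_classification_head(data_args.task_name, num_labels=num_labels)
```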

Compared to fine-tuning the full model, there is only one significant adaptation we have to make: adding a new adapter module and activating it.
Note that this change is entirely optional and training will also work with the original model class.
Learn more about the benefits of AdapterModel classes [here](prediction_heads.md).

### Step C - Setup adapter methods

```{eval-rst}
.. tip::
In the following, we show how to set up adapters manually. In most cases, you can use the built-in ``setup_adapter_training()`` method to perform this setup automatically. Just add a statement similar to this anywhere between model instantiation and the start of training in your script: ``setup_adapter_training(model, adapter_args, task_name)``
```

Compared to fine-tuning the full model, there is only this one significant adaptation we have to make: adding an adapter setup and activating it.

```python
# task adapter - only add if not existing
if task_name not in model.config.adapters:
# resolve the adapter config
adapter_config = AdapterConfig.load(
adapter_args.adapter_config,
non_linearity=adapter_args.adapter_non_linearity,
reduction_factor=adapter_args.adapter_reduction_factor,
)
adapter_config = AdapterConfig.load(adapter_args.adapter_config)
# add a new adapter
model.add_adapter(
task_name,
config=adapter_config
)
model.add_adapter(task_name, config=adapter_config)
# Enable adapter training
model.train_adapter(task_name)
```
@@ -63,10 +82,20 @@ on complex setups checkout the [Composition Blocks](https://docs.adapterhub.ml/a
model.set_active_adapters(task_name)
```
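
As an alternative to the manual setup shown above, the `setup_adapter_training()` helper mentioned in the tip performs the same steps in a single call. A minimal sketch, with the import path and return values following the usage introduced for `run_udp.py` in this PR:

```python
from transformers.adapters import setup_adapter_training

# Adds or loads the task adapter (and optionally a language adapter),
# freezes the pre-trained model weights, and activates the adapter setup.
adapter_name, lang_adapter_name = setup_adapter_training(model, adapter_args, task_name)
```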

### Step D - Switch to `AdapterTrainer` class

Finally, we replace the `Trainer` class built into Transformers with adapter-transformers' [`AdapterTrainer`](transformers.adapters.AdapterTrainer) class, which is optimized for training adapter methods.
See [below for more information](#adaptertrainer).

Technically, this change is not required, as training adapters does not require any changes to the training loop.
However, `AdapterTrainer` provides, for example, better support for checkpointing and reloading adapter weights.

### Step E - Start training

The rest of the training procedure does not require any further changes in code.

You can find the full version of the modified training script for GLUE at [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/text-classification/run_glue.py) in the `examples` folder of our repository.
We also adapted [various other example scripts](https://github.com/Adapter-Hub/adapter-transformers/tree/master/examples) (e.g. `run_glue.py`, `run_multiple_choice.py`, `run_squad.py`, ...) to support adapter training.
You can find the full version of the modified training script for GLUE at [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/text-classification/run_glue.py) in the `examples` folder of our repository.
We also adapted [various other example scripts](https://github.com/Adapter-Hub/adapter-transformers/tree/master/examples/pytorch) (e.g. `run_glue.py`, `run_multiple_choice.py`, `run_squad.py`, ...) to support adapter training.

To start adapter training on a GLUE task, you can run something similar to:

@@ -103,7 +132,7 @@ The important flag here is `--train_adapter` which switches from fine-tuning the
## Train a Language Adapter

Training a language adapter is just as straightforward as training a task adapter. Similar to the steps for task adapters
described above, we add a language adapter module to an existing model training script. Here, we modified HuggingFace's [run_mlm.py](https://github.com/Adapter-Hub/adapter-transformers/blob/v2/examples/language-modeling/run_mlm.py) script for masked language modeling with BERT-based models.
described above, we add a language adapter module to an existing model training script. Here, we modified HuggingFace's [run_mlm.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/language-modeling/run_mlm.py) script for masked language modeling with BERT-based models.

Training a language adapter on BERT using this script may look like the following:

@@ -126,7 +155,7 @@ python run_mlm.py \

## Train AdapterFusion

We provide an example for training _AdapterFusion_ ([Pfeiffer et al., 2020](https://arxiv.org/pdf/2005.00247)) on the GLUE dataset: [run_fusion_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/adapterfusion/run_fusion_glue.py).
We provide an example for training _AdapterFusion_ ([Pfeiffer et al., 2020](https://arxiv.org/pdf/2005.00247)) on the GLUE dataset: [run_fusion_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/adapterfusion/run_fusion_glue.py).
You can adapt this script to train AdapterFusion with different pre-trained adapters on your own dataset.

```{eval-rst}
@@ -158,7 +187,8 @@ python run_fusion_glue.py \


## AdapterTrainer
Similar to the `Trainer` class provided by huggingface, adapter-transformers provides an `AdapterTrainer` class. This class is only

Similar to the `Trainer` class provided by HuggingFace, adapter-transformers provides an `AdapterTrainer` class. This class is only
intended for training adapters. The `Trainer` class should still be used to fully fine-tune models. To train adapters with the `AdapterTrainer`
class, simply initialize it the same way you would initialize the `Trainer` class, e.g.:

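The actual snippet is collapsed in this diff; a minimal sketch of the instantiation, assuming the datasets, tokenizer, data collator, and metrics function prepared earlier in a `run_glue.py`-style script:

```python
from transformers.adapters import AdapterTrainer

# Same constructor arguments as the regular Trainer; only the class changes.
trainer = AdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
trainer.train()
```
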
124 changes: 36 additions & 88 deletions examples/pytorch/dependency-parsing/run_udp.py
@@ -13,15 +13,8 @@

import transformers.adapters.composition as ac
from preprocessing import preprocess_dataset
from transformers import (
AdapterConfig,
AutoAdapterModel,
AutoConfig,
AutoTokenizer,
HfArgumentParser,
MultiLingAdapterArguments,
set_seed,
)
from transformers import AutoConfig, AutoTokenizer, HfArgumentParser, set_seed
from transformers.adapters import AdapterArguments, AdapterConfigBase, AutoAdapterModel, setup_adapter_training
from utils_udp import UD_HEAD_LABELS, DependencyParsingAdapterTrainer, DependencyParsingTrainer, UDTrainingArguments


@@ -94,7 +87,7 @@ def main():
# See all possible arguments in src/transformers/training_args.py
# or by passing the --help flag to this script.
# We now keep distinct sets of args, for a cleaner separation of concerns.
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, UDTrainingArguments, MultiLingAdapterArguments))
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, UDTrainingArguments, AdapterArguments))
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
# If we pass only one argument to the script and it's the path to a json file,
# let's parse it to get our arguments.
@@ -170,7 +163,6 @@ def main():

# The task name (with prefix)
task_name = "ud_" + data_args.task_name
language = adapter_args.language

model = AutoAdapterModel.from_pretrained(
model_args.model_name_or_path,
@@ -183,65 +175,6 @@
id2label=label_map,
)

if model_args.leave_out_twelvth:
logger.info("Leaving out 12")
leave_out = [11]
else:
leave_out = []

# Setup adapters
if adapter_args.train_adapter:
# check if adapter already exists, otherwise add it
if task_name not in model.config.adapters:
# resolve the adapter config
adapter_config = AdapterConfig.load(
adapter_args.adapter_config,
non_linearity=adapter_args.adapter_non_linearity,
reduction_factor=adapter_args.adapter_reduction_factor,
leave_out=leave_out,
)
# load a pre-trained from Hub if specified
if adapter_args.load_adapter:
model.load_adapter(
adapter_args.load_adapter,
config=adapter_config,
load_as=task_name,
leave_out=leave_out,
)
# otherwise, add a fresh adapter
else:
model.add_adapter(task_name, config=adapter_config)
# optionally load a pre-trained language adapter
if adapter_args.load_lang_adapter:
# resolve the language adapter config
lang_adapter_config = AdapterConfig.load(
adapter_args.lang_adapter_config,
non_linearity=adapter_args.lang_adapter_non_linearity,
reduction_factor=adapter_args.lang_adapter_reduction_factor,
leave_out=leave_out,
)
# load the language adapter from Hub
lang_adapter_name = model.load_adapter(
adapter_args.load_lang_adapter,
config=lang_adapter_config,
load_as=adapter_args.language,
leave_out=leave_out,
)
else:
lang_adapter_name = None
# Freeze all model weights except of those of this adapter
model.train_adapter([task_name])
# Set the adapters to be used in every forward pass
if lang_adapter_name:
model.set_active_adapters(ac.Stack(lang_adapter_name, task_name))
else:
model.set_active_adapters(task_name)
else:
if adapter_args.load_adapter or adapter_args.load_lang_adapter:
raise ValueError(
"Adapters can only be loaded in adapters training mode.Use --train_adapter to enable adapter training"
)

# Load and preprocess dataset
if data_args.use_mock_data:
from datasets import Version, load_dataset_builder
@@ -255,6 +188,21 @@ def main():
dataset = load_dataset("universal_dependencies", data_args.task_name)
dataset = preprocess_dataset(dataset, tokenizer, labels, data_args, pad_token_id=-1)

# Setup adapters
if model_args.leave_out_twelvth:
logger.info("Leaving out 12")
adapter_config_kwargs = {"leave_out": [11]}
adapter_load_kwargs = {"leave_out": [11]}
else:
adapter_config_kwargs = {}
adapter_load_kwargs = {}
adapter_name, lang_adapter_name = setup_adapter_training(
model,
adapter_args,
task_name,
adapter_config_kwargs=adapter_config_kwargs,
adapter_load_kwargs=adapter_load_kwargs,
)
# Initialize our Trainer
# HACK: Set this attribute to False to prevent label columns from being deleted
training_args.remove_unused_columns = False
@@ -300,30 +248,30 @@ def main():
logger.info("Loading best model for predictions.")

if adapter_args.train_adapter:
if language:
lang_adapter_config = AdapterConfig.load(
config="pfeiffer", non_linearity="gelu", reduction_factor=2, leave_out=leave_out
)
model.load_adapter(
os.path.join(training_args.output_dir, "best_model", language)
if training_args.do_train
else adapter_args.load_lang_adapter,
config=lang_adapter_config,
load_as=language,
leave_out=leave_out,
)
task_adapter_config = AdapterConfig.load(
config="pfeiffer", non_linearity="gelu", reduction_factor=16, leave_out=leave_out
)
adapter_config = AdapterConfigBase.load(adapter_args.adapter_config, **adapter_config_kwargs)
model.load_adapter(
os.path.join(training_args.output_dir, "best_model", task_name)
if training_args.do_train
else adapter_args.load_adapter,
config=task_adapter_config,
config=adapter_config,
load_as=task_name,
leave_out=leave_out,
**adapter_load_kwargs,
)
if language:
if adapter_args.load_lang_adapter:
lang_adapter_config = AdapterConfigBase.load(
adapter_args.lang_adapter_config, **adapter_config_kwargs
)
lang_adapter_name = model.load_adapter(
os.path.join(training_args.output_dir, "best_model", lang_adapter_name)
if training_args.do_train
else adapter_args.load_lang_adapter,
config=lang_adapter_config,
load_as=lang_adapter_name,
**adapter_load_kwargs,
)
else:
lang_adapter_name = None
if lang_adapter_name:
model.set_active_adapters(ac.Stack(lang_adapter_name, task_name))
else:
model.set_active_adapters(task_name)