Add guide on how to add adapters to models (#121)
* Add guide on how to add adapters to models
* Adding section on example adapters.
* Add an adapter mixin for model configs.
Showing 4 changed files with 87 additions and 18 deletions.
# Adding Adapters to a Model

This document gives an overview of how `adapter-transformers` integrates adapter modules into the model architectures of HuggingFace Transformers.
It can be used as a guide for adding adapter support to new model architectures.

Before going into the implementation details, here are some important design philosophies of `adapter-transformers`:

- _Adapters should integrate seamlessly with existing model classes_: This means (a) if a model architecture supports adapters, it should be possible to use them with all model classes of this architecture, and (b) adapters should be entirely opt-in, i.e. the model classes must still work without adapters.
- _Changes to the original code should be minimal_: `adapter-transformers` avoids changes to the original HF code as far as possible. We extensively use Python mixins to achieve this.

Now we go through the integration of adapters into an existing model architecture step by step.

**The following steps might not be applicable to every model architecture.**

## Implementation

❓ Each model architecture with adapter support has a main `adapter_<model_type>.py` module (e.g. `adapter_distilbert.py` for `modeling_distilbert.py`) that provides the required adapter mixins for each modeling component (e.g. there is a `DistilBertTransfomerBlockAdaptersMixin` for the `TransformerBlock` of DistilBERT, etc.).
This is the central module to implement.

**📝 Steps**

- Add a new `adapter_<model_type>.py` module for your architecture (or reuse an existing one if possible); a minimal sketch of such a module follows this list.
  - There should usually be one mixin that derives from `BertAdaptersBaseMixin` or has it as a child module.
  - The mixin for the whole base model class (e.g. `BertModel`) should derive from `ModelAdaptersMixin` and (if possible) `InvertibleAdaptersMixin`.
  - Have a look at existing examples, e.g. `adapter_distilbert.py` and `adapter_bert.py`.
- Implement the mixins on the modeling classes (`modeling_<model_type>.py`).
  - Make sure the calls to `adapters_forward()` are added in the right places.
- Add the mixin for config classes, `ModelConfigAdaptersMixin`, to the model configuration class in `configuration_<model_type>.py`.
  - There are some naming differences in the config attributes of different model architectures. The adapter implementation requires some additional attributes with specific names to be available; currently these are `hidden_dropout_prob` and `attention_probs_dropout_prob`, as in the `BertConfig` class.

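To make these steps more concrete, here is a minimal, hypothetical sketch of what such a module could look like for an imaginary `NewModel` architecture. All `NewModel*` names and the import paths are illustrative assumptions, not the actual library layout; only the mixin base classes are taken from the steps above, and their exact hooks and required attributes depend on the `adapter-transformers` version you build on.

```python
# adapter_newmodel.py -- an illustrative sketch for an imaginary "NewModel"
# architecture, NOT actual library code. The import paths below are
# assumptions; check where the adapter mixins live in your version of
# adapter-transformers before copying anything.
from transformers.adapter_bert import BertAdaptersBaseMixin
from transformers.adapter_model_mixin import InvertibleAdaptersMixin, ModelAdaptersMixin


class NewModelOutputAdaptersMixin(BertAdaptersBaseMixin):
    """Adds adapter modules to the output sub-layer of a NewModel block.

    `BertAdaptersBaseMixin` provides the shared adapter forward logic; the
    host module must expose whatever attributes the mixin relies on.
    """


class NewModelBlockAdaptersMixin:
    """Mixin for a single NewModel transformer block.

    The block's forward() in modeling_newmodel.py should call into the
    adapter logic (`adapters_forward()`) at the appropriate points.
    """


class NewModelModelAdaptersMixin(InvertibleAdaptersMixin, ModelAdaptersMixin):
    """Mixin for the whole NewModelModel base class.

    Derives from `ModelAdaptersMixin` and, where the architecture allows it,
    from `InvertibleAdaptersMixin` for invertible adapters at the embeddings.
    """
```
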
❓ Adapter-supporting architectures provide a new model class `<model_type>ModelWithHeads`.
These classes allow flexibly adding, and switching between, multiple prediction heads of different types.

**📝 Steps**

- In `modeling_<model_type>.py`, add a new `<model_type>ModelWithHeads` class.
  - This class should implement a mixin (in `adapter_<model_type>.py`) which derives from `ModelWithFlexibleHeadsAdaptersMixin`; see the sketch after this list.
  - In the mixin, add methods for those prediction heads that make sense for the new model architecture.
- Add `<model_type>ModelWithHeads` to the `MODEL_WITH_HEADS_MAPPING` mapping in `modeling_auto.py` and to `__init__.py`.

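As an illustration, a flexible-heads mixin and the corresponding model class for the imaginary `NewModel` architecture could be laid out roughly as follows. Again, all `NewModel*` names are placeholders, and `NewModelPreTrainedModel` / `NewModelModel` stand for the architecture's existing HF classes; this is a sketch of the pattern, not the library's actual code.

```python
# adapter_newmodel.py / modeling_newmodel.py -- illustrative sketch only.
# All NewModel* names are placeholders for the classes of the new architecture.
from transformers.adapter_model_mixin import ModelWithFlexibleHeadsAdaptersMixin  # assumed import path


class NewModelModelHeadsMixin(ModelWithFlexibleHeadsAdaptersMixin):
    """Flexible prediction-head support for NewModel.

    Only add methods for head types that make sense for this architecture,
    e.g. a sequence classification head for an encoder-only model.
    """


# In modeling_newmodel.py, the corresponding model class implements the mixin:
class NewModelModelWithHeads(NewModelModelHeadsMixin, NewModelPreTrainedModel):
    """NewModel base model with support for multiple, switchable heads."""

    def __init__(self, config):
        super().__init__(config)
        # NewModelModel is the (adapter-enabled) base model of the architecture;
        # head modules are added later via the mixin's head-adding methods.
        self.newmodel = NewModelModel(config)
```
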
## Testing

❓ In addition to the general HuggingFace model tests, there are adapter-specific test cases (usually starting with `test_adapter_`).

**📝 Steps**

- Add the new model architecture to `MODELS_WITH_ADAPTERS` in `test_adapter_common.py` and to `test_adapter_training.py` (see the sketch after this list).
- Add `<model_type>ModelWithHeads` to `test_modeling_<model_type>.py`.
- Append `<model_type>` to the list in `check_adapters.py`.

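For illustration only, registering the imaginary `NewModel` architecture for the common adapter tests might look roughly like this; the actual structure of `MODELS_WITH_ADAPTERS` can differ between library versions, so check `test_adapter_common.py` before editing it.

```python
# test_adapter_common.py -- purely illustrative; the real structure of
# MODELS_WITH_ADAPTERS may differ, so verify it in the file first.
from transformers import NewModelConfig  # assumed config class of the new architecture

MODELS_WITH_ADAPTERS = [
    # ... existing entries for architectures that already support adapters ...
    NewModelConfig,
]
```
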
## Documentation

❓ The documentation for `adapter-transformers` lives in the `adapter_docs` folder.

**📝 Steps**

- Add `adapter_docs/classes/<model_type>.rst` (modeled on the corresponding doc file in the HF docs). Make sure to include `<model_type>ModelWithHeads` and the HF notice, then add the file to the index.

## Training Example Adapters

❓ To make sure the new adapter implementation works properly, it is useful to train some example adapters and compare the training results to full model fine-tuning. Ideally, this would include training adapters on one (or more) tasks that are well suited to demonstrating the new model architecture (e.g. the GLUE benchmark for BERT, summarization for BART) and uploading them to AdapterHub.

HuggingFace already provides example training scripts for many tasks; some of them have already been modified to support adapter training (see https://github.com/Adapter-Hub/adapter-transformers/tree/master/examples).
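
As a rough sketch of such a sanity check, an example adapter could be trained with the newly added model class along these lines. The checkpoint and task names are placeholders, and the exact signatures of `add_adapter()`, `add_classification_head()`, and `train_adapter()` depend on the `adapter-transformers` version, so consult the library docs before copying this.

```python
# Sanity-check sketch: train a single adapter with the newly added model class
# and compare the results against full fine-tuning. Checkpoint and task names
# are placeholders; check the exact method signatures for your library version.
from transformers import NewModelModelWithHeads  # the class added in this guide

model = NewModelModelWithHeads.from_pretrained("new-model-base-checkpoint")  # placeholder checkpoint
model.add_adapter("example-task")                             # create a fresh task adapter
model.add_classification_head("example-task", num_labels=2)   # head matching the task
model.train_adapter("example-task")                           # freeze the base model, train only the adapter
# ... then plug `model` into one of the adapter-enabled example scripts or a
#     Trainer, and compare metrics against a fully fine-tuned baseline ...
```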