Add adapter support for X-MOD model #581

Merged: 4 commits, Sep 5, 2023
23 changes: 23 additions & 0 deletions docs/classes/models/xmod.rst
@@ -0,0 +1,23 @@
X-MOD
=====

.. important::
The X-MOD implementation integrated into Transformers already ships with built-in adapter modules. To make this implementation compatible with the Adapters library, a few changes were necessary:

- Pre-trained X-MOD checkpoints require conversion before they can be used with Adapters. We provide pre-converted checkpoints for the following models:
- ``facebook/xmod-base`` -> ``AdapterHub/xmod-base`` with languages adapters split into separate repos (e.g. ``AdapterHub/xmod-base-af_ZA``)
- In Adapters, the X-MOD classes rely on the usual adapter methods instead of the custom methods introduced in Transformers, i.e.:
- ``set_active_adapters()`` instead of ``set_default_language()``.
- ``AdapterSetup`` context instead of ``lang_ids`` parameter.

The abstract from the paper is the following:

*Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (X-MOD) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.*

XmodAdapterModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: adapters.XmodAdapterModel
:members:
:inherited-members: XmodPreTrainedModel
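The method mapping described in this new doc page can be illustrated with a short usage sketch. The checkpoint name `AdapterHub/xmod-base` comes from the docs above; the language adapter repo `AdapterHub/xmod-base-en_XX` and adapter name `en_XX` are assumptions by analogy to the `af_ZA` example (requires network access to the Hub):

```python
from adapters import AdapterSetup, XmodAdapterModel

# Pre-converted checkpoint named in the docs above.
model = XmodAdapterModel.from_pretrained("AdapterHub/xmod-base")

# Language adapter repo name assumed by analogy to the af_ZA example.
lang = model.load_adapter("AdapterHub/xmod-base-en_XX")

# set_active_adapters() replaces Transformers' set_default_language():
model.set_active_adapters(lang)

# The AdapterSetup context replaces the lang_ids forward() parameter:
# with AdapterSetup(lang):
#     outputs = model(**inputs)
```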
1 change: 1 addition & 0 deletions docs/index.rst
@@ -75,6 +75,7 @@ Currently, we support the PyTorch versions of all models as listed on the `Model
classes/models/t5
classes/models/vit
classes/models/xlmroberta
classes/models/xmod

.. toctree::
:maxdepth: 2
1 change: 1 addition & 0 deletions docs/model_overview.md
@@ -30,6 +30,7 @@ The table below further shows which model architectures support which adaptation
| [T5](classes/models/t5.html) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [ViT](classes/models/vit.html) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [XLM-RoBERTa](classes/models/xlmroberta.html) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [X-MOD](classes/models/xmod.html) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

(*) If the used encoder and decoder model class are supported.

2 changes: 2 additions & 0 deletions src/adapters/__init__.py
@@ -107,6 +107,7 @@
"models.t5": ["T5AdapterModel"],
"models.vit": ["ViTAdapterModel"],
"models.xlm_roberta": ["XLMRobertaAdapterModel"],
"models.xmod": ["XmodAdapterModel"],
"trainer": ["AdapterTrainer", "Seq2SeqAdapterTrainer"],
"training": [
"AdapterArguments",
@@ -206,6 +207,7 @@
from .models.t5 import T5AdapterModel
from .models.vit import ViTAdapterModel
from .models.xlm_roberta import XLMRobertaAdapterModel
from .models.xmod import XmodAdapterModel
from .trainer import AdapterTrainer, Seq2SeqAdapterTrainer
from .training import AdapterArguments, setup_adapter_training
from .utils import (
1 change: 1 addition & 0 deletions src/adapters/composition.py
@@ -135,6 +135,7 @@ def __init__(
"xlm-roberta",
"bert-generation",
"llama",
"xmod",
],
}

13 changes: 7 additions & 6 deletions src/adapters/configuration/adapter_config.py
@@ -162,9 +162,10 @@ class BnConfig(AdapterConfigBase):
use_gating (:obj:`bool`, optional):
Place a trainable gating module besides the added parameter module to control module activation. This is
e.g. used for UniPELT. Defaults to False.
residual_before_ln (:obj:`bool`, optional):
If True, take the residual connection around the adapter bottleneck before the layer normalization. Only
applicable if :obj:`original_ln_before` is True.
residual_before_ln (:obj:`bool` or :obj:`str`, optional):
If True, take the residual connection around the adapter bottleneck before the layer normalization. If set
to "post_add", take the residual connection around the adapter bottleneck after the previous residual
connection. Only applicable if :obj:`original_ln_before` is True.
adapter_residual_before_ln (:obj:`bool`, optional):
If True, apply the residual connection around the adapter modules before the new layer normalization within
the adapter. Only applicable if :obj:`ln_after` is True and :obj:`is_parallel` is False.
@@ -225,7 +226,7 @@ class BnConfig(AdapterConfigBase):
is_parallel: bool = False
scaling: Union[float, str] = 1.0
use_gating: bool = False
residual_before_ln: bool = True
residual_before_ln: Union[bool, str] = True
adapter_residual_before_ln: bool = False
inv_adapter: Optional[str] = None
inv_adapter_reduction_factor: Optional[float] = None
@@ -267,7 +268,7 @@ class SeqBnConfig(BnConfig):

original_ln_before: bool = True
original_ln_after: bool = True
residual_before_ln: bool = True
residual_before_ln: Union[bool, str] = True
adapter_residual_before_ln: bool = False
ln_before: bool = False
ln_after: bool = False
@@ -306,7 +307,7 @@ class DoubleSeqBnConfig(BnConfig):

original_ln_before: bool = False
original_ln_after: bool = True
residual_before_ln: bool = True
residual_before_ln: Union[bool, str] = True
adapter_residual_before_ln: bool = False
ln_before: bool = False
ln_after: bool = False
55 changes: 55 additions & 0 deletions src/adapters/head_utils.py
@@ -256,6 +256,61 @@
},
"layers": ["lm_head.dense", None, "lm_head.layer_norm", "lm_head.decoder"],
},
# Xmod
"XmodForSequenceClassification": {
"config": {
"head_type": "classification",
"layers": 2,
"activation_function": "tanh",
"use_pooler": False,
},
"layers": [None, "classifier.dense", None, None, "classifier.out_proj"],
},
"XmodForMultipleChoice": {
"config": {
"head_type": "multiple_choice",
"layers": 1,
"activation_function": None,
"use_pooler": True,
},
"layers": [None, "classifier"],
},
"XmodForTokenClassification": {
"config": {
"head_type": "tagging",
"layers": 1,
"activation_function": None,
},
"layers": [None, "classifier"],
},
"XmodForQuestionAnswering": {
"config": {
"head_type": "question_answering",
"layers": 1,
"activation_function": None,
},
"layers": [None, "qa_outputs"],
},
"XmodForMaskedLM": {
"config": {
"head_type": "masked_lm",
"layers": 2,
"activation_function": "gelu",
"layer_norm": True,
"bias": True,
},
"layers": ["lm_head.dense", None, "lm_head.layer_norm", "lm_head.decoder"],
},
"XmodForCausalLM": {
"config": {
"head_type": "causal_lm",
"layers": 2,
"activation_function": "gelu",
"layer_norm": True,
"bias": True,
},
"layers": ["lm_head.dense", None, "lm_head.layer_norm", "lm_head.decoder"],
},
# BART
"BartForSequenceClassification": {
"config": {
8 changes: 7 additions & 1 deletion src/adapters/layer.py
@@ -227,7 +227,13 @@ def enable_adapters(self, adapter_setup: AdapterCompositionBlock, unfreeze_adapt
for param in self.adapter_fusion_layer[sub_setup.name].parameters():
param.requires_grad = True

def get_adapter(self, adapter_name):
def freeze_adapter(self, adapter_name: str, freeze: bool = True):
if adapter_name in self.adapters:
self.adapters[adapter_name].train(not freeze)
for param in self.adapters[adapter_name].parameters():
param.requires_grad = not freeze

def get_adapter(self, adapter_name: str):
if adapter_name in self.adapters:
return self.adapters[adapter_name]
else:
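The new `freeze_adapter` hook follows a common PyTorch pattern: put the module in eval mode and remove its parameters from the autograd graph. A standalone sketch of the same pattern using plain `torch` (a toy stand-in, not the actual adapters classes):

```python
import torch.nn as nn


class AdapterHolder(nn.Module):
    """Toy stand-in for a layer holding named adapter modules."""

    def __init__(self):
        super().__init__()
        self.adapters = nn.ModuleDict({"en_XX": nn.Linear(8, 8)})

    def freeze_adapter(self, adapter_name: str, freeze: bool = True):
        if adapter_name in self.adapters:
            # eval mode disables dropout etc.; requires_grad=False
            # keeps the parameters out of gradient updates.
            self.adapters[adapter_name].train(not freeze)
            for param in self.adapters[adapter_name].parameters():
                param.requires_grad = not freeze


holder = AdapterHolder()
holder.freeze_adapter("en_XX")
assert not any(p.requires_grad for p in holder.adapters["en_XX"].parameters())
holder.freeze_adapter("en_XX", freeze=False)
assert all(p.requires_grad for p in holder.adapters["en_XX"].parameters())
```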
6 changes: 6 additions & 0 deletions src/adapters/lora.py
@@ -173,6 +173,12 @@ def enable_adapters(self, adapter_setup: AdapterCompositionBlock, unfreeze_adapt
for param in self.loras[name].parameters():
param.requires_grad = True

def freeze_adapter(self, adapter_name: str, freeze: bool = True):
if adapter_name in self.loras:
self.loras[adapter_name].train(not freeze)
for param in self.loras[adapter_name].parameters():
param.requires_grad = not freeze

def get_adapter(self, adapter_name: str) -> nn.Module:
if adapter_name in self.loras:
return self.loras[adapter_name]
7 changes: 5 additions & 2 deletions src/adapters/modeling.py
@@ -145,15 +145,18 @@ def pre_forward(
"""
query = None

if self.residual_before_ln:
if self.residual_before_ln is True:
residual = hidden_states

if fusion_config is not None and fusion_config["query_before_ln"]:
query = hidden_states

if self.original_ln_before:
if layer_norm:
hidden_states = layer_norm(hidden_states + input_tensor)
hidden_states = hidden_states + input_tensor
if self.residual_before_ln == "post_add":
residual = hidden_states
hidden_states = layer_norm(hidden_states)
else:
hidden_states = hidden_states + input_tensor

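The branch added to `pre_forward` changes only where the residual is captured. A dependency-free sketch of the bookkeeping for the `original_ln_before` path (the `layer_norm` here is a stub; the real code uses the Transformer block's LayerNorm, and `fusion_config` handling is omitted):

```python
def capture_residual(hidden, input_tensor, residual_before_ln, layer_norm):
    """Mirror the residual bookkeeping of pre_forward (ln-before case)."""
    residual = None
    if residual_before_ln is True:
        # Default: residual taken before anything else.
        residual = hidden
    # original_ln_before: add the block input, then normalize.
    hidden = hidden + input_tensor
    if residual_before_ln == "post_add":
        # X-MOD style: residual taken after the add, before the norm.
        residual = hidden
    hidden = layer_norm(hidden)
    return hidden, residual


ln = lambda x: x * 0.5  # stub layer norm

out, res = capture_residual(1.0, 2.0, True, ln)         # res == 1.0 (pre-add)
out, res = capture_residual(1.0, 2.0, "post_add", ln)   # res == 3.0 (post-add)
```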
3 changes: 3 additions & 0 deletions src/adapters/models/__init__.py
@@ -19,6 +19,7 @@
from .llama.mixin_llama import LlamaModelAdapterMixin
from .t5.mixin_t5 import T5BlockAdaptersMixin, T5ModelAdaptersMixin, T5ModelAdaptersWithHeadsMixin
from .vit.mixin_vit import ViTIntermediateAdaptersMixin, ViTModelAdaptersMixin
from .xmod.mixin_xmod import XmodModelAdaptersMixin


# IMPORTANT: Only add classes to this mapping that are not copied into the adapters package
@@ -58,6 +59,8 @@
"ViTModel": ViTModelAdaptersMixin,
"XLMRobertaLayer": BertLayerAdaptersMixin,
"XLMRobertaModel": BertModelAdaptersMixin,
"XmodLayer": BertLayerAdaptersMixin,
"XmodModel": XmodModelAdaptersMixin,
"DebertaModel": BertModelAdaptersMixin,
"DebertaLayer": BertLayerAdaptersMixin,
"DebertaV2Model": BertModelAdaptersMixin,
1 change: 1 addition & 0 deletions src/adapters/models/auto/adapter_model.py
@@ -26,6 +26,7 @@
("t5", "T5AdapterModel"),
("vit", "ViTAdapterModel"),
("xlm-roberta", "XLMRobertaAdapterModel"),
("xmod", "XmodAdapterModel"),
]
)

39 changes: 39 additions & 0 deletions src/adapters/models/xmod/__init__.py
@@ -0,0 +1,39 @@
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2023 The Adapter-Hub Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from transformers.utils import _LazyModule


_import_structure = {
"adapter_model": ["XmodAdapterModel"],
}


if TYPE_CHECKING:
from .adapter_model import XmodAdapterModel

else:
import sys

sys.modules[__name__] = _LazyModule(
__name__,
globals()["__file__"],
_import_structure,
)