Environment info
adapter-transformers version: 3.1.0
Using distributed or parallel set-up in script?: Yes
Information
Model I am using (Bert, XLNet ...): EncoderDecoderModel
Language I am using the model on (English, Chinese ...): English
Adapter setup I am using (if any): AdapterConfig
The problem arises when using:
the official example scripts: (give details below)
my own modified scripts: (give details below)
The task I am working on is:
an official GLUE/SQUaD task: (give the name)
my own task or dataset: (give details below)
To reproduce
from transformers import EncoderDecoderModel, AdapterConfig
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
When not leaving out any layers, everything works as expected:
### no leave_out
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu")
model.add_adapter("en", adapter_config)
model.add_adapter("de", adapter_config)
print(model.adapter_summary())
#### print result
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
en                       bottleneck        7,100,928       2.871       0       1
de                       bottleneck        7,100,928       2.871       0       1
--------------------------------------------------------------------------------
Full model                                247,363,386     100.000               1
================================================================================
When trying to leave out all 12 encoder layers, no adapters are added at all.
### leave_out first 12 layers of encoder
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(12)))
model.add_adapter("en", adapter_config, overwrite_ok=True)
model.add_adapter("de", adapter_config, overwrite_ok=True)
print(model.adapter_summary())
#### print result
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
en                       bottleneck                0       0.000       0       1
de                       bottleneck                0       0.000       0       1
--------------------------------------------------------------------------------
Full model                                247,363,386     100.000               1
================================================================================
When leaving out only the first 6 layers of the encoder, adapters are added to the remaining encoder layers only, and the decoder still gets none.
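For reference, a minimal sketch of that third case, reconstructed from the description above (the corresponding summary output is not reproduced here):
### leave_out first 6 layers of encoder (sketch)
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(6)))
model.add_adapter("en", adapter_config, overwrite_ok=True)
model.add_adapter("de", adapter_config, overwrite_ok=True)
print(model.adapter_summary())
# reported behavior: adapters end up only in the remaining encoder layers;
# the decoder gets none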
Also, it seems EncoderDecoderModelAdaptersMixin.iter_layers should offset the decoder layer ids by the number of encoder layers, like this:
def iter_layers(self) -> Iterable[Tuple[int, nn.Module]]:
    # yield encoder layers with their original indices
    for i, layer in self.encoder.iter_layers():
        yield i, layer
    # offset decoder layer indices by the number of encoder layers, so that
    # encoder and decoder layers get distinct ids (e.g. for leave_out)
    encoder_layer_n = len(self.encoder.encoder.layer)
    for i, layer in self.decoder.iter_layers():
        yield i + encoder_layer_n, layer
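To illustrate the idea (a hypothetical sketch assuming two 12-layer BERT halves; this is not output from the library): with this offset numbering, encoder layers would have ids 0-11 and decoder layers ids 12-23, so leaving out the whole encoder would become:
### hypothetical, based on the proposed numbering above
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(12)))  # ids 0-11 = encoder layers only
model.add_adapter("en", adapter_config, overwrite_ok=True)
# the intent is that decoder layers (ids 12-23) would still receive adapters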
Expected behavior
The EncoderDecoderModel class should support adapters (including leave_out) the same way BART-like models do.
@hSterz My current workaround is setting model.decoder.base_model.config.adapters = model.encoder.base_model.config.adapters and changing iter_layers as proposed above. It seems to work fine.
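As a minimal sketch, the workaround boils down to the following (it relies on adapter-transformers internals rather than a public API, and is only verified in my own setup):
# point the decoder's adapter registry at the encoder's so that both halves
# share the same adapter configuration
model.decoder.base_model.config.adapters = model.encoder.base_model.config.adapters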
This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.