AdapterConfig's leave_out not work well in EncoderDecoderModel #472

Open
ZeguanXiao opened this issue Jan 11, 2023 · 4 comments
Labels: bug:encoder-decoder, bug (Something isn't working)

Comments

ZeguanXiao commented Jan 11, 2023

Environment info

  • adapter-transformers version: 3.1.0
  • Platform: Ubuntu 18.04 (Linux-5.4.0-87-generic-x86_64-with-glibc2.27)
  • Python version: Python 3.9.13
  • PyTorch version (GPU?): 1.13.1 (GPU)
  • Tensorflow version (GPU?): False
  • Using GPU in script?: True
  • Using distributed or parallel set-up in script?: Yes

Information

Model I am using (Bert, XLNet ...): EncoderDecoderModel

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any): AdapterConfig

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

from transformers import EncoderDecoderModel, AdapterConfig
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")

When no layers are left out, adapters are added to all layers as expected.

### no leave_out
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu")
model.add_adapter("en", adapter_config)
model.add_adapter("de", adapter_config)
print(model.adapter_summary())

#### print result
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
en                       bottleneck        7,100,928       2.871       0       1
de                       bottleneck        7,100,928       2.871       0       1
--------------------------------------------------------------------------------
Full model                               247,363,386     100.000               1
================================================================================

When leaving out all 12 encoder layers, no adapters are added at all, not even in the decoder.

### leave_out first 12 layers of encoder
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(12)))
model.add_adapter("en", adapter_config, overwrite_ok=True)
model.add_adapter("de", adapter_config, overwrite_ok=True)
print(model.adapter_summary())
##### print result
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
en                       bottleneck                0       0.000       0       1
de                       bottleneck                0       0.000       0       1
--------------------------------------------------------------------------------
Full model                               247,363,386     100.000               1
================================================================================

When leaving out only the first 6 encoder layers, adapters are added to the remaining encoder layers only; the decoder still gets none.

### leave_out first 6 layers of encoder
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(6)))
model.add_adapter("en", adapter_config, overwrite_ok=True)
model.add_adapter("de", adapter_config, overwrite_ok=True)
print(model.adapter_summary())

##### print result
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
en                       bottleneck        3,550,464       1.435       0       1
de                       bottleneck        3,550,464       1.435       0       1
--------------------------------------------------------------------------------
Full model                               247,363,386     100.000               1
================================================================================

#### check parameters
print([name for name, p in model.named_parameters() if "adapter" in name])
##### print result
['encoder.encoder.layer.6.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.6.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.6.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.6.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.6.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.6.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.6.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.6.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.6.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.6.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.6.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.6.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.7.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.7.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.7.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.7.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.7.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.7.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.7.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.7.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.8.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.8.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.8.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.8.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.8.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.8.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.8.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.8.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.9.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.9.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.9.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.9.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.9.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.9.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.9.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.9.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.10.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.10.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.10.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.10.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.10.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.10.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.10.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.10.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.11.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.11.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.11.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.11.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.11.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.11.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.11.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.11.output.adapters.de.adapter_up.bias']
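
For a quicker check than reading the full list, a minimal sketch like the following (it only summarizes the names printed above) shows where the adapters ended up:

from collections import Counter

# Count adapter parameters per top-level sub-model ('encoder' vs. 'decoder').
# With leave_out=list(range(6)) only encoder layers 6-11 carry adapters,
# so no 'decoder' key shows up at all.
counts = Counter(name.split(".")[0] for name, p in model.named_parameters() if "adapter" in name)
print(counts)  # e.g. Counter({'encoder': 96})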

Expected behavior

The EncoderDecoderModel class should handle leave_out the same way BART-like models do, i.e. decoder layers are indexed after the encoder layers so that the two stacks can be targeted separately.
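
For comparison, a minimal sketch of the BART-style indexing I mean (facebook/bart-base is used purely as an illustration; it has 6 encoder and 6 decoder layers, which iter_layers should index as 0-5 and 6-11):

from transformers import BartModel, AdapterConfig

bart = BartModel.from_pretrained("facebook/bart-base")
# Skip the decoder layers (indices 6-11): adapters go into the encoder only.
enc_only = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4,
                         non_linearity="gelu", leave_out=list(range(6, 12)))
# Skip the encoder layers (indices 0-5): adapters go into the decoder only.
dec_only = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4,
                         non_linearity="gelu", leave_out=list(range(6)))
bart.add_adapter("enc_only", config=enc_only)
bart.add_adapter("dec_only", config=dec_only)
print(bart.adapter_summary())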

ZeguanXiao added the bug label on Jan 11, 2023

ZeguanXiao commented Jan 11, 2023

Also, it seems EncoderDecoderModelAdaptersMixin.iter_layers should count decoder layer IDs starting from len(self.encoder.encoder.layer), like this:

    def iter_layers(self) -> Iterable[Tuple[int, nn.Module]]:
        for i, layer in self.encoder.iter_layers():
            yield i, layer

        encoder_layer_n = len(self.encoder.encoder.layer)
        for i, layer in self.decoder.iter_layers():
            yield i + encoder_layer_n, layer

hSterz self-assigned this on Jan 13, 2023

hSterz commented Jan 13, 2023

Hey @ZeguanXiao, I see why this is unexpected behavior. Unfortunately, it is not as simple as changing the iter_layers indices. I will look into this.

ZeguanXiao (Author) commented:

@hSterz My current workaround is setting model.decoder.base_model.config.adapters = model.encoder.base_model.config.adapters and changing iter_layers as proposed above. It seems to work fine.
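
For reference, a rough sketch of that workaround (this is only an approximation of it: it shares the encoder's adapter config with the decoder and monkey-patches iter_layers with the indexing proposed above; patched_iter_layers is just an illustrative name):

import types
from typing import Iterable, Tuple
from torch import nn

# Let both sub-models see the same adapter configuration object.
model.decoder.base_model.config.adapters = model.encoder.base_model.config.adapters

def patched_iter_layers(self) -> Iterable[Tuple[int, nn.Module]]:
    # Encoder layers keep their indices; decoder layers continue after them.
    for i, layer in self.encoder.iter_layers():
        yield i, layer
    encoder_layer_n = len(self.encoder.encoder.layer)
    for i, layer in self.decoder.iter_layers():
        yield i + encoder_layer_n, layer

model.iter_layers = types.MethodType(patched_iter_layers, model)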

adapter-hub-bert (Member) commented:

This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.
