
Add Support for Electra #400

Closed
wants to merge 4 commits

Conversation


@amitkumarj441 commented Aug 5, 2022

@calpt (Member) left a comment


Hey @amitkumarj441, thanks a lot for working on this! While a few tests are still failing, I did a partial review of your changes and left some comments. All in all, it looks very good.

Besides fixing the missing tests, please have a look at our contribution guide for the required documentation steps for a new model. Let me know if anything is unclear or you need any assistance from our side!

self.add_prediction_head(head, overwrite_ok=overwrite_ok)


class ElectraModelWithHeads(ElectraAdapterModel):

The model classes of the form XModelWithHeads are deprecated, so we don't want to add those classes for newly supported architectures. ElectraAdapterModel should be used for all cases. Please remove this class.
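
For context (not part of the original review), a minimal sketch of how the remaining flexible-head class is typically used; the import path, checkpoint, and adapter/head names below are assumptions based on the existing *AdapterModel API rather than code from this PR:

from transformers.adapters import ElectraAdapterModel  # assumption: exposed like the other *AdapterModel classes

model = ElectraAdapterModel.from_pretrained("google/electra-base-discriminator")
model.add_adapter("sst-2")                            # add a new task adapter
model.add_classification_head("sst-2", num_labels=2)  # attach a flexible prediction head
model.train_adapter("sst-2")                          # freeze the base model, train only the adapter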

@@ -401,6 +401,16 @@
},
"layers": {"classifier"},
},
# Electra

Since Electra also provides other task-specific model classes (e.g. ElectraForTokenClassification) in its modeling file, it would be great to also have conversions for those here.
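
Purely for illustration (not part of the review), such an entry might mirror the pattern visible in the hunk above; the exact config keys and values for Electra's token classification head are assumptions:

# Hypothetical sketch of one additional conversion entry; field names and
# values are assumptions, not taken from this PR.
ADDITIONAL_ELECTRA_CONVERSIONS = {
    "ElectraForTokenClassification": {
        "config": {
            "head_type": "tagging",
            "layers": 1,
            "activation_function": None,
        },
        "layers": {"classifier"},
    },
}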

from ..model_mixin import InvertibleAdaptersMixin, ModelAdaptersMixin


# For backwards compatibility, ElectraSelfOutput inherits directly from AdapterLayer

Suggested change (remove this line):

# For backwards compatibility, ElectraSelfOutput inherits directly from AdapterLayer

super().__init__("mh_adapter", None)


# For backwards compatibility, ElectraOutput inherits directly from AdapterLayer

Suggested change (remove this line):

# For backwards compatibility, ElectraOutput inherits directly from AdapterLayer

@amitkumarj441 (Author)

Thanks @calpt for reviewing this PR. I will make the suggested changes soon.

@pauli31 commented Oct 10, 2022

Any update soon?

@calpt linked an issue Oct 10, 2022 that may be closed by this pull request
self.value = nn.Linear(config.hidden_size, self.all_head_size)
self.query = LoRALinear(config.hidden_size, self.all_head_size, "selfattn", config)
self.key = LoRALinear(config.hidden_size, self.all_head_size, "selfattn", config)
self.value = LoRALinear(config.hidden_size, self.all_head_size, "selfattn", config)

In BERT, there is the attn_key param:

self.query = LoRALinear(config.hidden_size, self.all_head_size, "selfattn", config, attn_key="q")
self.key = LoRALinear(config.hidden_size, self.all_head_size, "selfattn", config, attn_key="k")
self.value = LoRALinear(config.hidden_size, self.all_head_size, "selfattn", config, attn_key="v")
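
(If I read the BERT implementation correctly, the attn_key value is what lets a LoRA config restrict which of the query/key/value projections actually receive LoRA weights, e.g. via its attn_matrices setting, so passing it here would keep Electra consistent with BERT.)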

@@ -295,6 +306,7 @@ def forward(
# if encoder bi-directional self-attention `past_key_value` is always `None`
past_key_value = (key_layer, value_layer)

key_layer, value_layer, attention_mask = self.prefix_tuning(key_layer, value_layer, attention_mask)

Missing hidden_states param, maybe? I.e.:

key_layer, value_layer, attention_mask = self.prefix_tuning(key_layer, value_layer, hidden_states, attention_mask)
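
(Presumably so: in the BERT implementation the hidden states are passed to the prefix-tuning layer as well, e.g. so the prefix attention mask can be expanded to the right batch size, which suggests the extra argument is needed here too.)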

@hSterz (Member) commented Sep 5, 2023

Hey, thanks for your work on this. We have been working on a new version of the library, adapters, which is decoupled from the transformers library (see #584 for details). We want to add Electra support to the adapters library and have started implementing it, based on this PR, in #583.

@calpt (Member) commented Sep 5, 2023

Closing in favor of #583.

@calpt closed this Sep 5, 2023

Successfully merging this pull request may close these issues.

Add support for ElectraModel
4 participants