Added ELECTRA as a thin wrapper around BERT #358
StringTransformations.regex_sub((r"\.gamma$", ".weight"), backward=None),
StringTransformations.regex_sub((r"\.beta$", ".bias"), backward=None),
# Prefixes.
StringTransformations.remove_prefix("electra.", reversible=False),
There are only two notable differences from BERT here: this line, and the embeddings_project.weight and embeddings_project.bias parameters.
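The renaming rules in the diff above can be illustrated with plain regular expressions. This is a minimal sketch, not the library's implementation: `hf_to_local_name` is a hypothetical helper, and the example parameter names are illustrative rather than taken from a specific checkpoint.

```python
import re


def hf_to_local_name(name: str) -> str:
    """Hypothetical helper: apply the renaming rules to one parameter name."""
    # Map the gamma/beta suffixes to weight/bias, mirroring the two
    # regex_sub rules in the diff.
    name = re.sub(r"\.gamma$", ".weight", name)
    name = re.sub(r"\.beta$", ".bias", name)
    # Drop the "electra." prefix, mirroring the remove_prefix rule.
    if name.startswith("electra."):
        name = name[len("electra."):]
    return name


# Illustrative names only (assumption, not read from a checkpoint).
example_names = [
    "electra.embeddings.LayerNorm.gamma",
    "electra.embeddings.LayerNorm.beta",
    "electra.embeddings_project.weight",
]
renamed = [hf_to_local_name(n) for n in example_names]
```

The substitutions run in the same order as in the diff: suffix rewrites first, then prefix removal.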
HF_CONFIG_KEYS: List[Tuple[HFConfigKey, Optional[HFConfigKeyDefault]]] = [
    (CommonHFKeys.ATTENTION_PROBS_DROPOUT_PROB, None),
    (CommonHFKeys.EMBEDDING_SIZE, None),
I had to add the embedding size here, which led to a conflict with the BERT model. This is what led to the thin wrapper class. It is definitely possible to avoid it by implementing if/else logic in _config_from_hf.
This seemed like a reasonable compromise between not duplicating functionality and avoiding coupling, though I could imagine you would want ELECTRA to be a part of BERT or completely independent.
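The if/else alternative mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions, not the library's code: `EncoderConfigSketch` and `config_from_hf_sketch` are hypothetical names, a plain dict stands in for a Hugging Face config, and the numeric values are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class EncoderConfigSketch:
    """Hypothetical stand-in for an encoder config; not a library class."""

    hidden_width: int
    embedding_width: int

    @property
    def needs_projection(self) -> bool:
        # A projection from embeddings to the hidden width is only needed
        # when the two widths differ.
        return self.embedding_width != self.hidden_width


def config_from_hf_sketch(hf_config: Dict[str, Any], is_electra: bool) -> EncoderConfigSketch:
    hidden = hf_config["hidden_size"]
    if is_electra:
        # ELECTRA branch: the embedding size is a required, explicit key.
        embedding = hf_config["embedding_size"]
    else:
        # BERT branch: no such key, so embeddings use the hidden width.
        embedding = hidden
    return EncoderConfigSketch(hidden_width=hidden, embedding_width=embedding)


# Illustrative values only (assumption).
bert_like = config_from_hf_sketch({"hidden_size": 768}, is_electra=False)
electra_like = config_from_hf_sketch(
    {"hidden_size": 256, "embedding_size": 128}, is_electra=True
)
```

The branch on `is_electra` is the coupling the thin wrapper avoids: with a separate wrapper class, each model keeps its own list of required config keys.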
[
    "jonfd/electra-small-nordic",
    "Maltehb/aelaectra-danish-electra-small-cased",
    "google/electra-small-discriminator",
],
I checked using a variety of models, but I imagine you might want to replace this with a dummy Electra model.
@pytest.mark.skipif(not has_hf_transformers, reason="requires huggingface transformers")
@pytest.mark.parametrize(
    "model_name", ["jonfd/electra-small-nordic", "Maltehb/aelaectra-danish-electra-small-cased", "google/electra-small-discriminator"]
)
def test_from_hf_hub_equals_hf_tokenizer(model_name: str, sample_texts):
    compare_tokenizer_outputs_with_hf_tokenizer(
        sample_texts, model_name, BERTTokenizer
    )
This test is simply to show that the ELECTRA models can use the BERT tokenizer.
Awesome, thanks a lot! I hope to have some time over the coming days to review your PR.
Thanks a lot! Added a small comment.
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Hi @danieldk, I see that it now causes an error. Sadly, I can't see the error on Buildkite, and when I run it locally I can't reproduce it. By the way, I have also added a related PR over on spacy-curated-transformers (which I plan to finish up once this one is through).
The BuildKite CI failure appears to be unrelated to the PR. We'll look into getting it fixed. On a related note, I've now enabled the GitHub Actions CI for this PR, which seems to have unearthed an issue (formatting/
@KennethEnevoldsen would you still like to work on this PR? Otherwise, I can also do the last bits to push it over the finish line.
Hi @danieldk, I would love to. I have a deadline this week, but I can come back to it next week (potentially before, if I find the time).
@danieldk I have fixed the isort issues and formatted the code using black. I have run the tests locally as well, and they pass (or are skipped).
@danieldk I checked the error, but it does not seem to be related to the PR (it is in the tokenizer test on rotary embedding). It seems like there is an additional argument, seq_len.
update from main
Thanks a lot, looks good! I'll make a small test model in a bit to replace the tests.
Description
Added support for loading ELECTRA models from the HF hub using a thin wrapper around BERTEncoder (required because a few keys need to be mapped differently) and re-using the same config.
The tests are a bit extensive (loading three different models); I imagine you might want to create your own dummy Electra models for testing.
Checklist