.add_embeddings not getting the right embedding size #383
Labels: bug
Environment info
adapter-transformers version: v3.0.1+ (commit 11bd9d2)
Information
Model I am using (Bert, XLNet ...): XLMR
Language I am using the model on (English, Chinese ...):
Adapter setup I am using (if any):
The problem arises when using:
The tasks I am working on are:
To reproduce
I'm not sure whether this should be an upstream issue for transformers, where the .vocab_size attribute is not updated properly, or whether it is the intended behavior. But I believe we should respect the actual vocabulary size the user intends to use.
https://github.com/adapter-hub/adapter-transformers/blob/master/src/transformers/adapters/model_mixin.py#L155
Steps to reproduce the behavior:
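Roughly, something like the following triggers it (a minimal sketch of my setup; the checkpoint name and the exact add_embeddings call are reconstructed from memory rather than copied verbatim):

```python
from transformers import XLMRobertaModel, XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model = XLMRobertaModel.from_pretrained("xlm-roberta-base")

# Extend the vocabulary. vocab_size does not count added tokens, so it no
# longer reflects the vocabulary actually used to produce input ids.
tokenizer.add_tokens(["<new_token>"])
print(tokenizer.vocab_size)  # 250002
print(len(tokenizer))        # 250003

# add_embeddings sizes the new embedding matrix from tokenizer.vocab_size
# (model_mixin.py#L155), so the new embedding gets 250002 rows, not 250003.
model.add_embeddings("extended", tokenizer)
```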
Expected behavior
The input dimension of the new embeddings at the end of the example should be 250003.
I believe (and have tested) that an easy fix would be changing this line to
embedding = nn.Embedding(len(tokenizer), embedding_dim)
But I'm not sure whether this issue should be fixed here in adapter-transformers or in the upstream transformers tokenizer code.
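For context, the relevant part of add_embeddings in model_mixin.py would change roughly like this (the surrounding code is paraphrased, only the sizing argument differs):

```python
# current: the matrix is sized from vocab_size, which ignores added tokens
embedding = nn.Embedding(tokenizer.vocab_size, embedding_dim)

# proposed: size it from len(tokenizer), the full vocabulary including added tokens
embedding = nn.Embedding(len(tokenizer), embedding_dim)
```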