Bf/combine transformer embeddings #2558
Conversation
Thanks for adding this! Still testing, but I found an error that appears with the following code:

```python
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

embeddings = TransformerWordEmbeddings(
    model='xlm-roberta-base',
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=False,
)

text = "."
sentence = Sentence(text)
embeddings.embed(sentence)
```
Suggestion to solve this (I think) added in-line.
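For context, the failure comes from the token-to-subtoken alignment step: with `"first"` pooling, each token's embedding is taken from its first subtoken, and the extraction code asserts that every token owns at least one subtoken. A dependency-free sketch of that extraction logic (names are illustrative, not flair's actual implementation):

```python
def extract_first_subtoken_embeddings(hidden_states, subtoken_lengths):
    """Pick the first subtoken's vector for each token.

    hidden_states: list of per-subtoken vectors for one sentence.
    subtoken_lengths: how many subtokens each token was split into.
    A token mapped to zero subtokens (length 0) is exactly the case
    that trips the assertion error reported above.
    """
    token_embeddings = []
    idx = 0
    for length in subtoken_lengths:
        start, end = idx, idx + length
        # mirrors the assert in _extract_token_embeddings
        assert start < end <= len(hidden_states), "token has no subtokens"
        token_embeddings.append(hidden_states[start])
        idx = end
    return token_embeddings
```

A sentence consisting of a lone punctuation mark can end up with a zero-length subtoken span after special tokens are stripped, which is why the tiny `"."` example above is enough to trigger it.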
Thanks a lot for refactoring this!
@helpmefindaname I found another error. It seems the fix for the previous error now breaks sentences that are too long (over 512 subtokens). Reproducible with this script:

```python
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

# example transformer embeddings
embeddings = TransformerWordEmbeddings(model='distilbert-base-uncased')

# create sentence with more than 512 subtokens
long_sentence = Sentence('a ' * 513)

# embed
embeddings.embed(long_sentence)
```

It throws the same assertion error as previously, i.e.:

```
File ".../flair/flair/embeddings/base.py", line 769, in _add_embeddings_internal
    self._add_embeddings_to_sentences(expanded_sentences)
File ".../flair/flair/embeddings/base.py", line 728, in _add_embeddings_to_sentences
    self._extract_token_embeddings(sentence_hidden_states, sentences, all_token_subtoken_lengths)
File ".../flair/flair/embeddings/base.py", line 656, in _extract_token_embeddings
    assert subword_start_idx < subword_end_idx <= sentence_hidden_state.size()[1]
AssertionError
```

Any ideas how to fix this?
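The usual way to handle inputs past a model's 512-subtoken limit is to split the subtoken sequence into overlapping windows, run each window through the model separately, and stitch the hidden states back together. A dependency-free sketch of the windowing part (illustrative only, not flair's actual code):

```python
def subtoken_windows(n_subtokens, max_len=512, stride=256):
    """Return (start, end) index pairs that cover a long subtoken
    sequence with overlapping windows of at most max_len subtokens.

    Every subtoken index appears in at least one window, so the
    assertion above (end index bounded by the hidden-state length)
    can hold per window.
    """
    if n_subtokens <= max_len:
        return [(0, n_subtokens)]
    windows = []
    start = 0
    while start < n_subtokens:
        end = min(start + max_len, n_subtokens)
        windows.append((start, end))
        if end == n_subtokens:
            break
        start += stride
    return windows
```

For the 513-subtoken repro above this yields two windows, `(0, 512)` and `(256, 513)`, whose hidden states then have to be merged (e.g. keeping the first occurrence of each position) before token extraction.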
Hi,

this creates a transformer embedding that combines both `TransformerWordEmbedding` and `TransformerDocumentEmbedding`. It should be able to:

- pool (`max` or `mean`)

Current state: