Masked Relation Classifier #2748
Conversation
- Correct docstring relation direction
- Add option to remove the `<unk>` tag from the passed label dictionary
…relation label is now omitted)
I think this PR is ready for review now. I have some more ideas to incorporate, but they are beyond the basic functionality and may be added later within smaller PRs that are easier to review. For example: […]

I could also add some more benchmarks if desired.
hi @dobbersc, as you introduce some kind of special tokens, have you tried adding them specifically to the vocabulary of the transformer embeddings? You could do this by adding something like the snippet below in the `__init__`. I would be interested if that yields even some more improvements.
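A minimal sketch of what such an addition could look like, assuming the Hugging Face tokenizer API that flair's transformer embeddings wrap (the token names here are just examples):

```python
# Sketch: register the mask tokens as special tokens so the tokenizer
# keeps them intact, then grow the model's embedding matrix to match.
special_tokens = ["[H-PER]", "[T-PER]", "[R-PER]"]  # example mask tokens
tokenizer.add_special_tokens({"additional_special_tokens": special_tokens})
model.resize_token_embeddings(len(tokenizer))
```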
```
# Conflicts:
#	tests/test_relation_classifier.py
```
Hey @helpmefindaname, thank you for your advice. These are the results I got on CONLL04 (same hyperparameters and training script as in the PR description):
Unfortunately, I get worse scores with added special tokens. Initially, when I tested the first two configurations (with label masks), I suspected that distilbert associates a meaning with the labels PER, ORG, LOC, etc., that is useful for this task. Since distilbert initializes its embedding layer's weights for added special tokens with random values, I assumed that this information is now lost and has to be re-learned. But after the last two configurations (without label masks), I don't get why the scores are decreasing when I add the special tokens. Do you have any ideas here?

Code I added to the `__init__` (only works for CONLL04):

```python
# Add the cross-product of "H-", "T-", "R-" and all entity types
if isinstance(self.document_embeddings, TransformerDocumentEmbeddings):
    special_tokens: List[str] = [
        mask_func(label)
        for mask_func, label in itertools.product(
            [self._label_aware_head_mask, self._label_aware_tail_mask, self._label_aware_remainder_mask],
            ["Loc", "Peop", "Org"],  # TODO: Retrieve these dynamically
        )
    ]
    tokenizer = self.document_embeddings.tokenizer
    num_added_tokens = tokenizer.add_special_tokens(
        {"additional_special_tokens": special_tokens}
    )
    self.document_embeddings.model.resize_token_embeddings(len(tokenizer))
    log.info(
        f"{self.__class__.__name__}: "
        f"Added {num_added_tokens} {special_tokens} additional special tokens to {self.document_embeddings.name}"
    )
```
Hi @dobbersc,

looking at the tokenization, we see that there are many overlapping tokens. Maybe it goes in the direction that the model reuses tokens it already knows, so that part of the labels' meaning is preserved. That said, I have the theory (I can check it myself, if you give me a week) that it might be beneficial to introduce special tokens that represent only a part of each mask. Let's say we encode `[TOKEN-HEAD-PER]` as the pieces `[TOKEN`, `-HEAD-` and `PER]`, so that the head, tail and remainder masks share sub-tokens across entity types.

Another completely unrelated idea: as you mentioned that the labels might be useful for the task, you could add a mechanism to rename the labels, e.g. `Peop` to `Person`.
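A rough sketch of that partial-token idea (hypothetical code; the piece names match the variants tried in the runs reported further down):

```python
from itertools import product

# Register shared pieces as special tokens instead of one token per full
# mask, so that e.g. [TOKEN-HEAD-PER] and [TOKEN-TAIL-PER] overlap in the
# vocabulary.
roles = ["-HEAD-", "-TAIL-", "-REMAINDER-"]
labels = ["PER", "LOC", "ORG"]
special_pieces = ["[TOKEN", *roles, *[f"{label}]" for label in labels]]

# Each full mask should then tokenize into exactly three known pieces:
masks = [f"[TOKEN{role}{label}]" for role, label in product(roles, labels)]
print(masks[0])  # [TOKEN-HEAD-PER]
```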
Hello @dobbersc @helpmefindaname very interesting results and discussion! Regarding renaming labels: with the new corpus logic, renaming any label is now easy and only requires defining a `label_name_map` when loading the corpus:

```python
### EXAMPLE 1: load WITHOUT label map
corpus = RE_ENGLISH_CONLL04()

# get example sentence
example_sentence: Sentence = corpus.train[1]

# print sentence text, its NER labels, and its relations
print(example_sentence.text)
for entity in example_sentence.get_labels('ner'):
    print(entity)
for relation in example_sentence.get_labels('relation'):
    print(relation)

### EXAMPLE 2: load WITH label map
corpus = RE_ENGLISH_CONLL04(label_name_map={'Loc': 'Location',
                                            'Peop': 'Person',
                                            'Org': 'Organization',
                                            'Live_In': 'Lives in'})

# get example sentence
example_sentence: Sentence = corpus.train[1]

# print sentence text, its NER labels, and its relations
print(example_sentence.text)
for entity in example_sentence.get_labels('ner'):
    print(entity)
for relation in example_sentence.get_labels('relation'):
    print(relation)
```
Hi again, I did some testing and basically all my ideas lead to a decrease in scores. Here are my runs, all with some adjustments to the tokens:

- labels `[H-PER]`, `[T-PER]`, `[R-PER]`, ..., no special tokens: 0.7985
- labels `[H-PERSON]`, `[T-PERSON]`, `[R-PERSON]`, ..., no special tokens: 0.7889
- labels `[TOKEN-HEAD-PER]`, `[TOKEN-TAIL-PER]`, `[TOKEN-REMAINDER-PER]`, ..., special tokens (`[TOKEN`, `-HEAD-`, `-TAIL-`, `-REMAINDER-`, `PER]`, `LOC]`, `ORG]`): 0.792
- labels `[H-PER]`, `[T-PER]`, `[R-PER]`, ..., special tokens (`PER`, `LOC`, `ORG`): 0.7839
- labels `[TOKEN-HEAD-PER]`, `[TOKEN-TAIL-PER]`, `[TOKEN-REMAINDER-PER]`, ..., special tokens (`[TOKEN`, `-HEAD-`, `-TAIL-`, `-REMAINDER-`): 0.7854
Thanks for sharing these interesting results @helpmefindaname! I wonder if the masking of the "remainder" NER tags could be a problem, since they share many subtokens with the HEAD and TAIL tags - and anyway, only the head and tail NER are really relevant and what the algorithm should be focusing on. @dobbersc, could you do some training runs to evaluate the impact on accuracy of whether the remaining (non-pair) entities are masked or not?
I don't have any concrete numbers saved from my runs while experimenting with the model on the fly. But in general, with […]
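For reference, here is a toy sketch of the two strategies under discussion (a hypothetical helper, not the PR's actual implementation, which lives in `_encode_sentence_for_training`): the head/tail pair is always masked, while the remaining entities are masked only on request.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive

def encode_sentence(
    tokens: List[str],
    entities: Dict[Span, str],  # entity span -> NER label
    head: Span,
    tail: Span,
    mask_remainder: bool = True,
) -> str:
    """Mask the head/tail pair and, optionally, all other entities."""
    masks = {head: f"[H-{entities[head]}]", tail: f"[T-{entities[tail]}]"}
    if mask_remainder:
        for span, label in entities.items():
            masks.setdefault(span, f"[R-{label}]")

    encoded: List[str] = []
    cursor = 0
    for start, end in sorted(masks):  # assumes non-overlapping spans
        encoded.extend(tokens[cursor:start])
        encoded.append(masks[(start, end)])
        cursor = end
    encoded.extend(tokens[cursor:])
    return " ".join(encoded)

tokens = ["Larry", "Page", "and", "Sergey", "Brin", "founded", "Google", "."]
entities = {(0, 2): "PER", (3, 5): "PER", (6, 7): "ORG"}
print(encode_sentence(tokens, entities, head=(6, 7), tail=(0, 2)))
# -> [T-PER] and [R-PER] founded [H-ORG] .
print(encode_sentence(tokens, entities, head=(6, 7), tail=(0, 2), mask_remainder=False))
# -> [T-PER] and Sergey Brin founded [H-ORG] .
```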
Sorry that I only post now. I did some more masking experimentation similar to @helpmefindaname, but nothing that got better scores. Here are some scores for the […]

For the original masking strategy (`[H-PER]`, etc.), the run with […]
@dobbersc thanks a lot for this great implementation of masked relation extraction! From my experiments, it appears to significantly outperform our prior relation extractor. I'll probably train one or two models to include with the next Flair release.
This PR implements a new, alternative relation classifier.
Relation Classification (RC) is the task of identifying the semantic relation between two entities in a text.
In contrast to (end-to-end) Relation Extraction (RE), RC requires pre-labelled entities.
Example: For the `founded_by` relation from `ORG` (head) to `PER` (tail) and the sentence "Larry Page and Sergey Brin founded Google .", we extract the relations (Google, founded_by, Larry Page) and (Google, founded_by, Sergey Brin).

The Relation Classifier Model builds upon a text classifier. The model generates an encoded sentence for each entity pair in the cross-product of all entities in the original sentence. In the encoded representation, the entities in the current entity pair are masked with special control tokens. (For an example, see the docstring of the `_encode_sentence_for_training` function.) Then, for each encoded sentence, the model takes its document embedding and puts the resulting text representation(s) through a linear layer to get the class relation label.

In the following, I leave some results of the masked relation classifier vs. the current relation extractor on CONLL04. I have not optimized their hyperparameters to the fullest. Nevertheless, the difference is quite clear.
**Current Relation Extractor**

Training Script

**Masked Relation Classifier**

Training Script