NER label re-alignment always expects B labelled first sub-words #10263
Comments
Hello @joshdevins! Indeed, this is a valid issue. The current pipeline outputs tokens that were attributed a class, but ignores the following tokens. For models that were trained with labels on all sub-words this works, but using a padded sub-word label as you've done yields unsatisfactory results. I think we could do better here. We can open a Good First Issue for this, or would you like to try your hand at it?
I think there are a few strategies that can be used to realign labels in the pipeline (I can enumerate these later). However, if we put these strategies in the pipeline only, the evaluation used when fine-tuning NER with the script will differ and be more limited, since the evaluation currently has just two choices: use the label of the first sub-word only (ignoring the other sub-words), or use each of the labels on the sub-words. It would be best to have the same realignment strategies available in both places. In addition, the strategy used at training time for evaluation should really be the one that is used in the pipeline (or at least the default). So we might also consider storing the strategy in the config file so that the pipeline can later read it. Happy to hear your thoughts. I'm trying to write down all the realignment strategies that make sense, so I will update the thread later once I can wrap my head around the options 😆
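For reference, the two existing training-time choices can be sketched roughly like this (a simplified view of the alignment logic; the real `run_ner.py` uses `tokenizer.word_ids()`, and the `-100` ignore index is the PyTorch cross-entropy convention):

```python
# Sketch of the two sub-word labeling choices at training time:
# either only the first sub-word of each word gets the word's label,
# or all sub-words do. Simplified illustration, not the actual script.

IGNORE = -100  # label id ignored by PyTorch's cross-entropy loss

def align_labels(word_ids, word_labels, label_all_tokens=False):
    """word_ids: per-token index of the source word (None for special tokens).
    word_labels: one label id per original word."""
    out, prev = [], None
    for wid in word_ids:
        if wid is None:
            out.append(IGNORE)            # [CLS], [SEP], padding
        elif wid != prev:
            out.append(word_labels[wid])  # first sub-word gets the word label
        else:
            # continuation sub-word: real label or ignored, depending on choice
            out.append(word_labels[wid] if label_all_tokens else IGNORE)
        prev = wid
    return out

# One word split into four pieces (word label id 3, chosen for illustration):
print(align_labels([None, 0, 0, 0, 0, None], [3]))
# [-100, 3, -100, -100, -100, -100]
```

With `label_all_tokens=True`, every continuation piece receives the word's label instead of `-100`, which is the other evaluation choice described above.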
Strategies that I can think of for how to label at inference time (and for evaluation):
This is a nice example of the latter two; see Step 4: Evaluation. As a general principle, I would propose two flags:
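Two of the candidate strategies can be sketched as follows (the strategy names `first` and `max` are illustrative only, not actual `transformers` flags):

```python
# Sketch of two sub-word label realignment strategies for NER.
# Hypothetical helper for illustration; not the transformers API.

def realign(word_tokens, labels, scores, strategy="first"):
    """Pick a single label for a word split into sub-word tokens.

    word_tokens: sub-word pieces of one word, e.g. ["A", "##cc", "##ent", "##ure"]
    labels:      predicted label per piece,   e.g. ["B-ORG", "O", "O", "O"]
    scores:      confidence per piece,        e.g. [0.91, 0.55, 0.60, 0.52]
    """
    if strategy == "first":
        # Use the label of the first sub-word only.
        return labels[0]
    if strategy == "max":
        # Use the label of the highest-scoring sub-word.
        best = max(range(len(scores)), key=scores.__getitem__)
        return labels[best]
    raise ValueError(f"unknown strategy: {strategy}")

# "first" keeps B-ORG for the whole word even though later pieces predicted O:
print(realign(["A", "##cc", "##ent", "##ure"],
              ["B-ORG", "O", "O", "O"],
              [0.91, 0.55, 0.60, 0.52], strategy="first"))  # B-ORG
```

Either way, the chosen label is applied to the whole word rather than to individual sub-words.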
I definitely agree with that statement, and it seems like the most straightforward way to improve that pipeline. I agree with the two flags you propose. Having finer control over these would be of great utility.
Yes, definitely. These are model-specific, as they depend on how the model was trained, so adding them to the configuration would make things simpler.
@LysandreJik Sounds good. Unfortunately I don't have time to work on this myself right now, but hopefully I will in the future if someone else doesn't pick it up.
I'll put this up as a Good First Issue to see if a member of the community feels like working on it. Thank you for the discussion and for writing all of this up!
I'd like to work on this. @LysandreJik, besides @joshdevins's solution, is there anything else I should consider? Do you have any suggestions?
Wonderful @elk-cloner! I think it's good to take it step by step, and @joshdevins' proposal already offers a very complete approach to re-alignment. Yes, adding those two flags would be a very good start.
Thanks @elk-cloner for having a look! Happy to contribute by reviewing PRs, etc.
Environment info
- `transformers` version: 4.3.1

Who can help
Information
Model I am using (Bert, XLNet ...): DistilBERT fine-tuned for conll03
The problem arises when using:
The tasks I am working on are:
To reproduce
Steps to reproduce the behavior:
- Fine-tune using the `run_ner.py` example script, all default values

E.g. Accenture → A ##cc ##ent ##ure → B-ORG O O O → A (ORG)
E.g. Max Mustermann → Max Must ##erman ##n → B-PER I-PER I-PER O → Max Musterman (PER)
E.g. Elasticsearch → El ##astic ##sea ##rch → O O I-MISC O → ##sea (MISC)
Expected behavior
I would expect that the realignment takes the label from the first word part (or the best-scoring sub-word part) and propagates that label to the entire word, never returning sub-words. The default in `run_ner.py` is to use a padded sub-word label at training, as per the BERT paper, but I've not tried setting that to `False` yet, as that's not the typical/standard practice.

E.g. Accenture → A ##cc ##ent ##ure → B-ORG O O O → Accenture (ORG)
E.g. Max Mustermann → Max Must ##erman ##n → B-PER I-PER I-PER O → Max Mustermann (PER)
E.g. Elasticsearch → El ##astic ##sea ##rch → O O I-MISC O → Elasticsearch (MISC)
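The expected behavior above can be sketched with a small helper that glues WordPiece continuation pieces back onto their word and keeps a single word-level label (a hypothetical illustration, not the pipeline's actual implementation):

```python
# Sketch: merge WordPiece tokens back into whole words, taking the label
# from the first sub-word and upgrading an "O" if a later piece carries
# an entity label. Hypothetical helper for illustration only.

def merge_subwords(tokens, labels):
    """tokens like ["El", "##astic", "##sea", "##rch"]; one label per token."""
    words, word_labels = [], []
    for tok, lab in zip(tokens, labels):
        if tok.startswith("##") and words:
            words[-1] += tok[2:]          # glue continuation piece onto word
        else:
            words.append(tok)
            word_labels.append(lab)       # label comes from the first piece
        # if a later piece predicts an entity where the word so far has "O",
        # promote the word to that entity label
        if lab != "O" and word_labels[-1] == "O":
            word_labels[-1] = lab
    return list(zip(words, word_labels))

print(merge_subwords(["El", "##astic", "##sea", "##rch"],
                     ["O", "O", "I-MISC", "O"]))
# [('Elasticsearch', 'I-MISC')]
```

This reproduces the three expected outputs above: whole words with a single entity label, and never a bare `##`-prefixed sub-word.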
I'll add that it seems odd that this business logic is in the `pipeline`. When evaluating on conll03, I assume we are using the sub-words/first word, but this realignment should be considered during evaluation. As-is, I suspect the recall is lower than it should be.