
NER label re-alignment always expects B labelled first sub-words #10263

Closed
joshdevins opened this issue Feb 18, 2021 · 9 comments · Fixed by #11680
Labels: Good First Issue, Good Second Issue

Comments

@joshdevins
Contributor

joshdevins commented Feb 18, 2021

Environment info

  • transformers version: 4.3.1
  • Platform: Darwin-19.6.0-x86_64-i386-64bit
  • Python version: 3.7.7
  • PyTorch version (GPU?): 1.7.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

Information

Model I am using (Bert, XLNet ...): DistilBERT fine-tuned for conll03

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Fine-tune a BERT model for NER/conll03 using the run_ner.py example script, all default values
  2. Correct the label alignments, see config.json
  3. Infer using entities that have not been seen at training time, and are composed of multiple word-parts as defined by WordPiece (my assumption as to the cause).
  4. Sub-words are labelled but pipeline re-grouping/label alignment relies on perfect sub-word labelling:

E.g. Accenture → A ##cc ##ent ##ure → B-ORG O O O → A (ORG)
E.g. Max Mustermann → Max Must ##erman ##n → B-PER I-PER I-PER O → Max Musterman (PER)
E.g. Elasticsearch → El ##astic ##sea ##rch → O O I-MISC O → ##sea (MISC)
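For reference, a minimal sketch of the inference call that surfaces this behaviour (the checkpoint path and sentence are illustrative placeholders, not the exact setup):

```python
from transformers import pipeline

# "path/to/distilbert-finetuned-conll03" is a placeholder for the checkpoint
# produced by run_ner.py; grouped_entities=True enables the re-grouping logic.
ner = pipeline("ner", model="path/to/distilbert-finetuned-conll03", grouped_entities=True)

print(ner("Accenture hired Max Mustermann to work on Elasticsearch."))
# Observed: truncated surface forms such as "A" (ORG) or "##sea" (MISC)
# whenever the continuation sub-words are not labelled with matching B-/I- tags.
```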

Expected behavior

I would expect that the realignment takes the label from the first word part or the best-scoring sub-word part and propagates that label to the entire word, never returning sub-words. The default in run_ner.py is to use a padded sub-word label at training time, as per the BERT paper, but I've not tried setting that to False yet as that's not the typical/standard practice.

E.g. Accenture → A ##cc ##ent ##ure → B-ORG O O O → Accenture (ORG)
E.g. Max Mustermann → Max Must ##erman ##n → B-PER I-PER I-PER O → Max Mustermann (PER)
E.g. Elasticsearch → El ##astic ##sea ##rch → O O I-MISC O → Elasticsearch (MISC)

I'll add that it seems odd that this business logic is in the pipeline. When evaluating on conll03, I assume we are using the sub-words/first word, but this realignment should be considered during evaluation. As-is, I suspect the recall is lower than it should be.

@LysandreJik
Member

Hello @joshdevins! Indeed, this is a valid issue. The current pipeline outputs tokens that were attributed a class, but ignores the following tokens. For models that were trained with labels on all subwords this works, but using a padded sub-word label like you've done yields unsatisfactory results.

I think we could do better here when specifying grouped_entities=True to the NER pipeline, by looking ahead and checking if the tokens following a classified token are subword tokens, in which case they can be grouped alongside the start-of-word token. I think this could be achievable by using offsets in fast tokenizers, as fast tokenizers are necessary for grouped entities anyway.
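A rough sketch of that look-ahead idea, assuming a fast tokenizer so that a token-to-word mapping (e.g. word_ids()) is available; the names and structure are illustrative, not the pipeline's actual implementation:

```python
def group_by_word(tokens, labels, word_ids):
    """Attach continuation sub-words to the word of their first sub-word.

    tokens/labels are the per-token pipeline outputs; word_ids maps each token
    to its word index (fast tokenizers only). Illustrative only -- the real
    pipeline would work on character offsets instead of "##" prefixes.
    """
    words = {}
    for token, label, word_id in zip(tokens, labels, word_ids):
        if word_id is None:  # special tokens ([CLS], [SEP], padding)
            continue
        piece = token[2:] if token.startswith("##") else token
        if word_id not in words:
            words[word_id] = {"word": piece, "label": label}
        else:
            # continuation sub-word: extend the surface form, keep the
            # label of the first sub-word regardless of this token's label
            words[word_id]["word"] += piece
    return [w for w in words.values() if w["label"] != "O"]
```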

We can open a Good First Issue for this, or would you like to try your hand at it?

@joshdevins
Contributor Author

I think there are a few strategies that can be used to realign labels in the pipeline (I can enumerate these later). However, if we put these strategies in the pipeline only, the evaluation used when fine-tuning NER with the script will differ/be more limited, since the evaluation currently has just two choices: use the label of the first sub-word only (ignore the other sub-words), or use each of the labels on the sub-words. It would be best to have the same realignment strategies available in both places.
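For context, a simplified sketch of those two existing training-time choices, assuming the word_ids() mapping of fast tokenizers (not the actual run_ner.py code):

```python
def align_labels_with_tokens(word_labels, word_ids, label_all_tokens=False):
    """Map word-level labels onto sub-word tokens; -100 marks positions the loss ignores."""
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:                # special tokens
            aligned.append(-100)
        elif word_id != previous:          # first sub-word of a word keeps the real label
            aligned.append(word_labels[word_id])
        else:                              # continuation sub-word
            aligned.append(word_labels[word_id] if label_all_tokens else -100)
        previous = word_id
    return aligned
```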

In addition, the strategy used at training time for evaluation should really be the one that is used in the pipeline (or at least the default). So we might also consider storing the strategy in the config file that the pipeline can later read.

Happy to hear your thoughts. I'm trying to write down all the realignment strategies that make sense so I will update the thread later once I can wrap my head around the options 😆

@joshdevins
Contributor Author

joshdevins commented Feb 19, 2021

Strategies that I can think of for how to label at inference time (+for evaluation):

  • If training with padded sub-words/label for first sub-word only, e.g. Max Mustermann → Max Must ##erman ##n → B-PER I-PER X X
    • Use the label from the first sub-word (default)
  • If training with the same label for each sub-word, e.g. Max Mustermann → Max Must ##erman ##n → B-PER I-PER I-PER I-PER
    • "First": (See above) Use the label from the first sub-word
    • "Max": Use the label with the maximum score across all sub-words
    • "Average": Average the score of each label across each sub-word and take the label with the maximum score (default)

This is a nice example of the latter two, see Step 4: Evaluation

[image: subword_voting]
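To make the three strategies concrete, a hedged sketch over per-sub-word score vectors (the function name and the (num_subwords, num_labels) shape are assumptions for illustration):

```python
import numpy as np

def realign_word_label(subword_scores, id2label, strategy="average"):
    """Choose one label per word from its (num_subwords, num_labels) score matrix."""
    if strategy == "first":
        # label of the first sub-word only
        label_id = int(subword_scores[0].argmax())
    elif strategy == "max":
        # label of the single highest-scoring (sub-word, label) pair
        _, label_id = np.unravel_index(subword_scores.argmax(), subword_scores.shape)
        label_id = int(label_id)
    elif strategy == "average":
        # average the scores across sub-words, then take the best label
        label_id = int(subword_scores.mean(axis=0).argmax())
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return id2label[label_id]
```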

As a general principle, I would argue that if grouped_entities=True, we should never be returning sub-words alone. Either they're part of a word that has a label, or they're not. I honestly still don't understand what the flag ignore_subwords is supposed to control 🤷

I would propose two flags:

  • grouped_entities (boolean) -- note that this implies subword grouping/label realignment (see below)
    • True will group all words into larger entities, e.g. Max Mustermann -> B-PER I-PER -> "Max Mustermann" (PER)
    • False will leave words separated, e.g. Max Mustermann -> B-PER I-PER -> "Max" (PER), "Mustermann" (PER)
  • subword_label_realignment (boolean or strategy name)
    • True will use the default for the way the NER fine-tuning was performed, see default suggestions above
    • False will leave sub-words alone -- note that this implies that grouped_entities should be ignored
    • strategy name -- based on the above strategies
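Purely to illustrate the proposal, a hypothetical usage sketch (subword_label_realignment is only the name proposed in this thread, not an existing transformers argument, and the checkpoint path is a placeholder):

```python
from transformers import pipeline

# Hypothetical usage of the proposed flags -- not an existing transformers API.
ner = pipeline(
    "ner",
    model="path/to/distilbert-finetuned-conll03",  # placeholder checkpoint
    grouped_entities=True,                         # group words into full entities
    subword_label_realignment="average",           # proposed strategy name
)
```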

@LysandreJik
Member

As a general principle, I would argue that if grouped_entities=True, we should never be returning sub-words alone. Either they're part of a word that has a label, or they're not. I honestly still don't understand what the flag ignore_subwords is supposed to control 🤷

I definitely agree with that statement, and it seems like the most straightforward way to improve that pipeline. I agree with the two flags you propose. Having finer control over these would be of great utility.

In addition, the strategy used at training time for evaluation should really be the one that is used in the pipeline (or at least the default). So we might also consider storing the strategy in the config file that the pipeline can later read.

Yes, definitely. These are model-specific as they're reliant on the training, so adding them to the configuration would make things simpler.

@joshdevins
Contributor Author

@LysandreJik Sounds good. Unfortunately I don't have time to work on this myself right now, but hopefully I will in the future if someone else doesn't pick it up.

@LysandreJik
Member

I'll put this up as a good first issue to see if a member of the community feels like working on it. Thank you for the discussion and for writing all of this up!

@LysandreJik added the Good First Issue and Good Second Issue labels on Feb 23, 2021
@elk-cloner
Contributor

I'd like to work on this. @LysandreJik, besides @joshdevins's solution, is there anything I should consider? Do you have any suggestions?
I'm thinking of adding these two flags here and probably changing the group_sub_entities and group_entities functions too.

@LysandreJik
Member

Wonderful @elk-cloner! I think it's good to take it step by step, and @joshdevins' proposal already offers a very complete approach to re-alignment.

Yes, adding those two flags to the __init__ makes sense! An important part of the development of that feature will be to develop tests to ensure that the behavior is the expected one. Please ping both @Narsil and me on the PR so that we can review!

@joshdevins
Contributor Author

Thanks @elk-cloner for having a look! Happy to contribute by reviewing PRs, etc.
