
NER label re-alignment always expects B labelled first sub-words #10263

Closed
joshdevins opened this issue Feb 18, 2021 · 9 comments · Fixed by #11680
Labels: Good First Issue, Good Second Issue

Comments

@joshdevins
Contributor

joshdevins commented Feb 18, 2021

Environment info

  • transformers version: 4.3.1
  • Platform: Darwin-19.6.0-x86_64-i386-64bit
  • Python version: 3.7.7
  • PyTorch version (GPU?): 1.7.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

Information

Model I am using (Bert, XLNet ...): DistilBERT fine-tuned for conll03

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Fine-tune a BERT model for NER/conll03 using the run_ner.py example script, all default values
  2. Correct the label alignments, see config.json
  3. Infer using entities that have not been seen at training time, and are composed of multiple word-parts as defined by WordPiece (my assumption as to the cause).
  4. Sub-words are labelled but pipeline re-grouping/label alignment relies on perfect sub-word labelling:

E.g. Accenture → A ##cc ##ent ##ure → B-ORG O O O → A (ORG)
E.g. Max Mustermann → Max Must ##erman ##n → B-PER I-PER I-PER O → Max Musterman (PER)
E.g. Elasticsearch → El ##astic ##sea ##rch → O O I-MISC O → ##sea (MISC)
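For reference, a minimal sketch of the inference call that surfaces this behaviour (the checkpoint path and sentence are illustrative placeholders, not the exact setup):

```python
from transformers import pipeline

# "path/to/distilbert-finetuned-conll03" is a placeholder for the checkpoint
# produced by run_ner.py; grouped_entities=True enables the re-grouping logic.
ner = pipeline("ner", model="path/to/distilbert-finetuned-conll03", grouped_entities=True)

print(ner("Accenture hired Max Mustermann to work on Elasticsearch."))
# Observed: truncated surface forms such as "A" (ORG) or "##sea" (MISC)
# whenever the continuation sub-words are not labelled with matching B-/I- tags.
```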

Expected behavior

I would expect that the realignment takes the label from the first word part or the best-scoring sub-word part and propagates that label to the entire word, never returning sub-words. The default in run_ner.py is to use a padded sub-word label at training time, as per the BERT paper, but I've not tried setting that to False yet as that's not the typical/standard practice.

E.g. Accenture → A ##cc ##ent ##ure → B-ORG O O O → Accenture (ORG)
E.g. Max Mustermann → Max Must ##erman ##n → B-PER I-PER I-PER O → Max Mustermann (PER)
E.g. Elasticsearch → El ##astic ##sea ##rch → O O I-MISC O → Elasticsearch (MISC)

I'll add that it seems odd that this business logic is in the pipeline. When evaluating on conll03, I assume we are using the sub-words/first word, but this realignment should be considered during evaluation. As-is, I suspect the recall is lower than it should be.

@LysandreJik
Member

Hello @joshdevins! Indeed, this is a valid issue. The current pipeline outputs tokens that were attributed a class, but ignores the following tokens. For models that were trained with labels on all subwords this works, but using a padded sub-word label like you've done yields unsatisfactory results.

I think we could do better here when specifying grouped_entities=True to the NER pipeline, by looking ahead and checking if the tokens following a classified token are subword tokens, in which case they can be grouped alongside the start-of-word token. I think this could be achievable by using offsets in fast tokenizers, as fast tokenizers are necessary for grouped entities anyway.
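A rough sketch of that look-ahead idea, assuming a fast tokenizer so that a token-to-word mapping (e.g. word_ids()) is available; the names and structure are illustrative, not the pipeline's actual implementation:

```python
def group_by_word(tokens, labels, word_ids):
    """Attach continuation sub-words to the word of their first sub-word.

    tokens/labels are the per-token pipeline outputs; word_ids maps each token
    to its word index (fast tokenizers only). Illustrative only -- the real
    pipeline would work on character offsets instead of "##" prefixes.
    """
    words = {}
    for token, label, word_id in zip(tokens, labels, word_ids):
        if word_id is None:  # special tokens ([CLS], [SEP], padding)
            continue
        piece = token[2:] if token.startswith("##") else token
        if word_id not in words:
            words[word_id] = {"word": piece, "label": label}
        else:
            # continuation sub-word: extend the surface form, keep the
            # label of the first sub-word regardless of this token's label
            words[word_id]["word"] += piece
    return [w for w in words.values() if w["label"] != "O"]
```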

We can open a Good First Issue for this, or would you like to try your hand at it?

@joshdevins
Contributor Author

I think there are a few strategies that can be used to realign labels in the pipeline (I can enumerate these later). However, if we put these strategies in the pipeline only, the evaluation used when fine-tuning NER with the script will differ/be more limited, since the evaluation currently has just two choices: use the label of the first sub-word only (ignore the other sub-words), or use each of the labels on the sub-words. It would be best to have the same realignment strategies available in both places.
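For context, a simplified sketch of those two existing training-time choices, assuming the word_ids() mapping of fast tokenizers (not the actual run_ner.py code):

```python
def align_labels_with_tokens(word_labels, word_ids, label_all_tokens=False):
    """Map word-level labels onto sub-word tokens; -100 marks positions the loss ignores."""
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:                # special tokens
            aligned.append(-100)
        elif word_id != previous:          # first sub-word of a word keeps the real label
            aligned.append(word_labels[word_id])
        else:                              # continuation sub-word
            aligned.append(word_labels[word_id] if label_all_tokens else -100)
        previous = word_id
    return aligned
```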

In addition, the strategy used at training time for evaluation should really be the one that is used in the pipeline (or at least the default). So we might also consider storing the strategy in the config file that the pipeline can later read.

Happy to hear your thoughts. I'm trying to write down all the realignment strategies that make sense so I will update the thread later once I can wrap my head around the options 😆

@joshdevins
Contributor Author

joshdevins commented Feb 19, 2021

Strategies that I can think of for how to label at inference time (+for evaluation):

  • If training with padded sub-words/label for first sub-word only, e.g. Max Mustermann → Max Must ##erman ##n → B-PER I-PER X X
    • Use the label from the first sub-word (default)
  • If training with the same label for each sub-word, e.g. Max Mustermann → Max Must ##erman ##n → B-PER I-PER I-PER I-PER
    • "First": (See above) Use the label from the first sub-word
    • "Max": Use the label with the maximum score across all sub-words
    • "Average": Average the score of each label across each sub-word and take the label with the maximum score (default)

This is a nice example of the latter two, see Step 4: Evaluation

[image: subword_voting]
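To make the three strategies concrete, a hedged sketch over per-sub-word score vectors (the function name and the (num_subwords, num_labels) shape are assumptions for illustration):

```python
import numpy as np

def realign_word_label(subword_scores, id2label, strategy="average"):
    """Choose one label per word from its (num_subwords, num_labels) score matrix."""
    if strategy == "first":
        # label of the first sub-word only
        label_id = int(subword_scores[0].argmax())
    elif strategy == "max":
        # label of the single highest-scoring (sub-word, label) pair
        _, label_id = np.unravel_index(subword_scores.argmax(), subword_scores.shape)
        label_id = int(label_id)
    elif strategy == "average":
        # average the scores across sub-words, then take the best label
        label_id = int(subword_scores.mean(axis=0).argmax())
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return id2label[label_id]
```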

As a general principle, I would argue that if grouped_entities=True, we should never be returning sub-words alone. Either they're part of a word that has a label, or they're not. I honestly still don't understand what the flag ignore_subwords is supposed to control 🤷

I would propose two flags:

  • grouped_entities (boolean) -- note that this implies subword grouping/label realignment (see below)
    • True will group all words into larger entities, e.g. Max Mustermann -> B-PER I-PER -> "Max Mustermann" (PER)
    • False will leave words separated, e.g. Max Mustermann -> B-PER I-PER -> "Max" (PER), "Mustermann" (PER)
  • subword_label_realignment (boolean or strategy name)
    • True will use the default for the way the NER fine-tuning was performed, see default suggestions above
    • False will leave sub-words alone -- note that this implies that grouped_entities should be ignored
    • strategy name -- based on the above strategies
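Purely to illustrate the proposal, a hypothetical usage sketch (subword_label_realignment is only the name proposed in this thread, not an existing transformers argument, and the checkpoint path is a placeholder):

```python
from transformers import pipeline

# Hypothetical usage of the proposed flags -- not an existing transformers API.
ner = pipeline(
    "ner",
    model="path/to/distilbert-finetuned-conll03",  # placeholder checkpoint
    grouped_entities=True,                         # group words into full entities
    subword_label_realignment="average",           # proposed strategy name
)
```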

@LysandreJik
Member

As a general principle, I would argue that if grouped_entities=True, we should never be returning sub-words alone. Either they're part of a word that has a label, or they're not. I honestly still don't understand what the flag ignore_subwords is supposed to control 🤷

I definitely agree with that statement, and it seems like the most straightforward way to improve that pipeline. I agree with the two flags you propose. Having finer control over these would be of great utility.

In addition, the strategy used at training time for evaluation should really be the one that is used in the pipeline (or at least the default). So we might also consider storing the strategy in the config file that the pipeline can later read.

Yes, definitely. These are model-specific as they're reliant on the training, so adding them to the configuration would make things simpler.

@joshdevins
Contributor Author

@LysandreJik Sounds good. Unfortunately I don't have time to work on this myself right now, but hopefully I will in the future if someone else doesn't pick it up.

@LysandreJik
Member

I'll put this up as a good first issue to see if a member of the community feels like working on it. Thank you for the discussion and for writing all of this up!

@LysandreJik added the Good First Issue and Good Second Issue labels on Feb 23, 2021
@elk-cloner
Contributor

I'd like to work on this. @LysandreJik, besides @joshdevins's solution, is there anything I should consider? Do you have any suggestions?
I'm thinking of adding these two flags here and probably changing the group_sub_entities and group_entities functions too.

@LysandreJik
Member

Wonderful @elk-cloner! I think it's good to take it step by step, and @joshdevins' proposal already offers a very complete approach to re-alignment.

Yes, adding those two flags to the __init__ makes sense! An important part of the development of that feature will be to develop tests to ensure that the behavior is the expected one. Please ping both @Narsil and me on the PR so that we can review!

@joshdevins
Contributor Author

Thanks @elk-cloner for having a look! Happy to contribute by reviewing PRs, etc.
