
High F1 score. But poor accuracy during Inference due to tokenisation #5541

Closed
sudharsan2020 opened this issue Jul 6, 2020 · 3 comments

@sudharsan2020

🐛 Bug

Information

I am using the Bert-Base-cased model to train a custom named entity recognition (NER) model with a sequence length of 512.

Language I am using the model on: English

The problem arises when using:

  • the official example scripts: token-classification/run_ner.py

The task I am working on is:

  • an official GLUE/SQuAD task: Named entity recognition
  • my own task or dataset: Custom Dataset

To reproduce

Steps to reproduce the behavior:

1. Use the default NER pipeline to load the custom-trained model:
   self.model_prediction_pipeline = pipeline(
       "ner", model=model_path, tokenizer=model_path, grouped_entities=True
   )
2. I've attached the evaluation results of the model:
eval_loss = 0.021479165139844086
eval_precision = 0.8725970149253731
eval_recall = 0.8868932038834951
eval_f1 = 0.8796870297923562
epoch = 5.0

Expected behavior

  1. The model should produce accuracy in line with the F1 score.
  2. However, during inference I am not getting an accuracy over 30%.
  3. Not sure if inconsistent tokenisation leads to the poor results.
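For context on why grouping can fragment a single name, here is a simplified sketch of the grouping rule as I understand it (my own reconstruction, not the actual transformers source): grouped_entities merges only consecutive tokens whose full tag, including the B-/I- prefix, is identical, so a B-PER piece followed by an I-PER piece starts a new group even inside one word.

```python
from itertools import groupby

# Simplified sketch (an assumption, not the transformers implementation)
# of the grouping step: consecutive token predictions are merged only
# while the full tag, B-/I- prefix included, stays the same.
tags = ["B-PER", "I-PER", "B-PER", "I-PER"]      # model output for "DANIEL , BROWN"
pieces = ["DAN", "##IE", "##L", ", BROWN"]       # WordPiece-level words

groups = [
    (tag, " ".join(piece for _, piece in grp))
    for tag, grp in groupby(zip(tags, pieces), key=lambda tp: tp[0])
]
print(groups)  # four separate groups, even though "DAN ##IE ##L" is one word
```

Because every tag change opens a new group, a single name whose subwords receive alternating B-/I- tags comes out as several fragments, which matches the split results I am seeing below.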

Environment info

  • transformers version: 3.0.0
  • Platform: Linux-4.15.0-109-generic-x86_64-with-debian-buster-sid
  • Python version: 3.7.7
  • PyTorch version (GPU?): 1.4.0
  • Tensorflow version (GPU?): NA
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No
@LysandreJik
Member

Hello! Why do you believe the tokenization to be the issue here?


sudharsan2020 commented Jul 6, 2020

@LysandreJik Thanks for reaching out.

Please find my observations of the inconsistency in the tokenizer (the possible issue); I was using the HuggingFace-provided script to train the custom NER model.

1. Expected name:
AUDLEY THOMPSON

Predicted name:
{'entity_group': 'B-PER', 'score': 0.9993636608123779, 'word': 'AUDLE'},
{'entity_group': 'I-PER', 'score': 0.8126876294612885, 'word': '##Y THOMPS'}

Issue:
The last two letters of 'THOMPSON' ('ON') were dropped.

2. Expected name:
DANIEL, BROWN

Predicted name:
{'entity_group': 'B-PER', 'score': 0.9559168517589569, 'word': 'DAN'},
{'entity_group': 'I-PER', 'score': 0.9092316627502441, 'word': '##IE'},
{'entity_group': 'B-PER', 'score': 0.5071505904197693, 'word': '##L'},
{'entity_group': 'I-PER', 'score': 0.849787175655365, 'word': ', BROWN'}

Issue:
The WordPiece tokenizer splits the beginning entity into smaller pieces. However, the model predicts that as an "I-PER" entity, which makes it really difficult to merge consecutive entities.

3. Expected name:
VINEY, PAJTSHIA

Predicted name:
{'entity_group': 'B-PER', 'score': 0.9991838335990906, 'word': 'VI'},
{'entity_group': 'I-PER', 'score': 0.9591831763585409, 'word': '##Y , PA'},
{'entity_group': 'I-PER', 'score': 0.7927274107933044, 'word': '##IA'}

Issue:
The characters 'NE' are missing from the name 'VINEY'.
The characters 'JTSH' are missing from the name 'PAJTSHIA'.

4. Expected name:
Pierson, Garcia

Predicted name:
{'entity_group': 'B-PER', 'score': 0.9972472190856934, 'word': 'Pierson'},
{'entity_group': 'I-PER', 'score': 0.8200799822807312, 'word': 'GA'},
{'entity_group': 'I-PER', 'score': 0.8131067156791687, 'word': '##IA'}

Issue:
The characters 'RC' are missing from the name 'Garcia'.

Please let me know if I am missing something.
Missing characters and split tokens are the major reasons for the accuracy drop when merging the Begin (B-PER) and Inside (I-PER) entities.
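As a stopgap, the split-token part of the problem can be patched after the fact. The sketch below is a hypothetical helper of my own (merge_wordpieces is not a transformers API): it glues '##'-prefixed WordPiece outputs back onto the previous piece regardless of the B-/I- tag the model assigned, averaging the scores. It repairs splits like DAN / ##IE / ##L, but it cannot recover characters the pipeline has already dropped (e.g. the missing 'ON' in 'AUDLEY THOMPSON').

```python
def merge_wordpieces(predictions):
    """Glue '##'-prefixed WordPiece predictions onto the previous piece.

    `predictions` is a list of dicts shaped like the pipeline output:
    {'entity_group': ..., 'score': ..., 'word': ...}. A '##' piece is
    merged into the preceding word no matter which tag it carries; the
    merged score is the mean of the pieces' scores.
    """
    merged = []
    for pred in predictions:
        if merged and pred["word"].startswith("##"):
            previous = merged[-1]
            previous["word"] += pred["word"][2:]     # strip the '##' marker
            previous["scores"].append(pred["score"])
        else:
            merged.append({
                "entity_group": pred["entity_group"],
                "word": pred["word"],
                "scores": [pred["score"]],
            })
    for entity in merged:                            # average merged scores
        scores = entity.pop("scores")
        entity["score"] = sum(scores) / len(scores)
    return merged

# The fragmented output from observation 2 above:
preds = [
    {"entity_group": "B-PER", "score": 0.9559, "word": "DAN"},
    {"entity_group": "I-PER", "score": 0.9092, "word": "##IE"},
    {"entity_group": "B-PER", "score": 0.5071, "word": "##L"},
    {"entity_group": "I-PER", "score": 0.8497, "word": ", BROWN"},
]
print(merge_wordpieces(preds))  # 'DAN' + '##IE' + '##L' comes back as 'DANIEL'
```

This trusts the tag of the first piece of each word, which seems reasonable here since the fragmentation, not the leading tag, is what breaks the merge.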

stale bot commented Sep 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
