High F1 score. But poor accuracy during Inference due to tokenisation #5541
Comments
Hello! Why do you believe the tokenization to be the issue here?
@LysandreJik Thanks for reaching out. Below are my observations of the inconsistency in the tokenizer (a possible issue). I used the script provided by HuggingFace to train the custom NER model.
1. Expected name: Predicted name: Issue:
2. Expected name: Predicted name: Issue:
3. Expected name: Predicted name: Issue:
4. Expected name: Predicted name: Issue:
Please let me know if I am missing something.
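For context on where tokenization can affect predicted names: BERT's WordPiece tokenizer splits words that are not in its vocabulary into pieces, marking continuation pieces with a "##" prefix, and a grouping step then has to merge those pieces back into whole words. The snippet below is a minimal sketch of such a merge, not the pipeline's actual implementation. The function name `merge_wordpieces`, the example tokens, and the labels are all hypothetical, and keeping the first piece's label for the whole word is an assumption of this sketch.

```python
from typing import List, Tuple


def merge_wordpieces(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge WordPiece continuation pieces ("##...") into whole words.

    Assumption of this sketch: a merged word keeps the label predicted
    for its first piece; labels on continuation pieces are ignored.
    """
    words: List[Tuple[str, str]] = []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and words:
            # Continuation piece: append it (without "##") to the previous word.
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + token[2:], prev_label)
        else:
            # First piece of a new word.
            words.append((token, label))
    return words


# Hypothetical pieces and per-piece labels, not taken from the issue.
tokens = ["Jo", "##han", "##sson", "lives", "in", "Oslo"]
labels = ["B-PER", "B-PER", "O", "O", "O", "B-LOC"]
merged = merge_wordpieces(tokens, labels)
print(merged)
```

In this made-up example the pieces of one name receive different labels ("B-PER", "B-PER", "O"). A grouping step that trusts each piece's label separately could return only part of the name, while the first-piece rule used here returns the whole word as one entity.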
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
🐛 Bug
Information
I am using the bert-base-cased model to train my custom named entity recognition (NER) model with a sequence length of 512.
Language I am using the model on: English
The problem arises when using:
The task I am working on is:
To reproduce
Steps to reproduce the behavior:
1. Use the default NER pipeline to load the custom-trained model:
self.model_prediction_pipeline = pipeline("ner", model=model_path, tokenizer=model_path, grouped_entities=True)
2. The evaluation results of the model are listed below:
eval_loss = 0.021479165139844086
eval_precision = 0.8725970149253731
eval_recall = 0.8868932038834951
eval_f1 = 0.8796870297923562
epoch = 5.0
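As a quick sanity check on the numbers above, F1 is the harmonic mean of precision and recall, so the reported `eval_f1` can be recomputed from the other two values. The sketch below does only that arithmetic; the helper name `f1_score` is introduced here for illustration and is not the training script's function.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation results reported in this issue.
eval_precision = 0.8725970149253731
eval_recall = 0.8868932038834951
reported_f1 = 0.8796870297923562

computed_f1 = f1_score(eval_precision, eval_recall)
# True when the recomputed value agrees with the reported one.
print(abs(computed_f1 - reported_f1) < 1e-9)
```

The reported metrics are therefore internally consistent. They describe performance on the evaluation set only, so a gap between them and inference results would have to come from elsewhere, for example from how predictions are post-processed.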
Expected behavior
Environment info
transformers version: 3.0.0