Don't discard entity_group when token is the last in the sequence. #5439
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
47c645e to 3f5a220
LGTM! Thanks @mfuntowicz!

Before:

    In [6]: nlp("My name is Wolfgang and I live in Berlin")
    Out[6]: [{'entity_group': 'I-PER', 'score': 0.9991481900215149, 'word': 'Wolfgang'}]

With this PR:

    In [5]: nlp("My name is Wolfgang and I live in Berlin")
    Out[5]:
    [{'entity_group': 'I-PER', 'score': 0.9991481900215149, 'word': 'Wolfgang'},
     {'entity_group': 'I-LOC', 'score': 0.9983668327331543, 'word': 'Berlin'}]
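The bug concerns the step that merges token-level NER predictions into entity groups: the group still open when the loop reaches the last token was never emitted, so "Berlin" vanished. The following is a simplified, hypothetical sketch of that grouping pattern, not the transformers implementation. It assumes each token is a dict with `entity`, `score`, `word` and `index` keys, and it joins words with spaces instead of detokenizing. The final flush after the loop is what keeps a trailing group. Because the flush only runs on the group that is still open, it also cannot emit the last token twice.

```python
from statistics import mean
from typing import Dict, List


def group_sub_entities(tokens: List[Dict]) -> Dict:
    """Collapse one run of same-label tokens into a single entity group (simplified)."""
    return {
        "entity_group": tokens[0]["entity"],
        "score": mean(t["score"] for t in tokens),
        # Assumption: join with spaces rather than detokenizing subwords.
        "word": " ".join(t["word"] for t in tokens),
    }


def group_entities(entities: List[Dict]) -> List[Dict]:
    """Group adjacent tokens that share the same label (hypothetical helper)."""
    groups: List[Dict] = []
    current: List[Dict] = []
    for entity in entities:
        same_run = (
            current
            and entity["entity"] == current[-1]["entity"]
            and entity["index"] == current[-1]["index"] + 1
        )
        if same_run:
            current.append(entity)
        else:
            # A new run starts, so the previous one is complete.
            if current:
                groups.append(group_sub_entities(current))
            current = [entity]
    # Flush the run still open after the loop. Omitting this step drops the
    # final group whenever its token is the last one in the sequence.
    if current:
        groups.append(group_sub_entities(current))
    return groups
```

With two hypothetical tokens, "Wolfgang" (I-PER) and "Berlin" (I-LOC, last in the list), `group_entities` returns two groups. Without the final flush, it would return only the "Wolfgang" group, matching the "Before" output above.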
LGTM!
@LysandreJik The CI error seems unrelated; is it OK with you if I merge?
Did you check this, @enzoampil? Just making sure to ping you, since you contributed #3957 🤗
@julien-c I did a few checks as well, and it looks great! I was planning to include this in PR #4987 (2nd point), but this seems to solve it cleanly already, so I'll take this fix into account for that PR 😄

UPDATE: I ended up modifying this fix in the PR above, because of cases where the last token was repeated (for the test cases set up in that PR).