Make transformer_ner continue processing other entities after the first non-matching #309
The reason it previously (somewhat) silently failed was because of the overly eager
So this PR prevents that from happening by making sure the list isn't empty before attempting to call
However, perhaps this would be a good time to tackle the underlying catch-(almost)-all exception issue as well? Do we really want to catch any
Do we know what type of exceptions we are actually expecting to see (and catch) in this block? Or perhaps we can identify the part of the code that we want to catch exceptions for?
It's entirely possible that you don't know the answers to these questions (I certainly don't). But I thought I'd bring them up in case we know of a better approach.
Good questions. I tried to fix the bug with minimal changes and avoided making assumptions about what the original developer(s) were thinking. It is possible that their goal was simply to log the "forgivable" error and move on to the next doc. In terms of exception handling, it's generally recommended to define and handle specific exception types where necessary. Given that catch-all exception handling is not uncommon throughout the code base, it may be worthwhile to address all of these cases together in a dedicated ticket, following a broader discussion.
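To make the "specific types of exceptions" idea concrete, here is a minimal sketch. It is not taken from the code base: `EntityAlignmentError`, `first_token_index`, and `process_all` are hypothetical names, and the sketch assumes the "forgivable" case is an entity whose token list is empty.

```python
import logging

logger = logging.getLogger(__name__)


class EntityAlignmentError(Exception):
    """Hypothetical error: an entity span matched no tokens."""


def first_token_index(token_ids):
    # Raise a specific, documented error for the forgivable case.
    if not token_ids:
        raise EntityAlignmentError("entity span matched no tokens")
    return token_ids[0]


def process_all(token_id_lists):
    results = []
    for token_ids in token_id_lists:
        try:
            results.append(first_token_index(token_ids))
        except EntityAlignmentError as err:
            # Only the expected error is logged and skipped.
            logger.warning("Skipping entity: %s", err)
    return results
```

With this shape, anything unexpected (a `TypeError` from malformed input, for instance) still propagates instead of being logged and forgotten.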
Fair enough. It might not be worth the time to look into it in this specific context.
I think that's a good point as well. I'm just somewhat afraid we might end up in a situation where we don't feel this is worth the time. It's not ideal, but such is life.
Alright. I have logged a new ticket; feel free to add any thoughts or observations there.
Looks good. Achieves its goal.
Make transformer_ner continue processing other entities after the first non-matching
Currently, transformer_ner stops processing as soon as the span of a recognised entity (e.g., a subword) does not match any text tokens, so the remaining entities won't be inspected or added to the Doc object. This PR fixes that.
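The behaviour described above can be sketched roughly as follows. This is an illustration only, not the actual transformer_ner code: `match_entities`, the `(start, end)` character-span tuples, and the overlap rule are all assumptions made for the example.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical (start_char, end_char) pair


def match_entities(
    entity_spans: List[Span], token_spans: List[Span]
) -> List[Tuple[Span, List[int]]]:
    """Pair each entity with the indices of the tokens it overlaps."""
    matched = []
    for ent_start, ent_end in entity_spans:
        token_ids = [
            i
            for i, (tok_start, tok_end) in enumerate(token_spans)
            if tok_start < ent_end and tok_end > ent_start
        ]
        if not token_ids:
            # No token matches this entity: skip it and keep going,
            # rather than abandoning the remaining entities.
            continue
        matched.append(((ent_start, ent_end), token_ids))
    return matched
```

For example, with tokens at `(0, 5)`, `(6, 11)`, `(12, 17)` and entities at `(0, 5)`, `(5, 6)`, `(12, 17)`, the middle entity falls on the gap between tokens and is skipped, while the first and last are still matched.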