Counting multiple entity mentions #22
Hi @acxcv, what is making me think at the moment is that we have two different definitions of entities:
The behaviour of the …

The desired behaviour would be:

But, due to the definition of a Span (the class that holds entities), the result is:

I think one reason this happens is also that the standard built-in spaCy models (e.g. …)

Unfortunately, I think it would be better to keep this behaviour in the default operations on entities. I can provide an implementation, but not very soon, because I am quite busy writing my PhD thesis at the moment.

Martino
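Martino's point about the definition of a Span can be illustrated with a minimal sketch. It does not import spaCy: `MockSpan` is a hypothetical stand-in whose equality, like that of a spaCy `Span` as far as I know, depends on the character offsets, the label and the KB identifier. Because the offsets are part of that identity, two mentions of the same entity never collapse into one, so counting the spans directly yields 1 per mention. The label and URIs are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical stand-in for spacy.tokens.Span: a frozen dataclass compares and
# hashes on all of its fields, so two mentions at different offsets differ,
# even when they point to the same KB resource.
@dataclass(frozen=True)
class MockSpan:
    text: str
    start_char: int
    end_char: int
    label_: str
    kb_id_: str


# Made-up sentence: "Berlin is big. I like Berlin."
# Both mentions link to the same (illustrative) DBpedia resource.
ents = [
    MockSpan("Berlin", 0, 6, "DBPEDIA_ENT", "http://dbpedia.org/resource/Berlin"),
    MockSpan("Berlin", 22, 28, "DBPEDIA_ENT", "http://dbpedia.org/resource/Berlin"),
]

# Counting the span objects: every mention is its own key.
span_counts = Counter(ents)
# Counting KB identifiers: mentions of the same entity are merged.
kb_counts = Counter(ent.kb_id_ for ent in ents)

print(list(span_counts.values()))  # [1, 1]
print(kb_counts)
```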
I edited this post. I had been confused about how entity counts are handled in processed docs.
Example
Expected behavior
Actual behavior
This makes sense because there may be entities that share the same surface form but point to different identifiers in the KG. However, it did cause me some confusion.
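To make the difference between surface forms and KG identifiers concrete, here is a minimal sketch with made-up annotations in which the string "Paris" is linked to two different DBpedia resources. `MockEnt` is a hypothetical stand-in exposing only `text` and `kb_id_`, two attributes that real spaCy entity spans also have.

```python
from collections import Counter
from typing import NamedTuple


class MockEnt(NamedTuple):
    """Hypothetical stand-in for a spaCy entity Span (only two attributes)."""
    text: str
    kb_id_: str


# Made-up annotations: one surface form, two different KB resources.
ents = [
    MockEnt("Paris", "http://dbpedia.org/resource/Paris"),
    MockEnt("Paris", "http://dbpedia.org/resource/Paris_Hilton"),
    MockEnt("Paris", "http://dbpedia.org/resource/Paris"),
]

# Counting by surface form merges mentions of different entities.
by_surface_form = Counter(ent.text for ent in ents)
# Counting by KB identifier keeps them apart.
by_kb_id = Counter(ent.kb_id_ for ent in ents)

print(by_surface_form)  # Counter({'Paris': 3})
print(by_kb_id)
```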
This seems to be equivalent to
Solution
In order to achieve what I wanted, I needed to call
In this example, counting the `kb_id_` values led to the same result as counting the `__str__` representations. However, I would discourage using `__str__` unless you're only interested in surface forms.

If this becomes relevant to other users, I suggest implementing something like `doc.unique_ents` for the set of entities in the document and `doc.unique_ent_counts` for the `Counter` dict.
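The proposed `unique_ents` and `unique_ent_counts` could be sketched as two plain helper functions keyed on `kb_id_`. Both names are hypothetical (they don't exist in spaCy or in this library), and the demo again uses a made-up stand-in instead of a real `Doc` so it runs without a model. Spans without a KB link have an empty `kb_id_` in spaCy, so this sketch skips them; that is a design choice, not established behaviour.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Protocol, Set


class HasKbId(Protocol):
    """Anything with a kb_id_ string, e.g. a spaCy entity Span."""
    kb_id_: str


def unique_ents(ents: Iterable[HasKbId]) -> Set[str]:
    """Hypothetical helper: the set of distinct KB identifiers among ents."""
    return {ent.kb_id_ for ent in ents if ent.kb_id_}


def unique_ent_counts(ents: Iterable[HasKbId]) -> Counter:
    """Hypothetical helper: how many times each KB identifier is mentioned."""
    return Counter(ent.kb_id_ for ent in ents if ent.kb_id_)


class MockEnt(NamedTuple):
    """Hypothetical stand-in for a spaCy entity Span."""
    text: str
    kb_id_: str


# Made-up mentions; the last one has no KB link and is ignored.
ents = [
    MockEnt("Berlin", "http://dbpedia.org/resource/Berlin"),
    MockEnt("Germany", "http://dbpedia.org/resource/Germany"),
    MockEnt("Berlin", "http://dbpedia.org/resource/Berlin"),
    MockEnt("somewhere", ""),
]

print(sorted(unique_ents(ents)))
print(unique_ent_counts(ents).most_common(1))  # [('http://dbpedia.org/resource/Berlin', 2)]
```

With spaCy installed, the same functions could be attached as custom attributes, e.g. `Doc.set_extension("unique_ents", getter=lambda doc: unique_ents(doc.ents))`. spaCy exposes user-defined attributes under the `._` namespace, so the result would be read as `doc._.unique_ents` rather than `doc.unique_ents`.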