Document Entities and Profiles #1267
Replies: 6 comments
-
Thanks for summarising this, @mynameisfiber. I'm totally with you on the proposed abolition of To me, the most controversial part in this is PR #3, the proposed Another thing I'd been pondering over the weekend was the notion of temporarily introducing two xref systems in Aleph: the existing one (which is in production use), and a derived version that also generates scored entity sets. That would help us play with this result type a bit, perhaps behind a feature flag :)
-
Ok, one more thought: should we also take this refactor of the
-
Thanks for writing this up! :) Agreed - unless there is a strong benefit to having a separate I was a bit unclear on what "Add profile membership information into Aleph entity" means in the initial proposed implementation steps - I'm assuming membership in a profile would be indicated solely by being a part of a profile EntitySet and wouldn't be stored in the entity itself? Definitely agree with wanting a distinct From a UI standpoint my biggest open questions are:
-
To my mind, the main benefit of having a separate entity type is that it keeps the main entity indexes from being cluttered with tokens that may be incorrectly identified. Also, given that the NER system doesn't specifically categorize things into the schemas we have, it would be nice to have more generalized schemas to store their outputs so the decision of whether a thing is a I definitely agree with splitting this issue into multiple PRs and with the additional judgement types. I was thinking about having this be an integer field in @kjacks My thought was to attach profile membership to the entity, essentially as a cache, so that whenever we retrieve an entity from the index we can decide to show its profile information instead of having to do separate lookups for each entity. In the future this will also make it easier to aggregate search results on the profile information in order to dedupe entities within the same profile and display the profile instead. In general, I think this would make it easier for the UI to be given an entity and decide client-side whether to display the profile view of it or the entity itself. For the UI elements, maybe we can sketch some things up next week? I'm also unsure of the best ways to do this, but with the additional information we'll have access to there are probably some interesting things we can do.
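To make the caching idea a bit more concrete, here is a rough sketch of how search hits could be deduplicated once each indexed entity carries its profile membership. Everything in it is an assumption for illustration: the `profile_id` field name, the shape of a hit as a plain dict, and the `dedupe_by_profile` helper are hypothetical and not part of Aleph's actual index or API.

```python
from typing import List


def dedupe_by_profile(hits: List[dict]) -> List[dict]:
    """Keep the first hit per profile; hits without a profile pass through.

    Assumes each hit may carry a cached "profile_id" (hypothetical field name).
    """
    seen_profiles = set()
    deduped = []
    for hit in hits:
        profile_id = hit.get("profile_id")
        if profile_id is None:
            # Entity is not part of any profile: always show it as-is.
            deduped.append(hit)
            continue
        if profile_id in seen_profiles:
            # Another entity from the same profile was already kept.
            continue
        seen_profiles.add(profile_id)
        deduped.append(hit)
    return deduped


hits = [
    {"id": "e1", "profile_id": "p1"},
    {"id": "e2", "profile_id": None},
    {"id": "e3", "profile_id": "p1"},
    {"id": "e4"},
]
deduped = dedupe_by_profile(hits)
```

With the membership cached on the hit itself, this needs no extra lookup per entity, which is the point made above. A client could apply the same check to decide whether to render the profile view or the plain entity.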
-
@mynameisfiber Yes! Sounds good - would love to talk through some options for UI directions we could take with this :)
-
Hey, just for the record my 2 cents as I sometimes was part of the discussions around these things... I prefer the approach having abstract
instead of this not so straightforward lifecycle path:
totally looking forward to this feature! if there is anything i can assist in implementing, let me know.
-
This issue proposes a solution for two closely linked feature requests in Aleph: document entities and entity profiles.
Document entities encompasses the desire to have any outputs of the document NER analysis create real Aleph entities. Currently, the tokens extracted from a document are stored as text attached to the Document entity and are second-class citizens. By turning them into true entities, they can be xref'd, used in diagrams and more easily accessed by users. To accomplish this, a new type of entity, the DocumentExtract entity, is going to be created that can be xref'd on but otherwise will be invisible to the system. This means that the entity could be linked with another entity and shown in the profile view, described below, but otherwise won't clutter the system. The reason for doing this is that any NER extraction is bound to be noisy and create many pathological entities. By making them invisible unless explicitly linked to an existing entity, we allow users to control when these extracted entities are considered "relevant" to their Aleph use-case.
Entity profiles is a way of clustering entities which have been linked together through the xref system (and subsequently merged through the Linkage system by a user). By having clusters of entities that we have confirmed represent the same physical thing, we can show a unified profile in the UI representing all the information we know. This is useful when we have multiple entities for, say, Putin, but each entity contains different information. The merged profile will show all of the information in one place and make a single entity available for easier analysis.
An additional requirement of the profile system is that profiles are namespaced to a dataset. A dataset is considered a "workspace" for a particular project and thus can have its own requirements and considerations when determining whether two entities are "the same". For example, one project may think that every branch of a bank should be merged into the same profile while another project wants to consider each branch separately. By namespacing the profiles to a dataset, we can still reference any entity (from all of Aleph) without potentially stepping on other projects' notions of similarity.
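As a rough illustration of the namespacing requirement, the sketch below keys profile membership on a (dataset, profile) pair, so two projects can group the same entities differently. The `ProfileRegistry` class, its methods, and all IDs are hypothetical and only meant to show the idea, not how Aleph stores profiles.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple


class ProfileRegistry:
    """Hypothetical in-memory store of profiles, namespaced by dataset."""

    def __init__(self) -> None:
        # (dataset_id, profile_id) -> IDs of the entities in that profile
        self._members: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, dataset_id: str, profile_id: str, entity_id: str) -> None:
        self._members[(dataset_id, profile_id)].add(entity_id)

    def profile_of(self, dataset_id: str, entity_id: str) -> Optional[str]:
        """Return the profile an entity belongs to within one dataset."""
        for (ds, profile_id), members in self._members.items():
            if ds == dataset_id and entity_id in members:
                return profile_id
        return None


registry = ProfileRegistry()

# Project A merges every branch of the bank into one profile.
registry.add("project-a", "bank", "branch-berlin")
registry.add("project-a", "bank", "branch-paris")

# Project B keeps each branch in its own profile.
registry.add("project-b", "bank-berlin", "branch-berlin")
registry.add("project-b", "bank-paris", "branch-paris")

same_in_a = (
    registry.profile_of("project-a", "branch-berlin")
    == registry.profile_of("project-a", "branch-paris")
)
same_in_b = (
    registry.profile_of("project-b", "branch-berlin")
    == registry.profile_of("project-b", "branch-paris")
)
```

Both projects reference the same two entities, but only project A treats them as one thing; neither decision leaks into the other dataset.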
To implement this, we've identified the following steps:
Linkage data model into EntitySet and EntitySetItem models, adding a score and added_by field to the EntitySetItem
Linkage API to use the new models
DocumentExcerpt family of entities to represent extracted tokens from a document
DocumentExcerpt entities from tokens extracted from a document in ingest-file
DocumentExcerpt entities are xref'd on
In the near future, we can also add the following features using the above system:
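Going back to the first implementation step above, here is a minimal sketch of what an EntitySet with scored EntitySetItem members might look like. Only the names EntitySet, EntitySetItem, score and added_by come from the proposal; the field types, the `dataset_id` and `type` fields, and the helper methods are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntitySetItem:
    entity_id: str
    # Assumed to hold an xref-style score; None for manual additions.
    score: Optional[float] = None
    # Assumed to identify the user or process that added the item.
    added_by: Optional[str] = None


@dataclass
class EntitySet:
    id: str
    dataset_id: str          # assumption: the dataset the set is namespaced to
    type: str = "profile"    # assumption: a set type such as a profile
    items: List[EntitySetItem] = field(default_factory=list)

    def add(
        self,
        entity_id: str,
        score: Optional[float] = None,
        added_by: Optional[str] = None,
    ) -> EntitySetItem:
        item = EntitySetItem(entity_id=entity_id, score=score, added_by=added_by)
        self.items.append(item)
        return item

    def entity_ids(self) -> List[str]:
        return [item.entity_id for item in self.items]


profile = EntitySet(id="p1", dataset_id="project-a")
profile.add("person-1", added_by="user-7")
profile.add("excerpt-42", score=0.83, added_by="xref")
```

Keeping score and added_by on the item rather than on the set would let a single profile mix user-confirmed members with machine-suggested ones, which seems to be what the scored entity sets discussed in the replies would need.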