Pipeline for automated KG generation #13
Comments
Just a minor comment concerning the RDF pattern in the diagram. I think it is unnecessarily complicated. I would suggest that the pattern should be along the following lines. oekg:scenario123 a oeo:Scenario. I am not sure about the oeo:Scenario, xyz:has_record and xyz:Table entities. Firstly, are the tables associated with a scenario or a scenario projection? Secondly, depending on the answer to the first question, we need a relation that links it to an information entity, namely a table. It is probably a good idea to look at the OBI to see whether we can reuse a relation and a class from there. But regardless of whether we use oeo:Scenario, xyz:has_record and xyz:Table or some other IRIs, the pattern should be correct. EDIT: Included the line connecting scenario and table to OEP. I am not sure which ontology term to use for xyz:has_IRI.
Sometimes, datasets contain scenarios. Also, a scenario usually has many datasets (as input: assumptions, model parameters, ...; as output: projections).
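To make the suggested pattern easier to discuss, here is a minimal Python sketch of one possible reading of it. The prefixed names oekg:scenario123, oeo:Scenario, xyz:has_record, xyz:Table and xyz:has_IRI come from the comment above; the table name `oekg:table42`, the example OEP address, the function names, and the choice to attach xyz:has_IRI to the table are assumptions made only for illustration, since the diagram itself is not reproduced here.

```python
def scenario_table_triples(scenario, table, oep_iri):
    """Return the triples linking one scenario to one table (assumed pattern)."""
    return [
        (scenario, "a", "oeo:Scenario"),
        (scenario, "xyz:has_record", table),   # placeholder relation from the thread
        (table, "a", "xyz:Table"),             # placeholder class from the thread
        (table, "xyz:has_IRI", f"<{oep_iri}>"),  # assumed: the link to the OEP hangs off the table
    ]


def to_turtle(triples):
    """Serialise triples as simple Turtle-like statements, one per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical identifiers, for illustration only.
example = scenario_table_triples(
    "oekg:scenario123", "oekg:table42", "https://example.org/oep/table42"
)
print(to_turtle(example))
```

If the tables turn out to belong to a scenario projection rather than a scenario, only the subject of the xyz:has_record triple would need to change.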
Yes. Currently the connection between tables and scenarios works mainly via the tags in the scenario schema, but in the future this link has to (also) be made via the factsheets/bundles.
That means that there are tables that are used in more than one scenario. But that should be no problem, as long as the assignment also exists outside the tables, right?
That is also my question: whether or not we have an explicit connection (usable via APIs) between the scenario and its datasets. But 'tags' work for filtering in this case.
No, it should be no problem. At least not for the "dumb and dirty" approach that we are currently following. Our approach consists of going through the content of all tables that are associated with a scenario projection. If an entry is either an OEO term or has been annotated by a third party with an OEO term, we use it as the object in an is-about triple. If it is something else, we try to automatically match it to an OEO term. (In the first approach by simple string matching; at some later stage we can improve that by using more sophisticated approaches.) Since the names of scenarios won't be in the OEO, tables that contain names of other scenarios won't be matched and will thus be ignored. That's ok. Actually, I expect that most of the terms won't be automatically matchable to something in the OEO, even if we use very sophisticated methods.
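The procedure described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the actual script: the label index `OEO_LABELS`, the placeholder identifiers, the predicate name `xyz:is_about`, and the function names are all hypothetical, and the matching is plain normalised string comparison.

```python
import re

# Hypothetical mini index: normalised label -> term identifier.
# The identifiers are placeholders, not real OEO IRIs.
OEO_LABELS = {
    "wind energy": "oeo:PLACEHOLDER_1",
    "electricity grid": "oeo:PLACEHOLDER_2",
}
KNOWN_TERMS = set(OEO_LABELS.values())


def normalise(value):
    """Lower-case and collapse whitespace/underscores into single spaces."""
    return re.sub(r"[\s_]+", " ", str(value)).strip().lower()


def is_about_triples(scenario, rows, annotations=None):
    """Yield (scenario, predicate, term) triples for every matched cell value.

    rows: iterable of dicts, one dict per table row.
    annotations: optional dict mapping a raw cell value to a term
                 supplied by a third party.
    """
    annotations = annotations or {}
    seen = set()
    for row in rows:
        for value in row.values():
            text = str(value)
            if text in KNOWN_TERMS:        # entry already is a term
                term = text
            elif text in annotations:      # entry was annotated by a third party
                term = annotations[text]
            else:                          # fall back to simple string matching
                term = OEO_LABELS.get(normalise(text))
            if term is not None and term not in seen:
                seen.add(term)
                yield (scenario, "xyz:is_about", term)


rows = [
    {"technology": "Wind  Energy", "compared_with": "Scenario B"},
    {"technology": "wind_energy"},
]
triples = list(is_about_triples("oekg:scenario123", rows))
```

Values such as the name of another scenario simply find no match and are skipped, which is the behaviour described in the comment.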
As a first step, the following 'dumb and dirty' versions are the results of a pipeline based on simple 'string matching' between values in the tables and OEO concepts:
Thanks @adelmemariani. Let's continue the discussion here. Does your script consider synonyms and alternative terms that are given in the OEO? I'm wondering, why
😮 My script was not aware of 'synonyms' so far. Thanks @stap-m. I will work on it...
By considering the The overall result would be much better if we had synonyms for the other unassignable terms.
Actually, we agreed on using
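A minimal Python sketch of how synonyms might be folded into the string matching is shown below. It is an assumption about one possible approach, not the script discussed here: the data layout, the placeholder identifier, the example label and synonym, and the function names are made up for illustration.

```python
def build_index(terms):
    """Map every lower-cased label and synonym to its term identifier."""
    index = {}
    for term_id, names in terms.items():
        for name in [names["label"], *names.get("synonyms", [])]:
            # setdefault keeps the first term if two terms share a name
            index.setdefault(name.strip().lower(), term_id)
    return index


def match(value, index):
    """Return the term identifier for a cell value, or None if unmatched."""
    return index.get(str(value).strip().lower())


# Hypothetical term with one made-up synonym; not taken from the OEO.
terms = {
    "oeo:PLACEHOLDER_1": {"label": "photovoltaics", "synonyms": ["PV"]},
}
index = build_index(terms)
```

With such an index, `match("pv", index)` and `match("Photovoltaics", index)` resolve to the same placeholder term, while unknown values return `None`.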
In the internal OvGU meeting with @adelmemariani, @fabianneuhaus and myself, we developed a workflow for automated KG generation.
The task now is to establish the basic pipeline for this KG so that a first version can be created.
Semantic enrichment etc. should not be considered at this stage and will be addressed later.
KG and workflow