Pipeline for automated KG generation #13

stap-m · 2022-10-11T09:02:42Z

In the internal OvGU meeting, @adelmemariani, @fabianneuhaus and I developed a workflow for automated KG generation.
The task now is to establish the basic pipeline for this KG so that a first version can be created.
Semantic enrichment etc. should not be considered at this stage and will be addressed later.

KG and workflow

fabianneuhaus · 2022-10-11T12:32:02Z

Just a minor comment concerning the RDF pattern in the diagram. I think it is unnecessarily complicated. I would suggest that the pattern should be along the following lines.

oekg:scenario123 a oeo:Scenario.
oekg:scenario123 xyz:has_IRI < address of website on OEP > .
oekg:scenario123 xyz:has_record oekg:table456.
oekg:table456 a xyz:Table.
oekg:table456 xyz:has_IRI < address of website on OEP > .
oekg:table456 is about oeo:entity.

I am not sure about the oeo:Scenario, xyz:has_record and xyz:Table entities. Firstly, are the tables associated with a scenario or a scenario projection? Secondly, depending on the answer to the first question, we need a relation that links it to an information entity, namely a table. It is probably a good idea to look at the OBI to see whether we can reuse a relation and a class from it. But regardless of whether we use oeo:Scenario, xyz:has_record and xyz:Table or some other IRIs, the pattern should be correct.

EDIT: Included the lines connecting scenario and table to the OEP. I am not sure which ontology term to use for xyz:has_IRI.
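
To make the suggested pattern concrete, here is a minimal Python sketch that writes it out as Turtle for one scenario and one table. It is only an illustration: the namespace IRIs, the example.org URLs, the function name `scenario_pattern` and the `xyz:is_about` property are placeholders (assumptions), since the actual terms are still under discussion.

```python
# Placeholder namespaces: the real IRIs are NOT settled in this thread (assumption).
PREFIXES = {
    "oekg": "http://example.org/oekg/",
    "oeo": "http://example.org/oeo/",
    "xyz": "http://example.org/xyz/",
}


def scenario_pattern(scenario_id, scenario_url, table_id, table_url, about_terms):
    """Return Turtle text linking one scenario to one table and its is-about terms."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    scenario = f"oekg:{scenario_id}"
    table = f"oekg:{table_id}"
    lines += [
        f"{scenario} a oeo:Scenario .",
        f"{scenario} xyz:has_IRI <{scenario_url}> .",
        f"{scenario} xyz:has_record {table} .",
        f"{table} a xyz:Table .",
        f"{table} xyz:has_IRI <{table_url}> .",
    ]
    # One is-about triple per OEO term found for the table.
    # xyz:is_about is a placeholder for whichever "is about" relation is chosen.
    for term in about_terms:
        lines.append(f"{table} xyz:is_about oeo:{term} .")
    return "\n".join(lines)


turtle = scenario_pattern(
    "scenario123",
    "http://example.org/oep/scenario123",  # hypothetical OEP address
    "table456",
    "http://example.org/oep/table456",  # hypothetical OEP address
    ["OEO_00050006"],
)
print(turtle)
```

Keeping the pattern in one function means that swapping xyz:has_record or xyz:Table for reused OBI terms later only touches a few strings.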

adelmemariani · 2022-10-11T19:42:56Z

Sometimes, datasets contain scenarios:

Also, a scenario usually has many datasets (as input: assumptions, model parameters, ...; as output: projections).
This makes it difficult to build a pipeline. Besides, dataset values are not easily mappable to OEO concepts because users chose vague and abbreviated terms.

stap-m · 2022-10-12T06:54:23Z

Firstly, are the tables associated with a scenario or a scenario projection?

Yes. Currently the connection between tables and scenarios works mainly via the tags in the scenario schema, but in the future this link has to (also) be made via the factsheets/bundles.

Sometimes, datasets contain scenarios:

That means that there are tables that are used in more than one scenario. But that should be no problem, as long as the assignment also exists outside the tables, right?

adelmemariani · 2022-10-12T10:10:43Z

That means that there are tables that are used in more than one scenario. But that should be no problem, as long as the assignment also exists outside the tables, right?

That is also my question: whether or not we have an explicit connection (usable via APIs) between the scenario and its datasets. But 'tags' work for filtering in this case.

fabianneuhaus · 2022-10-12T10:56:53Z

That means that there are tables that are used in more than one scenario. But that should be no problem, as long as the assignment also exists outside the tables, right?

No, it should be no problem. At least not for the "dumb and dirty" approach that we are currently following. Our approach consists of going through the content of all tables that are associated with a scenario projection. If an entry is either an OEO term or has been annotated by a third party with an OEO term, we use it as the object in an is-about triple. If it is something else, we try to automatically match it to an OEO term. (In the first approach by simple string matching; at some later stage we can improve that by using more sophisticated approaches.) Since the names of scenarios won't be in the OEO, names of other scenarios that appear in tables won't be matched and will thus be ignored. That's ok. Actually, I expect that most of the terms won't be automatically matchable to something in the OEO, even if we use very sophisticated methods.
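
The matching step described above could look roughly like the following Python sketch. It is not the script used in this issue. The function names, the shape of `oeo_labels` (a plain mapping from OEO term ID to label) and the normalization rule are assumptions made for illustration; the petajoule ID is the one cited later in this thread.

```python
def normalize(text):
    """Lower-case and collapse whitespace so that trivial variants compare equal."""
    return " ".join(text.strip().lower().split())


def match_entries(entries, oeo_labels):
    """Split table entries into (matched, unassignable).

    oeo_labels maps an OEO term ID to its label (assumed input shape).
    matched maps each table entry to the OEO term ID it was assigned to.
    """
    index = {normalize(label): term_id for term_id, label in oeo_labels.items()}
    matched, unassignable = {}, []
    for entry in entries:
        if entry in oeo_labels:
            # The entry already is an OEO term ID: use it directly.
            matched[entry] = entry
            continue
        term_id = index.get(normalize(entry))
        if term_id is None:
            # E.g. scenario names: not in the OEO, so they are simply ignored.
            unassignable.append(entry)
        else:
            matched[entry] = term_id
    return matched, unassignable


oeo_labels = {"OEO_00050006": "petajoule"}
entries = ["Petajoule", "OEO_00050006", "Scenario A"]  # "Scenario A" is a made-up entry
matched, unassignable = match_entries(entries, oeo_labels)
print(matched)
print(unassignable)
```

Every key in `matched` would become the source of one is-about triple for its table, while `unassignable` corresponds to the list of terms that need manual attention or better matching later.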

adelmemariani · 2022-10-12T11:53:22Z

As a first step, the following 'dumb and dirty' versions are the results of a pipeline based on simple 'string matching' between values in the tables and OEO concepts:

With IRIs:
https://github.com/OpenEnergyPlatform/oekg/blob/Trial_autogenerated_oekg_via_pieline/Dummy_OEKG_With_Senario_Datasets.ttl

With labels:
https://github.com/OpenEnergyPlatform/oekg/blob/Trial_autogenerated_oekg_via_pieline/Dummy_OEKG_With_Senario_Datasets_With_Labels.ttl

stap-m · 2022-10-12T13:19:56Z

The following is the list of 'not assignable' terms for datasets that belong to KS_2050:
https://github.com/OpenEnergyPlatform/oekg/blob/Trial_autogenerated_oekg_via_pieline/not_assignables.txt

Thanks @adelmemariani . Let's continue the discussion here.

Does your script consider synonyms and alternative terms that are given in the OEO? I'm wondering why PJ wasn't found. It is annotated as an exact synonym of petajoule (OEO_00050006).

adelmemariani · 2022-10-12T13:43:56Z

Does your script consider synonyms and alternative terms that are given in the OEO? I'm wondering why PJ wasn't found. It is annotated as an exact synonym of petajoule (OEO_00050006).

😮 My script was not aware of 'synonyms' so far. Thanks @stap-m. I will work on it...

adelmemariani · 2022-10-12T14:30:55Z

By considering the 'has exact synonym' relations, 'PJ' is now mapped to 'petajoule' and is no longer in the list of unassignable terms:
https://github.com/OpenEnergyPlatform/oekg/blob/Trial_autogenerated_oekg_via_pieline/Dummy_OEKG_With_Senario_Datasets_With_Labels_And_IRIs.ttl#L376

The overall result would be much better if we had synonyms for the other unassignable terms.
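
As a rough sketch of why the synonyms help, the lookup index can be built from each term's label plus its exact synonyms. This is not the actual script: the function name `build_index` and the input shape (term ID mapped to a label and a list of synonyms) are assumptions, and how the synonyms are read from the OEO is left out.

```python
def build_index(terms):
    """Map every known name (label or exact synonym) to its OEO term ID.

    terms: mapping of term ID to {"label": str, "synonyms": [str, ...]}
    (assumed input shape).
    """
    index = {}
    for term_id, info in terms.items():
        names = [info["label"], *info.get("synonyms", [])]
        for name in names:
            # setdefault keeps the first term if two terms share a name.
            index.setdefault(name.strip().lower(), term_id)
    return index


terms = {"OEO_00050006": {"label": "petajoule", "synonyms": ["PJ"]}}
index = build_index(terms)

for entry in ["PJ", "petajoule", "KS_2050"]:
    print(entry, "->", index.get(entry.strip().lower()))
```

One caveat of this sketch: lower-casing everything is convenient for labels but can conflate unit symbols that differ only in case, so a real pipeline might want to match synonyms case-sensitively.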

stap-m · 2022-10-13T06:47:12Z

😮 My script was not aware of 'synonyms' so far. Thanks @stap-m. I will work on it...

Actually, we agreed on using 'alternative term' instead of synonyms, but apparently there are still some artifacts...

stap-m assigned adelmemariani Oct 11, 2022

Ludee added the enhancement New feature or request label Oct 13, 2022

Ludee changed the title ~~create pipeline for automated KG-generation~~ Pipeline for automated KG generation Oct 13, 2022
