Semantic interoperability #9

Closed
blokhin opened this issue Jun 10, 2018 · 14 comments
Labels
topic/property-standardization The specification of the precise data representation of properties and entries

Comments

@blokhin
Member

blokhin commented Jun 10, 2018

It's important to have a common vocabulary of terms at the solid state physics level.

For example, take a seemingly simple term: formation energy. In fact, there are many approaches to it (whether we include the chemical potential and, if so, how we define it, etc.). Another, more complex example: how the crystalline structure is defined in a particular case. Is it centered, are the atoms wrapped into the unit cell, what is the space group setting, etc.

In general, how can we make sure a term used in repo A is the same as used in repo B?

@blokhin
Member Author

blokhin commented Jun 10, 2018

At the MPDS we have a curated taxonomy of the physical properties found in the peer-reviewed literature. However, it is far from ideal and has many disadvantages (quite rigid, redundant, suited basically for us only, etc.). So a more advanced and flexible solution would definitely be preferable.

@blokhin
Member Author

blokhin commented Jul 3, 2018

cf. #24

@blokhin
Member Author

blokhin commented Jun 10, 2020

cf. #74

@merkys
Member

merkys commented Jun 11, 2020

I agree that this is important. In OPTIMADE a lot of effort is put into defining common terms (#24 and #74 are just a couple of related issues), and what is being done is essentially devising a new taxonomy (ontology?) common to the participating databases. IMO, it would be best if OPTIMADE could adopt an already existing taxonomy (ontology?) instead.

@merkys added the topic/property-standardization label Jun 11, 2020
@blokhin
Member Author

blokhin commented Jun 11, 2020

I'd rather use the term taxonomy than ontology. In recent years I've grown quite skeptical of ontologies in the semantic web sense (I tried to describe this in our MS Teams ontology channel in a conversation with @shyamd).

@shyamd

shyamd commented Jun 12, 2020

I agree we won't have a full ontology. There is a completeness aspect that we're unlikely to fulfill, but a taxonomy isn't sufficient. In MPDS, you have a 4-level taxonomy. At the bottom are the actual properties. But there are several additional and ill-defined levels of hierarchy arising from the relationships between these terms.
For instance, take the Knoop hardness. In one DB, it might be measured with a micro-indenter. In another DB, it might be measured with a nano-indenter. Another DB might have aggregate averages. While all of these are the same Knoop hardness property, ontologically they have additional qualifications. If we didn't care about them, we could move one level up the classification to Knoop hardness and get all of these values, but if we wanted just aggregate numbers, we would look for average Knoop hardness. It's these kinds of relationships that are important for interpreting different databases with respect to each other.
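
As a rough illustration of the classification idea above, here is a minimal Python sketch. It is not taken from any of the databases mentioned: the `Term` class, the term names, the database labels, and the numeric values are all hypothetical placeholders (the values are not measurements).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """A node in a hypothetical property classification."""
    name: str
    parent: Optional["Term"] = None

    def is_a(self, other: "Term") -> bool:
        # Walk up the hierarchy until we either find `other` or run out of parents.
        node: Optional[Term] = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


# One general term and three more qualified terms beneath it.
KNOOP = Term("Knoop hardness")
KNOOP_MICRO = Term("Knoop hardness (micro-indenter)", KNOOP)
KNOOP_NANO = Term("Knoop hardness (nano-indenter)", KNOOP)
KNOOP_AVG = Term("average Knoop hardness", KNOOP)

# (database label, term, placeholder value); the values are made up.
records = [
    ("db_a", KNOOP_MICRO, 1.0),
    ("db_b", KNOOP_NANO, 2.0),
    ("db_c", KNOOP_AVG, 3.0),
]


def select(records, term):
    """Return every record whose term is `term` or a specialization of it."""
    return [r for r in records if r[1].is_a(term)]
```

Calling `select(records, KNOOP)` returns all three records, while `select(records, KNOOP_AVG)` returns only the aggregate one, which is the "move one level up" behavior described above.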

@blokhin
Member Author

blokhin commented Jun 14, 2020

@shyamd when we read articles and compare different values of the Knoop hardness to each other, what exactly do we do? Somehow this is to be "algorithmized", is that what you mean?

@shyamd

shyamd commented Jun 15, 2020

Sort of. It's about documenting what we consider and using that to build the classification hierarchy. You wouldn't treat Knoop hardness from two different methods as equal, but you might use that comparison for something. There is no way an algorithm can be built to mimic that kind of behavior if that context difference isn't embedded into the data point. As a community, we're so stuck on embedding metadata next to each value, whereas the ontology embeds that metadata into the meaning of that value, which is how we as humans compartmentalize information. Even if this isn't the best representation, it's an effective means of translating human understanding into machine understanding before we develop whatever the next methodology is.

@shyamd

shyamd commented Jun 15, 2020

I think one thing that could be very cool, but would require a significant amount of funding, would be an automatic ontology library based on pydantic that enables instances of pydantic models to be converted into JSON-LD rather than vanilla Python dictionaries. Basically, build an ontology in a natural programming manner via inheritance and then allow people to use that to decorate their data.
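
To make the idea concrete, here is a minimal sketch of what "ontology via inheritance" could look like. It uses standard-library dataclasses as a stand-in for pydantic so that it runs without extra packages; it is not an existing library. The namespace `BASE_IRI`, the class names, and the `to_jsonld` method are hypothetical.

```python
import json
from dataclasses import dataclass, fields

# Hypothetical namespace; a real ontology would publish its own IRIs.
BASE_IRI = "https://example.org/ontology#"


@dataclass
class Property:
    """Root of a hypothetical property hierarchy."""
    value: float
    unit: str

    @classmethod
    def types(cls):
        # Derive the JSON-LD types from the Python inheritance chain,
        # most specific class first, stopping at Property.
        return [BASE_IRI + c.__name__ for c in cls.__mro__
                if issubclass(c, Property)]

    def to_jsonld(self):
        # Emit a JSON-LD-shaped dictionary instead of a bare dict of fields.
        doc = {"@context": {"@vocab": BASE_IRI}, "@type": self.types()}
        for f in fields(self):
            doc[f.name] = getattr(self, f.name)
        return doc


@dataclass
class Hardness(Property):
    pass


@dataclass
class KnoopHardness(Hardness):
    # Extra qualification carried by the more specific term.
    indenter: str = "unspecified"


if __name__ == "__main__":
    # The value is a placeholder, not a measurement.
    print(json.dumps(KnoopHardness(1.0, "GPa", "micro").to_jsonld(), indent=2))
```

The design choice here is that the class hierarchy is the single source of truth: adding a subclass automatically adds a more specific `@type` without touching the serialization code.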

@ml-evs
Member

ml-evs commented Mar 28, 2021

This is a topic we should definitely pick up on at the next workshop, both on the development of the OPTIMADE ontology itself, and also on technicalities of how our API specification definitions can incorporate and serve linked data.

> I think one thing that could be very cool, but would require a significant amount of funding, would be an automatic ontology library based on pydantic that enables instances of pydantic models to be converted into JSON-LD rather than vanilla Python dictionaries. Basically, build an ontology in a natural programming manner via inheritance and then allow people to use that to decorate their data.

Would love to hack on something like this; it feels like an area where the Python ecosystem is lacking. I only really know of owlready2. Does anyone know of frameworks in other languages for inspiration?

@shyamd

shyamd commented Mar 29, 2021

owlready2 is the only contender and it is hopelessly complicated. I tried making an ontology of band_gaps and wasted about a month trying to get it to work right.

@CasperWA
Member

For writing EMMO ontologies, one can use EMMO-Python, but it's also based on owlready2 (as far as I know), and is meant to write EMMO ontologies, not ontologies in general. But concepts might be borrowed?

@blokhin
Member Author

blokhin commented Mar 29, 2021

Since ontologies use the RDF/RDFS vocabularies, Python's mature rdflib suits them well. OTOH, it doesn't provide any strong reasoning capabilities, except a rather limited brute-force implementation. (We will be putting some effort into improving this situation by linking low-level reasoners.)
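
For readers unfamiliar with what brute-force reasoning over RDFS means in practice, here is a minimal sketch. It is not rdflib code: plain Python tuples stand in for a triple store so that the example needs no third-party packages, and the `ex:` terms are hypothetical. It repeatedly applies two RDFS entailment rules (rdfs9, type propagation along `rdfs:subClassOf`, and rdfs11, transitivity of `rdfs:subClassOf`) until nothing new is inferred.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Naive fixed-point closure under the rdfs9 and rdfs11 rules."""
    inferred = set(triples)
    while True:
        new = set()
        for (cls, p, parent) in inferred:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in inferred:
                if o2 != cls:
                    continue
                if p2 == SUBCLASS:
                    # rdfs11: s2 subClassOf cls, cls subClassOf parent
                    new.add((s2, SUBCLASS, parent))
                elif p2 == TYPE:
                    # rdfs9: s2 is an instance of cls, cls subClassOf parent
                    new.add((s2, TYPE, parent))
        if new <= inferred:
            # Fixed point reached: no rule produced an unseen triple.
            return inferred
        inferred |= new


# Hypothetical toy graph.
triples = {
    ("ex:KnoopHardness", SUBCLASS, "ex:Hardness"),
    ("ex:Hardness", SUBCLASS, "ex:Property"),
    ("ex:m1", TYPE, "ex:KnoopHardness"),
}
closure = rdfs_closure(triples)
```

After the call, `closure` also contains `("ex:m1", "rdf:type", "ex:Property")`. The nested loop rescans the whole set on every pass, which is why this style of reasoning scales poorly and why dedicated reasoners are attractive.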

@blokhin
Member Author

blokhin commented Jun 7, 2022

Closing as #376 is now a part of the standard.

Also #24, #406, our wiki on semantic assets, OMDI 2021 workshop, and pysemtec GitHub org are relevant.

@blokhin closed this as completed Jun 7, 2022

5 participants