Decide how to modularize GAZ such that individual subsets can be managed in github #21

cmungall · 2019-04-29T14:50:52Z

What source format? .obo is easy for diffing, but this assumes we don't convert to instances. This depends on Proposal: convert GAZ to an instance-based representation #20.
How do we modularize? We need a set of mutually exclusive, exhaustive categories. If this is not possible, we need an agreed-upon prioritization to determine which entity belongs in which module.
How do we determine the initial conversion is not lossy?
- robot diff doesn't scale
- If we punt on Proposal: convert GAZ to an instance-based representation #20 for now, then obo-level diffing is easy and scalable, and could even be done at the ASCII level. I also have scripts here: https://github.com/cmungall/obo-scripts
- for an instance representation, there will be no blank nodes so this makes RDF-level diffing easy
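To make the obo-level option concrete, here is a minimal sketch of a stanza-by-stanza diff in Python. It is not the code in obo-scripts; the function names and the tiny OLD/NEW samples are illustrative assumptions, and it only looks at [Term] stanzas. Comparing the tag lines of each term as sets means that reordering within a stanza is not reported as a change.

```python
def parse_obo_terms(text):
    """Map each [Term] id to the set of its other tag-value lines."""
    terms = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are tracked here.
            in_term = line == "[Term]"
            current = None
            continue
        if not in_term or not line:
            continue
        if line.startswith("id: "):
            current = line[4:]
            terms[current] = set()
        elif current is not None:
            terms[current].add(line)
    return terms


def diff_obo(old_text, new_text):
    """Report, per term id, the tag lines removed from or added to the term."""
    old, new = parse_obo_terms(old_text), parse_obo_terms(new_text)
    report = {}
    for term_id in sorted(old.keys() | new.keys()):
        removed = old.get(term_id, set()) - new.get(term_id, set())
        added = new.get(term_id, set()) - old.get(term_id, set())
        if removed or added or (term_id in old) != (term_id in new):
            report[term_id] = {"removed": sorted(removed), "added": sorted(added)}
    return report


# Illustrative input only: the "after" version has lost an is_a line.
OLD = """format-version: 1.2

[Term]
id: GAZ:00005229
name: Vennesla
is_a: GAZ:00002718
"""

NEW = """format-version: 1.2

[Term]
id: GAZ:00005229
name: Vennesla
"""

if __name__ == "__main__":
    print(diff_obo(OLD, NEW))
```

An empty report between the original file and the concatenation of all modules would be one piece of evidence that the split did not lose anything at the obo level.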

cmungall · 2019-05-01T01:17:05Z

@rctauber how were you planning to split things into modules? I see you have a breakdown by country just now. Do you include everything that is located in a country, including geographic features such as lakes, rivers and the like? What about features that overlap two countries?

beckyjackson · 2019-05-01T09:02:41Z

The country modules are everything that is related by either located_in or subClassOf. I'm not sure how overlapping features are handled currently in GAZ, but the modules would reflect that.
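As a rough illustration of that selection rule, the sketch below collects everything that reaches a country by following located_in or subClassOf links. This is not the ROBOT-based extraction actually used; it assumes the links are followed transitively, and the EX: identifiers and the build_module name are made up for the example.

```python
from collections import defaultdict, deque

# Predicates that pull an entity into a module (assumption based on the comment above).
MODULE_PREDICATES = ("located_in", "subClassOf")


def build_module(root, edges):
    """Return the root plus every entity that reaches it via MODULE_PREDICATES.

    edges is an iterable of (subject, predicate, object) tuples.
    """
    incoming = defaultdict(set)
    for subject, predicate, obj in edges:
        if predicate in MODULE_PREDICATES:
            incoming[obj].add(subject)

    module = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in incoming.get(node, ()):
            if child not in module:
                module.add(child)
                queue.append(child)
    return module


# Hypothetical data: a lake that only overlaps the country is not pulled in.
EDGES = [
    ("EX:region", "located_in", "EX:country"),
    ("EX:town", "located_in", "EX:region"),
    ("EX:lake", "overlaps", "EX:country"),
]

if __name__ == "__main__":
    print(sorted(build_module("EX:country", EDGES)))
```

Under this rule an entity ends up in two modules exactly when it has located_in or subClassOf paths to two module roots, which is the case the rest of the thread worries about.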

We originally discussed starting with countries, then expanding to other subsets like oceans and seas.

But if overlapping features appear in multiple modules (and I imagine there will be overlap between things like countries and oceans and seas), it will be hard to make sure things stay up-to-date if we are using the modules to develop...

pbuttigieg · 2019-05-02T12:18:22Z

@rctauber

> I'm not sure how overlapping features are handled currently in GAZ, but the modules would reflect that.

As long as they're of different types, I think there shouldn't be conflicts in the subClassOf hierarchies. The RO:overlaps relation and its subproperties can be (is?) used to assert this sort of mereotopology. Even if these are in different modules, this should hold as long as there are some checks in place to make sure classes/instances are present across modules.

On that note, @cmungall and I had several conversations over the years about the need to generalise spatial relations in ontologies like BSPO and RO to the planetary science case. I think GAZ will need these too. @cmungall time for an RO-geo subset? Branching off to #24

beckyjackson · 2019-05-06T15:21:49Z

> As long as they're of different types, I think there shouldn't be conflicts in the subClassOf hierarchies.

What about the 'located in' hierarchies, though? The modules include subclasses and located in. For example, say a river is located in two countries and we need to update the label of that river. Even if we check for 'overlaps', how do we know which one is newer? I guess I could write a script that takes the changes from the most-recently updated modules but it may get complicated.

> How do we determine the initial conversion is not lossy?

Before we tackle the above problem, I think this is the more important issue.

On another note, I can regenerate the modules from GAZ to keep them up-to-date, but I'm using a version of ROBOT that has a few unreleased features. The two main ones are improved templating and use of Jena's TDB feature to store a dataset on-disk (which makes querying much faster). I'm pushing to get the updated templating merged in, then I need to make a PR for the Jena stuff. I don't want to include a custom ROBOT JAR in this repo since there are already many large files.

As soon as these features are released, I can add the rules to the Makefile to generate modules so that anybody can do this. That said, it doesn't solve our problem of using modules to actually build GAZ, but at least the modules can be kept up-to-date.

cmungall · 2019-05-07T01:29:52Z

@rctauber

> What about the 'located in' hierarchies, though? The modules include subclasses and located in. For example, say a river is located in two countries and we need to update the label of that river

Not sure if I am totally following. This issue is about modularization rather than labels. It sounds like you may also be making unique labels (see #26)?

But in answer to the main question, it should not be possible for an entity to be in RO:located-in two locations where those locations do not overlap (by definition). Thus if we choose non-overlapping units as the modules and placement in the modules is determined by located-in, then nothing should be in more than one module. But note:

- There is no guarantee that located_in has been used in this strict RO sense in GAZ, or that mistakes have not crept in.
- Due to these mereotopological properties, there will be some entities that cannot be placed in a module. E.g. a river should not be located in 2 countries; instead, partial-overlaps will have been used. We could have some binning strategy where something gets binned up to the next level (e.g. continent, and then up to earth). But this starts getting complex.
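One way to picture the "bin up to the next level" idea is as a lowest-common-ancestor lookup over the module hierarchy. The sketch below is only an assumption about how such a strategy could work: the PARENT table, the place names, and the choose_bin function are hypothetical, and the hierarchy is assumed to be acyclic.

```python
def ancestors(unit, parent):
    """Return the unit followed by each bin above it (assumes no cycles)."""
    chain = [unit]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def choose_bin(units, parent):
    """Pick the lowest bin that contains every unit an entity touches.

    units: the module-level units an entity is located in or overlaps.
    parent: maps each bin to the next level up (country -> continent -> earth).
    Returns None when there are no units or no shared ancestor.
    """
    units = list(units)
    if not units:
        return None
    candidates = ancestors(units[0], parent)
    for unit in units[1:]:
        chain = set(ancestors(unit, parent))
        candidates = [c for c in candidates if c in chain]
    return candidates[0] if candidates else None


# Hypothetical hierarchy for illustration.
PARENT = {
    "Norway": "Europe",
    "Sweden": "Europe",
    "Canada": "North America",
    "Europe": "Earth",
    "North America": "Earth",
}

if __name__ == "__main__":
    print(choose_bin({"Norway"}, PARENT))
    print(choose_bin({"Norway", "Sweden"}, PARENT))
    print(choose_bin({"Norway", "Canada"}, PARENT))
```

An entity touching a single country stays in that country's module, while a river overlapping two countries on the same continent lands in the continent bin. The complexity mentioned above shows up as soon as the hierarchy is not a clean tree.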

cmungall · 2019-05-07T01:33:34Z

Let me also state a few assumptions to check I'm on the same page as everyone:

- I assume there will be a one-time conversion of the current gaz source into a modular RDF representation in github
- Editors will edit individual module files using Protege
- A custom release process will build a complete gaz.owl and gaz.obo file, and these will be distributed by a mechanism OTHER than github raw files
- There will be some kind of Makefile-automated QC sparql checks to make sure that editors creating new entities place them in a module that is consistent with the located-in axiom
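The last assumption could look roughly like the following. This is a plain-Python stand-in for the logic, not the SPARQL check itself; the module file names, the EX: identifiers, and the simplification that each entity has at most one direct located-in target are all assumptions made for the sketch.

```python
def find_misplaced(entity_module, located_in, module_root):
    """Return (entity, module) pairs whose located-in chain never reaches the module root.

    entity_module: entity -> module file the entity is edited in
    located_in:    entity -> its direct located-in target (simplified to one target)
    module_root:   module file -> the unit that defines the module
    """
    problems = []
    for entity, module in sorted(entity_module.items()):
        root = module_root[module]
        node, seen = entity, set()
        # Walk up the located-in chain until the root, a dead end, or a cycle.
        while node != root and node in located_in and node not in seen:
            seen.add(node)
            node = located_in[node]
        if node != root:
            problems.append((entity, module))
    return problems


# Hypothetical data: the lake was filed under the wrong country module.
ENTITY_MODULE = {"EX:town": "norway.owl", "EX:lake": "norway.owl"}
LOCATED_IN = {
    "EX:town": "EX:region",
    "EX:region": "EX:norway",
    "EX:lake": "EX:sweden",
}
MODULE_ROOT = {"norway.owl": "EX:norway"}

if __name__ == "__main__":
    for entity, module in find_misplaced(ENTITY_MODULE, LOCATED_IN, MODULE_ROOT):
        print(f"{entity} is in {module} but is not located in its root")
```

A Makefile target could fail the build whenever a check like this returns any rows, which is the same contract a SPARQL-based QC query would have.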

beckyjackson · 2019-05-07T13:35:44Z

> This issue is about modularization rather than labels, it sounds like you may also be making unique labels?

Sorry, I wasn't super clear. I was just using that as an example if we wanted to update the label of an entity that existed in two modules. This wouldn't be a problem if we are able to define non-overlapping modules, as you suggest above.

I agree with your stated assumptions.

cmungall · 2019-06-21T23:02:13Z

@rctauber going back to your comment from May 6. What are your plans for robot templates here?

beckyjackson · 2019-06-25T09:08:15Z

I don't have templates for the modules right now, but I can always make them if need be. I'm starting to see that ROBOT is having some trouble with any entities that are both named individuals and classes. For example, GAZ:00005229:

<!-- http://purl.obolibrary.org/obo/GAZ_00005229 -->

<owl:Class rdf:about="http://purl.obolibrary.org/obo/GAZ_00005229">
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Vennesla</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/GAZ_00002718"/>
    <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A populated place.</obo:IAO_0000115>
    <oboInOwl:hasOBONamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GAZ</oboInOwl:hasOBONamespace>
    <oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GAZ:00005229</oboInOwl:id>
</owl:Class>

and...

<!-- http://purl.obolibrary.org/obo/GAZ_00005229 -->

<owl:NamedIndividual rdf:about="http://purl.obolibrary.org/obo/GAZ_00005229">
    <obo:RO_0001025 rdf:resource="http://purl.obolibrary.org/obo/GAZ_00012611"/>
</owl:NamedIndividual>

I'm trying to use robot filter to create a "bucket" of things missing from the country modules, but filter isn't working for these types of terms. We may need to resolve #20 before proceeding with modules.
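To get a sense of how many terms are affected, a quick scan for IRIs declared both as owl:Class and as owl:NamedIndividual might help. The sketch below uses only the Python standard library and is not part of ROBOT; it assumes the typed-node RDF/XML layout shown above (it would miss types given via rdf:Description plus rdf:type), and it loads the whole document into memory, so a streaming parser would likely be needed for the full GAZ file. The find_punned name and the trimmed SAMPLE are illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def find_punned(rdfxml_text):
    """Return sorted IRIs that appear as both owl:Class and owl:NamedIndividual."""
    root = ET.fromstring(rdfxml_text)
    about = f"{{{RDF}}}about"
    classes = {el.get(about) for el in root.iter(f"{{{OWL}}}Class")}
    individuals = {el.get(about) for el in root.iter(f"{{{OWL}}}NamedIndividual")}
    # Anonymous nodes have no rdf:about and show up as None; drop them.
    return sorted((classes & individuals) - {None})


# Trimmed version of the GAZ:00005229 example from this comment.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:obo="http://purl.obolibrary.org/obo/">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/GAZ_00005229">
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/GAZ_00002718"/>
  </owl:Class>
  <owl:NamedIndividual rdf:about="http://purl.obolibrary.org/obo/GAZ_00005229">
    <obo:RO_0001025 rdf:resource="http://purl.obolibrary.org/obo/GAZ_00012611"/>
  </owl:NamedIndividual>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/GAZ_00002718"/>
</rdf:RDF>"""

if __name__ == "__main__":
    for iri in find_punned(SAMPLE):
        print(iri)
```

The resulting list would be the set of terms that #20 needs to resolve before filter-based module extraction can be trusted.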

cmungall · 2019-07-09T19:32:42Z

I agree we should fix the punning first.

My question was more along the lines of what you thought was best for the overall strategy. One possibility would be to maintain the entire ontology as a TSV and generate via robot template. I thought you might be thinking along these lines. There would be some definite advantages here. But it could be awkward editing the relational graph. And having mixed mode TSV and OWL may just add more complexity to what is already likely to turn into quite a complex build.

It may be the case that we don't need to worry about templates just now and just focus on modularizing the OWL (but still, fixing the punning would be good)

beckyjackson · 2019-07-09T21:31:01Z

My plan was to modularize first, and then determine if we want to move to templates later. So I think we are in agreement there.

I think we should discuss #20 on our next GAZ call and (perhaps) move forward on converting all those into individuals. Then, I could work on building a "bucket" that contains all the terms not in one of the country modules.

cmungall added the labels "question" (questions or discussion items; consider splitting) and "technical" (anything regarding the build/release pipeline or requiring dev help) on Apr 29, 2019

pbuttigieg mentioned this issue May 2, 2019

Geographic relations #24

Open

cmungall mentioned this issue May 9, 2019

Coordinating and Prioritizing tasks for GAZ #27

Open
