Add apomorphy-based phyloref #72

Merged
merged 34 commits into from
Mar 9, 2021

Conversation

gaurav
Member

@gaurav gaurav commented Feb 9, 2021

This PR adds an apomorphy-based phyloreference, "Testudinata", for testing. It also adds support for the "apomorphy" field in phyloreferences, makes an apomorphy-based phyloreference (i.e. one containing an apomorphy and a single internal specifier) valid, provides instructions for generating the correct JSON-LD for these, and modifies the tests so that apomorphy-based phyloreferences can pass. While adding these features, I found three bugs, which this PR also fixes:

  • The test/example.js script was only testing the Brochu 2003 test file. It now tests all JSON files in the test/examples/correct directory.
  • The nomenclatural codes in the JSON Schema file were enumerated in multiple places. This PR moves that enumeration to a single place and refers to them as needed elsewhere in the schema.
  • The code for generating author names in bibliographic citations required that a name field be provided for each author. It can now generate a name from a combination of lastname, firstname and middlename fields.
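
The author-name fallback in the last bullet can be illustrated with a minimal sketch. It is not the code from this PR: the helper name `getAuthorName` and the "firstname middlename lastname" ordering are assumptions made for illustration; only the field names (name, firstname, middlename, lastname) come from the description above.

```javascript
// Hypothetical helper, not the PR's actual implementation.
// Assumption: parts are joined in "firstname middlename lastname" order.
function getAuthorName(author) {
  // Prefer an explicit `name` field when one is provided.
  if (typeof author.name === 'string' && author.name.trim() !== '') {
    return author.name.trim();
  }
  // Otherwise combine whichever name parts are present, skipping missing ones.
  return [author.firstname, author.middlename, author.lastname]
    .filter((part) => typeof part === 'string' && part.trim() !== '')
    .map((part) => part.trim())
    .join(' ');
}

console.log(getAuthorName({ firstname: 'W.', middlename: 'G.', lastname: 'Joyce' }));
```

With this shape, an author record that lacks `name` no longer fails; it simply falls back to whatever parts are available.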

Based on the format we laid out in the manuscript, the phyloreference Testudinata looks like this in JSON:

{
    "label": "Testudinata",
    "phylorefType": "phyloref:PhyloreferenceUsingApomorphy",
    "definition": "The apomorphy-based clade name, for which a complete turtle shell, as inherited by Testudo graeca Linnaeus 1758 is an apomorphy. A 'complete turtle shell' is herein defined as a composite structure consisting of a carapace with interlocking costals, neurals, peripherals, and a nuchal, together with the plastron comprising interlocking epi-, hyo-, meso- (lost in Testudo graeca), hypo-, xiphiplastra and an entoplastron.",
    "apomorphy": {
        "@type": "https://semanticscience.org/resource/SIO_010056",
        "bearingEntity": "http://purl.obolibrary.org/obo/UBERON_0008271",
        "phenotypicQuality": "http://purl.obolibrary.org/obo/PATO_0000467",
        "definition": "A 'complete turtle shell' is herein defined as a composite structure consisting of a carapace with interlocking costals, neurals, peripherals, and a nuchal, together with the plastron comprising interlocking epi-, hyo-, meso- (lost in Testudo graeca), hypo-, xiphiplastra and an entoplastron. These are articulated with one another along a bridge. Additional elements may be present as well."
    },
    "internalSpecifiers": [{
        "@type": "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonConcept",
        "hasName": {
            "@type": "http://rs.tdwg.org/ontology/voc/TaxonName#TaxonName",
            "nomenclaturalCode": "http://rs.tdwg.org/ontology/voc/TaxonName#ICZN",
            "label": "Testudo graeca Linnaeus 1758",
            "nameComplete": "Testudo graeca",
            "genusPart": "Testudo",
            "specificEpithet": "graeca"
        }
    }]
}

And like this in Turtle (based on the NQ file, which can be opened in Protege):

<#phyloref0> a owl:Class ;
    rdfs:label "Testudinata" ;
    obo:IAO_0000115 "The apomorphy-based clade name, for which a complete turtle shell, as inherited by Testudo graeca Linnaeus 1758 is an apomorphy. A 'complete turtle shell' is herein defined as a composite structure consisting of a carapace with interlocking costals, neurals, peripherals, and a nuchal, together with the plastron comprising interlocking epi-, hyo-, meso- (lost in Testudo graeca), hypo-, xiphiplastra and an entoplastron." ;
    obo:IAO_0000119 [ dc:bibliographicCitation "W. G. Joyce et al (2020) Testudinata (#273) [eds: K. de Queiroz and P. D. Cantino and J. A. Gauthier] CRC Press, Boca Raton, FL " ;
            dc:title "Testudinata (#273)" ] ;
    testcase:apomorphy [ a <https://semanticscience.org/resource/SIO_010056> ;
            obo:IAO_0000115 "A 'complete turtle shell' is herein defined as a composite structure consisting of a carapace with interlocking costals, neurals, peripherals, and a nuchal, together with the plastron comprising interlocking epi-, hyo-, meso- (lost in Testudo graeca), hypo-, xiphiplastra and an entoplastron. These are articulated with one another along a bridge. Additional elements may be present as well." ] ;
    testcase:internal_specifier [ a tc:TaxonConcept ;
            tc:hasName [ a tn:TaxonName ;
                    rdfs:label "Testudo graeca Linnaeus 1758" ;
                    tn:genusPart "Testudo" ;
                    tn:nameComplete "Testudo graeca" ;
                    tn:nomenclaturalCode tn:ICZN ;
                    tn:specificEpithet "graeca" ] ] ;
    rdfs:subClassOf <http://ontology.phyloref.org/phyloref.owl#Phyloreference>,
        <http://ontology.phyloref.org/phyloref.owl#PhyloreferenceUsingApomorphy> .

If modeling apomorphies like this makes sense, we will need an additional class in phyloref.owl (phyloref:PhyloreferenceUsingApomorphy). Once this PR has been reviewed, I'll open an issue to add that. We could also include an apomorphy RDF property in phyloref.owl (phyloref:apomorphy), but since this is not used for reasoning or in the Phyx format, I don't think it's necessary.

Closes #28.
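
The validity rule stated in the description (an apomorphy plus a single internal specifier) can be sketched as a plain check. This is only an illustration under assumptions: the function name `isApomorphyBasedPhyloref` is hypothetical, and the PR itself enforces the rule through the JSON Schema and the wrappers rather than through a function like this.

```javascript
// Hypothetical check, not code from this PR.
// Rule from the description: an apomorphy-based phyloreference contains
// an apomorphy and exactly one internal specifier.
function isApomorphyBasedPhyloref(phyloref) {
  const hasApomorphy =
    typeof phyloref.apomorphy === 'object' && phyloref.apomorphy !== null;
  const internal = Array.isArray(phyloref.internalSpecifiers)
    ? phyloref.internalSpecifiers
    : [];
  return hasApomorphy && internal.length === 1;
}

const testudinata = {
  label: 'Testudinata',
  apomorphy: { definition: "A 'complete turtle shell' ..." },
  internalSpecifiers: [{ hasName: { nameComplete: 'Testudo graeca' } }],
};
console.log(isApomorphyBasedPhyloref(testudinata));
```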

@gaurav gaurav force-pushed the add-apomorphy-phyloref branch from 4b63d30 to f102381 Compare February 23, 2021 19:00
@gaurav gaurav marked this pull request as ready for review February 23, 2021 19:23
@gaurav gaurav requested a review from hlapp February 23, 2021 19:23
Member

@hlapp hlapp left a comment

So a UBERON term cannot be an exact match with an instance of a phenotype. This needs to be fixed. You could say that the phenotype

ro:inheres_in some obo:UBERON_0008271

(in OWL Manchester, not n-triples RDF, but you get the idea).

Second, we're extending the testcase ontology, which strikes me as a non-starter too. Either we do need an ontology/vocabulary property for this (in which case the Phyloreferencing Ontology was made for precisely this purpose), or we don't. What you say is conflicting on this question.

@hlapp
Member

hlapp commented Feb 23, 2021

@balhoff or @wdahdul can you remind me how we encode a presence phenotype? pato:present and inheres_in some <entity>?

@balhoff

balhoff commented Feb 23, 2021

@hlapp yes that's correct. Although in the KB we wrap that in 'has part', like has_part some (pato:present and inheres_in some <entity>). This is to align the model with what's found in the phenotype ontologies.
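
To make the two forms concrete, here is a small sketch that builds the presence-phenotype expression as a Manchester-syntax string, with and without the 'has part' wrapper. The function name and the `wrapInHasPart` option are hypothetical, and the prefixed names are used exactly as written in this thread rather than resolved to IRIs.

```javascript
// Hypothetical string builder for illustration only.
// `entity` is whatever term stands in for <entity>, e.g. an UBERON class.
function presencePhenotype(entity, { wrapInHasPart = false } = {}) {
  const core = `pato:present and inheres_in some ${entity}`;
  // The KB form described above wraps the phenotype in 'has part'.
  return wrapInHasPart ? `has_part some (${core})` : core;
}

console.log(presencePhenotype('obo:UBERON_0008271'));
console.log(presencePhenotype('obo:UBERON_0008271', { wrapInHasPart: true }));
```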

@gaurav
Member Author

gaurav commented Feb 25, 2021

Thanks for the input, @balhoff!

Okay, so it sounds like I need to make three or four changes here:

  1. Change testcase:apomorphy to phyloref:apomorphy, and open an issue on the Phyloref Ontology for this new term.
  2. Change the JSON property field exactMatch to something like phenotype.
  3. Change the value in the generated JSON-LD file so that it looks something like this:
  phyloref:apomorphy [ a owl:Class, <https://semanticscience.org/resource/SIO_010056> ;
    obo:IAO_0000115 "A 'complete turtle shell' is herein defined as a composite structure consisting of a carapace with interlocking costals, neurals, peripherals, and a nuchal, together with the plastron comprising interlocking epi-, hyo-, meso- (lost in Testudo graeca), hypo-, xiphiplastra and an entoplastron. These are articulated with one another along a bridge. Additional elements may be present as well." ;
    owl:intersectionOf (
      pato:present 
      [ a owl:Restriction ;
        owl:onProperty obo:RO_0000052 ; # inheres_in
        owl:someValuesFrom obo:UBERON_0008271
      ]
    )
  ] ;
  4. Optionally, we could also go one step further: we could add a field to the apomorphy in JSON that indicates whether the phenotype should be present or absent in the JSON-LD (maybe phenotypePresent as a boolean field?), and then generate pato:present or pato:absent appropriately. What do you think, @hlapp?

@hlapp
Member

hlapp commented Feb 26, 2021

  • Note that a owl:Class is redundant. <https://semanticscience.org/resource/SIO_010056> is already a class.
  • I don't think a phenotypePresent attribute in the JSON-LD makes much sense. But you could have a property for quality (or phenotypicQuality), which if provided would then need to be a subclass of pato:quality.

@gaurav
Member Author

gaurav commented Mar 1, 2021

Thanks for the feedback, both of you! I've made some changes to the code, so this is how the apomorphy now looks in Phyx:

"apomorphy": {
    "phenotype": "http://purl.obolibrary.org/obo/UBERON_0008271",
    "phenotypicQuality": "http://purl.obolibrary.org/obo/PATO_0000467",
    "definition": "A 'complete turtle shell' is herein defined as a composite structure consisting of a carapace with interlocking costals, neurals, peripherals, and a nuchal, together with the plastron comprising interlocking epi-, hyo-, meso- (lost in Testudo graeca), hypo-, xiphiplastra and an entoplastron. These are articulated with one another along a bridge. Additional elements may be present as well."
}

Translated into Turtle/OWL, it now looks like this (as translated from the N-Quads file):

phyloref:apomorphy [ a owl:Class ;
    obo:IAO_0000115 "A 'complete turtle shell' is herein defined as a composite structure consisting of a carapace with interlocking costals, neurals, peripherals, and a nuchal, together with the plastron comprising interlocking epi-, hyo-, meso- (lost in Testudo graeca), hypo-, xiphiplastra and an entoplastron. These are articulated with one another along a bridge. Additional elements may be present as well." ;
    rdfs:subClassOf <https://semanticscience.org/resource/SIO_010056> ;
    owl:equivalentClass [ a owl:Class ;
        owl:intersectionOf ( obo:PATO_0000467 [ a owl:Restriction ;
            owl:onProperty obo:RO_0000052 ;
            owl:someValuesFrom obo:UBERON_0008271 ] ) ] ] ;

Does that look right? It doesn't quite work in Protege yet, but I think that's because the apomorphy itself isn't a named class -- it's currently an anonymous class. We could mandate that the user provide an @id for each apomorphy, but I don't think that's a good idea.

Alternatively, I could just shelve the idea of translating the apomorphy into any kind of logical expression for now. I think we've identified the three Phyx fields we would need to express a phenotype in OWL (definition, phenotype and phenotypicQuality). So I think I'll just modify this PR so that we no longer generate a logical expression at all for the apomorphy and call it good. What do you think, @hlapp?
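
For reference, the equivalent-class expression in the Turtle above (a quality intersected with an inheres_in restriction on the bearing entity) could be assembled as a JSON-LD fragment roughly as follows. This is a sketch under assumptions, not the generator from this PR: the function name is hypothetical, and it assumes a JSON-LD context that defines the `owl` prefix.

```javascript
// Hypothetical builder, not the PR's code.
// Assumes the surrounding JSON-LD @context maps the `owl:` prefix.
const RO_INHERES_IN = 'http://purl.obolibrary.org/obo/RO_0000052';

function phenotypeClassExpression(bearerIRI, qualityIRI) {
  return {
    '@type': 'owl:Class',
    // owl:intersectionOf takes an RDF list, hence JSON-LD's @list.
    'owl:intersectionOf': {
      '@list': [
        { '@id': qualityIRI },
        {
          '@type': 'owl:Restriction',
          'owl:onProperty': { '@id': RO_INHERES_IN },
          'owl:someValuesFrom': { '@id': bearerIRI },
        },
      ],
    },
  };
}

const expr = phenotypeClassExpression(
  'http://purl.obolibrary.org/obo/UBERON_0008271',
  'http://purl.obolibrary.org/obo/PATO_0000467',
);
console.log(JSON.stringify(expr, null, 2));
```

This only covers the simple case of a single quality inhering in a single entity.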

@balhoff

balhoff commented Mar 1, 2021

Does that look right? It doesn't quite work in Protege yet, but I think that's because the apomorphy itself isn't a named class -- it's currently an anonymous class.

Right, you can't really make an anonymous class that isn't some kind of expression (like the nested intersection blank node). And you can't use a class expression as the subject of an annotation property. 😕

@hlapp
Member

hlapp commented Mar 1, 2021

@gaurav note that in the Turtle/OWL both the a owl:Class statements are redundant (as they are implied). Furthermore, I think we need to either put a phenotype under the phenotype key in the JSON-LD, or rename the key. As is, it is confusing at best, and more likely misleading. If you wanted a value for the entity bearing the phenotype, then the key should be entity, not phenotype. Or name it phenotypeBearer if you wanted something more similar to phenotypicQuality.

As for generating the logical expression, I agree this should be out of scope for now. We could do so for a default case (presence of some anatomical entity; or even a little more generally, a single non-relational quality inhering in a single entity), but as they go phenotypes can get very complex, and so can their logical expressions. Capturing that should be the subject of a future grant.

@gaurav
Member Author

gaurav commented Mar 3, 2021

Hi everybody!

Right, you can't really make an anonymous class that isn't some kind of expression (like the nested intersection blank node). And you can't use a class expression as the subject of an annotation property. 😕

Yeah. I guess we could have turned phyloref:apomorphy into an object property, but we'd still need to provide an identifier for the phenotype somehow. I'm now convinced this is out of scope for us.

Furthermore, I think we need to either put a phenotype under the phenotype key in the JSON-LD, or rename the key. As is, it is confusing at best, and more likely misleading. If you wanted a value for the entity bearing the phenotype, then the key should be entity, not phenotype. Or name it phenotypeBearer if you wanted something more similar to phenotypicQuality.

I like phenotypeBearer! So now the JSON representation looks like this:

"apomorphy": {
    "phenotypeBearer": "http://purl.obolibrary.org/obo/UBERON_0008271",
    "phenotypicQuality": "http://purl.obolibrary.org/obo/PATO_0000467",
    "definition": "A 'complete turtle shell' is herein defined as a composite structure consisting of a carapace with interlocking costals, neurals, peripherals, and a nuchal, together with the plastron comprising interlocking epi-, hyo-, meso- (lost in Testudo graeca), hypo-, xiphiplastra and an entoplastron. These are articulated with one another along a bridge. Additional elements may be present as well."
}

I've also added a bit of logic so that the phenotypicQuality in the OWL output will default to obo:PATO_0000467 ("present") if it is not present in the Phyx file. Does that sound like a good idea?

As for generating the logical expression, I agree this should be out of scope for now. We could do so for a default case (presence of some anatomical entity; or even a little more generally, a single non-relational quality inhering in a single entity), but as they go phenotypes can get very complex, and so can their logical expressions. Capturing that should be the subject of a future grant.

Sounds good! I've taken that logic out. The OWL representation is pretty simple now:

phyloref:apomorphy [
    obo:IAO_0000115 "A 'complete turtle shell' is herein defined as ... elements may be present as well."
] ;

Unless we want to include phenotypeBearer and phenotypicQuality in the OWL representation, I think that's good enough for now!

@hlapp
Member

hlapp commented Mar 3, 2021

I've also added a bit of logic so that the phenotypicQuality in the OWL output will default to obo:PATO_0000467 ("present") if it is not present in the Phyx file. Does that sound like a good idea?

Sounds risky to me. What if the apomorphy is loss of a trait, and the curator forgets to change the default?

The OWL representation is pretty simple now:

phyloref:apomorphy [
    obo:IAO_0000115 "A 'complete turtle shell' is herein defined as ... elements may be present as well."
] ;

Shouldn't we add a type (subclass of SIO_010056)?
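
On the default raised earlier in this comment: one way to avoid silently asserting "present" would be to require the curator to state the quality explicitly. The sketch below is one possible approach under assumptions, not what the PR ended up doing; the function name `resolvePhenotypicQuality` is hypothetical.

```javascript
// Hypothetical alternative to defaulting, for illustration only.
// Instead of assuming obo:PATO_0000467 ("present"), fail loudly so that a
// loss-of-trait apomorphy cannot inherit the wrong quality by accident.
function resolvePhenotypicQuality(apomorphy) {
  if (!apomorphy || !apomorphy.phenotypicQuality) {
    throw new Error(
      'apomorphy.phenotypicQuality is missing; refusing to assume "present"',
    );
  }
  return apomorphy.phenotypicQuality;
}

console.log(
  resolvePhenotypicQuality({
    phenotypicQuality: 'http://purl.obolibrary.org/obo/PATO_0000467',
  }),
);
```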

@gaurav
Member Author

gaurav commented Mar 4, 2021

Sounds good! I've fixed those in a663d94 and 5fe258a. I've updated the JSON and OWL representations in the description of this PR.

Note that I used @type (i.e. rdf:type), so it's not a subclass (rdfs:subClassOf) of SIO_010056, but rather an instance of SIO_010056. I tried using subClassOf but this causes an error in Protege (Entity not properly recognized, missing triples in input?), probably because -- as Jim mentioned previously -- an anonymous blank node can't be an OWL class. I tried various ways of convincing it that the anonymous blank node should be treated as an rdfs:Class and not an owl:Class, but was unsuccessful.

@gaurav gaurav requested a review from hlapp March 5, 2021 04:33
Member

@hlapp hlapp left a comment

So some issues pointed out earlier are starting to re-appear, which concerns me. Are we rushing and losing attention to detail?

Resolved review threads:

  • docs/context/development/schema.json (3 threads, outdated)
  • src/wrappers/PhyxWrapper.js (3 threads)
  • test/scripts/phyx2owl.js (1 thread)

gaurav and others added 2 commits March 7, 2021 16:09
Co-authored-by: Hilmar Lapp <hlapp@drycafe.net>
Co-authored-by: Hilmar Lapp <hlapp@drycafe.net>
@gaurav
Member Author

gaurav commented Mar 7, 2021

So some issues pointed out earlier are starting to re-appear, which concerns me. Are we rushing and losing attention to detail?

I hope not, although I took a few days away from the project to clear my head just in case. To my mind, most of our earlier discussion was about how the apomorphy is represented in Phyx and OWL/JSON-LD, which you didn't comment on in this set of feedback. So hopefully that is decided for now! Most of the issues you point to concern either how we document these new fields or other details relating to OWL imports and test suite structure, all of which were very useful. I think once we figure out the JSON Schema definition of phenotypeBearer, we should be done here? If not, no worries -- we can keep working on this until it's good to go.

@gaurav gaurav requested a review from hlapp March 8, 2021 22:25
Member

@hlapp hlapp left a comment

Finally 👍🏻


Successfully merging this pull request may close these issues.

Add a type for apomorphies in TaxonomicUnitWrapper
3 participants