DNA extent OWL definition #6

Open
cmungall opened this issue Sep 28, 2018 · 11 comments

@cmungall

I have some questions about this axiom:

'DNA extent' EquivalentTo
 'sequence molecular entity extent' and ('has part' only 
('deoxyribonucleotide residue' or (('chemical entity' or 'biological sequence entity') and (not ('biological sequence unit')))))
  • It's a little hard to follow, and takes a lot of OWL expertise to understand precisely what is entailed and not entailed by this axiom.
  • The OWL definition doesn't mirror the text definition, making it harder to follow.
  • Mixing transitivity and ONLY is usually a mistake (has-part propagates down to atoms, quarks, ...). In this case the OR appears to give protection, but I think this is illusory. For example, an actual molecular DNA extent will have as parts various immaterial entities such as spaces between atoms (I realize this sounds odd from a modern physics point of view, but this is BFO...). Including these renders the class unsatisfiable.
  • What's the use case for this level of complexity vs a more standard OBO-esque set of EL-compliant genus-differentia definitions?
@cmungall
Author

the has-part-only issue is more apparent on this axiom:

'genomic DNA extent' SubClassOf
'has part' only 
('genomic DNA extent' or (group and (not ('sequence molecular entity extent'))))

This is a fun one because of the recursion. But the problem should be apparent. If ChEBI were to add a perfectly valid group SubClassOf has-part some atom, and atom DisjointWith group, you entail that 'genomic DNA extent's are atoms...
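
The propagation described here can be traced on a small hand-built model. The following is a minimal Python sketch, not an OWL reasoner: the individuals `gde1`, `g1`, `a1` and the helper `transitive_closure` are hypothetical, and the check is closed-world over a single finite interpretation. It computes the transitive closure of has_part and lists the parts of a genomic DNA extent that fall outside the filler of the `only` restriction.

```python
# Toy model: class extensions are sets, has_part is a set of (whole, part) pairs.
# Names are illustrative assumptions, not taken from MSO or ChEBI.

def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as a set of pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

# Asserted parthood: the extent has a group part, the group has an atom part.
asserted = {("gde1", "g1"), ("g1", "a1")}
has_part = transitive_closure(asserted)

atom = {"a1"}
group = {"g1"}                      # disjoint from atom in this model
smee = {"gde1"}                     # 'sequence molecular entity extent'
genomic_dna_extent = {"gde1"}

# Filler of: has_part only (genomic_DNA_extent or (group and not smee))
filler = genomic_dna_extent | (group - smee)

parts_of_gde1 = {part for (whole, part) in has_part if whole == "gde1"}
violating = parts_of_gde1 - filler  # parts the restriction does not allow as typed
```

In this model the atom `a1` is reached through the group `g1` and is neither a genomic DNA extent nor a group. An OWL reasoner would not report a violation; because atom and group are disjoint, it would instead conclude that `a1` is itself a 'genomic DNA extent'.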

@matentzn

matentzn commented Sep 28, 2018

The way the 'DNA extent' is currently defined, the following two classes would be inferred as subclasses/instances of it:

 'sequence molecular entity extent' and not(has part some Thing)

(`only` merely says there should not be any relations that do not conform to the range of the `only` expression, so if there are none at all, the condition is fulfilled.)

'has_part' only 'metal atom' 

would be a subclass of DNA extent
#(assuming metal atom and 'biological sequence unit' are disjoint)

Are these two implications intended?
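
Both cases can be checked on a tiny finite interpretation. This is a rough Python sketch under stated assumptions, not an OWL reasoner: the individuals `x1`, `x2`, `m1` and the helper `satisfies_only` are hypothetical, `x2` is assumed to be a 'sequence molecular entity extent' as well, the metal atom is assumed to be a 'chemical entity', and the evaluation is closed-world over one model.

```python
# Toy model: class extensions are sets, has_part is a set of (whole, part) pairs.

def satisfies_only(individual, relation, allowed):
    """True if every `relation` successor of `individual` lies in `allowed`."""
    successors = {o for (s, o) in relation if s == individual}
    return successors <= allowed    # the empty set is a subset of anything

has_part = {("x2", "m1")}           # x1 has no parts; x2 has one metal-atom part

deoxyribonucleotide_residue = set()
biological_sequence_unit = set()
biological_sequence_entity = set()
chemical_entity = {"m1"}            # assumption: the metal atom is a chemical entity
smee = {"x1", "x2"}                 # 'sequence molecular entity extent'

# Filler of the 'only' restriction in the 'DNA extent' definition.
filler = deoxyribonucleotide_residue | (
    (chemical_entity | biological_sequence_entity) - biological_sequence_unit
)

def is_dna_extent(individual):
    """Evaluate the right-hand side of the 'DNA extent' equivalence in this model."""
    return individual in smee and satisfies_only(individual, has_part, filler)

vacuous = is_dna_extent("x1")       # no parts at all
metal_only = is_dna_extent("x2")    # sole part is a metal atom
```

Both `vacuous` and `metal_only` come out true in this model, which matches the two implications asked about, given the assumptions above.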

@mikebada
Collaborator

mikebada commented Sep 28, 2018

@cmungall I couldn't come up with a way to formally define 'sequence molecular entity extent' (which is a continuous string of biological sequence units, either as a whole molecular entity or as a subsequence), but I wanted to formally define the extent subtypes as extents composed of specific types of sequence units, which is what I think this does. For 'DNA extent', I essentially wanted to say that it's a SMEE whose sequence units are (exclusively) deoxyribonucleotide residues. I agree that using transitivity and 'only' is usually problematic since parthood propagates all the way down, as you note. I've taken this into account by saying that the only parts of DNA extents are either deoxyribonucleotide residues, or they're chemical entities or biological sequence entities (the two main top-level classes of ChEBI and MSO, respectively) that are not biological sequence units. Thus, this definition still allows for parts of DNA extents that aren't deoxyribonucleotide units (e.g., other extents or regions, chemical groups, atoms, electrons, quarks, etc.). The only restriction is that the parts have to be either chemical entities or biological sequence entities, which doesn't seem unreasonable: ChEBI even already includes atoms and subatomic particles, so I think that, e.g., spaces between atoms would still be within its domain even if they're not explicitly represented now. Additionally, the MSO already has immaterial entities in the form of boundaries of sequence residues, specifically, junctions and termini, for things like chromosomal breakpoints and deletions. If we really had to, we could expand the union to include, e.g., BFO sites or whatever, but I'd say that's currently a nonexistent problem.

As for 'genomic DNA extent', it has a similar format to that of 'DNA extent', except that it uses 'group' instead of ('chemical entity' or 'biological sequence entity') as in 'DNA extent'. I was previously using 'group' in the object of the 'has part' expression, but later expanded it to ('chemical entity' or 'biological sequence entity'); I just hadn't updated the axioms for 'genomic DNA extent' yet. However, even with 'group', I don't see how genomic DNA extents would be classified as atoms with your presented axioms...

As to the reasons for the relatively complicated axiomatization, I'd first say that it's pretty close to the semantics I was trying to get; e.g., for 'DNA extent', that it's a SMEE composed of deoxyribonucleotide units. (The natural-language definition perhaps needs to be edited to match better.) However, it was also done for practical inferential reasons: With this axiomatization, along with others I've recently added, the ontology now knows how to properly connect the various types of molecular entities, extents, regions, and residues. For example, it knows that extents of DNA molecules have to be DNA extents, that regions of DNA molecules have to be DNA regions, and that residues of DNA molecules have to be deoxyribonucleotide residues (plus, using the inverse of 'has part', the reverse assertions are inferred as well). This reflects what we know, and results in some really useful inference, I think. For example, 'cDNA region' is defined only as a 'sequence molecular entity region' that's part of some cDNA; however, now that the ontology knows that any region of a DNA must be a DNA region, it can classify 'cDNA region' under 'DNA region', which it couldn't do before all of this axiomatization, so I think that's pretty cool.

@mikebada
Collaborator

mikebada commented Sep 28, 2018

@matentzn 'DNA extent' is also a subclass of

'has part' some 'deoxyribonucleotide residue'

so with that I believe your presented classes wouldn't be classified as DNA extents. (Additionally, although it currently isn't, its parent 'sequence molecular entity extent' should correspondingly be a subclass of 'has part' some 'biological sequence unit'.)

I'm not claiming that the definitions under discussion are totally immune from ill inferential effects, but I'd be interested in examining inferential issues you can think of regarding these definitions when combined with other reasonable (no pun intended) assertions.

@mikebada
Collaborator

@cmungall @matentzn One issue of which I'm aware is that these definitions still lead to the classification of SMEEs that have inappropriate types of chemical entities or biological sequence entities as parts. For example,

'sequence molecular entity extent' and 
'has part' some 'deoxyribonucleotide residue' and 
'has part' some CHEBI:solution

(which is obviously nonsensical) would still be classified as a DNA extent. I'm still thinking of how I can further refine these to avoid this...

@cmungall
Author

cmungall commented Sep 29, 2018

To see the problem with genomic DNA extent:

Prefix: : <http://x.org/>

Ontology: <http://x.org>

ObjectProperty: has_part Characteristics: Transitive

## CHEBI        
Class: atom
Class: group
    SubClassOf: has_part some atom
    DisjointWith: atom

## MSO        
Class: sequence_molecular_entity_extent
Class: genomic_DNA_extent
    SubClassOf: sequence_molecular_entity_extent
    DisjointWith: atom
    DisjointWith: group
    SubClassOf: has_part some owl:Thing
    SubClassOf:
        has_part only (genomic_DNA_extent or (group and (not (sequence_molecular_entity_extent))))


Individual: p1
    Types: group
Individual: gde1
    Types: genomic_DNA_extent
    Facts: has_part p1

image

This injects an abox of a genomic extent with one group to demonstrate the inconsistency.

Alternatively you could load just the tbox and do a DL query:

image

presumably this is not the intent

@cmungall
Author

"I'm still thinking of how I can further refine these to avoid this..."

I'd recommend not refining further - OWL definitions have to be understood by humans as well as machines.

What about a simple EL pattern using has_member? Treat extents as mereological sums of like units. $x extent = extent and has_member some $x. I think you'd get the same inferences you get re cDNA regions.

You would get fewer constraints off the bat, so if that's a requirement there may be a way to reintroduce these as disjointness GCIs or hidden GCIs.
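
As a rough illustration of the `$x extent = extent and has_member some $x` pattern, here is a Python sketch over a finite model. It is not an OWL reasoner, and the individuals and the helper `extent_of` are hypothetical names introduced only for this example.

```python
# Toy model: class extensions are sets, has_member is a set of (sum, member) pairs.

def extent_of(unit_class, extent, has_member):
    """Individuals in `extent` with at least one has_member successor in `unit_class`."""
    return {s for (s, o) in has_member if s in extent and o in unit_class}

extent = {"e1", "e2", "e3"}
has_member = {
    ("e1", "d1"), ("e1", "d2"),     # e1: only deoxyribonucleotide residues
    ("e2", "r1"),                   # e2: only ribonucleotide residues
    ("e3", "d3"), ("e3", "r2"),     # e3: one of each
}
deoxyribonucleotide_residue = {"d1", "d2", "d3"}
ribonucleotide_residue = {"r1", "r2"}

dna_extent = extent_of(deoxyribonucleotide_residue, extent, has_member)
rna_extent = extent_of(ribonucleotide_residue, extent, has_member)
mixed = dna_extent & rna_extent     # nothing in the pattern itself rules this out
```

The existential pattern classifies `e1` and `e2` as expected, but it also lets `e3` fall under both classes. That is the weaker constraint mentioned above, which separate disjointness axioms would have to restore.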

@mikebada
Collaborator

But sequence molecular entity extents aren't disjoint with CHEBI groups; in fact, 'sequence molecular entity region', which is a child of 'sequence molecular entity extent', is explicitly asserted to be a subclass of 'group'. Would there still be a problem if the 'genomic DNA extent'/'group' disjointness axiom were removed?

@cmungall
Author

how about:

Prefix: : <http://x.org/>

Ontology: <http://x.org>

ObjectProperty: has_part Characteristics: Transitive

## CHEBI        
Class: atom
Class: group
    SubClassOf: has_part some atom
    DisjointWith: atom

## MSO        
Class: sequence_molecular_entity_extent
Class: genomic_DNA_extent
    SubClassOf: sequence_molecular_entity_extent
    DisjointWith: atom
    SubClassOf: has_part some owl:Thing
    SubClassOf:
        has_part only (genomic_DNA_extent or (group and (not (sequence_molecular_entity_extent))))


Individual: a1
    Types: atom
Individual: p1
    Facts: has_part a1        
Individual: gde1
    Types: genomic_DNA_extent
    Facts: has_part p1

image

@mikebada
Collaborator

mikebada commented Oct 1, 2018

@cmungall But I noted that I haven't yet expanded the 'group' conjunct to the wider ('chemical entity' or 'biological sequence entity'), as I've done for the other definitions. I think that fixes it, right?

That being said, these definitions are problematic at least for the issue I noted above. The only other way I can currently think of to get the inference I'm seeking is to use specialized 'has part'/'part of' subrelations to refer to specific types of parts, e.g., 'has residue part'/'residue part of'. Is this strategy of defining and using specific partonomic relations considered OBO-kosher? It seems that these would be subrelations of 'has component'/'component of', right? (I think using the latter is problematic in that they seem to require human interpretation as to which components they're referring to.)

@mikebada
Collaborator

mikebada commented Oct 2, 2018

@cmungall After toying around some, I think that the aforementioned types of inference might be possible using disjointness axioms instead, which you previously mentioned, e.g.:

Class: nucleotide_extent
    DisjointWith: has_part some (biological_sequence_unit and (not (nucleotide_residue)))

What do you think?
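
To see how such a disjointness axiom behaves differently from the earlier `only` restrictions, here is a small Python sketch over a finite model. It is only an illustration, not an OWL reasoner: the individuals `n1`, `n2`, `a1`, `aa1` and the helper `has_forbidden_part` are hypothetical, and the check is closed-world.

```python
# Toy model: class extensions are sets, has_part is a set of (whole, part) pairs.

def has_forbidden_part(individual, relation, forbidden):
    """True if `individual` has at least one `relation` successor in `forbidden`."""
    return any(s == individual and o in forbidden for (s, o) in relation)

has_part = {
    ("n1", "r1"), ("n1", "a1"),     # n1: a nucleotide residue plus an atom
    ("n2", "r2"), ("n2", "aa1"),    # n2: a nucleotide residue plus an amino-acid residue
}
nucleotide_residue = {"r1", "r2"}
biological_sequence_unit = {"r1", "r2", "aa1"}   # the atom a1 is not a sequence unit

# Filler of: has_part some (biological_sequence_unit and not nucleotide_residue)
forbidden = biological_sequence_unit - nucleotide_residue

n1_ok = not has_forbidden_part("n1", has_part, forbidden)
n2_ok = not has_forbidden_part("n2", has_part, forbidden)
```

In this model `n1` is acceptable even though it has an atom as a part, because the axiom constrains only sequence units; `n2` is rejected because one of its parts is a sequence unit that is not a nucleotide residue.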
