Derived SO (SO_refactored) differs from current release SO in various ways #13

Open
cmungall opened this issue Feb 22, 2019 · 8 comments

Comments

@cmungall

I assume SO_refactored.owl/obo is the output of the compilation step (see #12).

If so, it differs substantially from the current released version of SO. Some of the differences are technical and probably easy to fix, e.g. missing axiom annotations on synonyms. Others involve large changes to the hierarchy. Are all of these intentional, or are some of them bugs? What is the process for evaluating the differences and announcing any changes to the community?

@cmungall
Author

Example:

image

  • logical axioms completely different
  • xref missing in SO_refactored
  • different properties used for synonyms

Also, oddly, the definitions differ but the definition source is the same. Surely if a definition is changed substantially, the definition source must change too.

@cmungall
Author

OK, looking only at logical axioms, I see that more than 1,000 classes have at least one logical axiom changed. Some of these changes are not meaningful and may reflect redundancies in one or the other of the hierarchies, but the majority seem meaningful; see the attached diff.

diff.txt
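
For anyone wanting to reproduce a comparison along these lines, here is a minimal sketch, not the procedure used to produce the attached diff. It assumes both releases are available as OBO text and treats the `is_a`, `relationship`, `intersection_of`, `union_of` and `disjoint_from` lines of each `[Term]` stanza as its logical axioms. The function names and the `EX:` IDs are hypothetical. It compares asserted lines only, so redundant-but-equivalent hierarchies will still show up as changes.

```python
# Tags treated as "logical" in this sketch; adjust to taste.
LOGICAL_TAGS = {"is_a", "relationship", "intersection_of", "union_of", "disjoint_from"}


def logical_axioms(obo_text):
    """Map each [Term] id to the set of (tag, value) logical lines in its stanza."""
    axioms = {}
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are inspected.
            in_term = (line == "[Term]")
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        tag = tag.strip()
        # Drop a trailing " ! label" comment so only the target remains.
        value = value.split(" ! ")[0].strip()
        if tag == "id":
            current = value
            axioms[current] = set()
        elif tag in LOGICAL_TAGS and current is not None:
            axioms[current].add((tag, value))
    return axioms


def changed_classes(old_text, new_text):
    """Return {id: (axioms only in old, axioms only in new)} for shared ids that differ."""
    old, new = logical_axioms(old_text), logical_axioms(new_text)
    return {
        term: (old[term] - new[term], new[term] - old[term])
        for term in sorted(old.keys() & new.keys())
        if old[term] != new[term]
    }


if __name__ == "__main__":
    # Toy stanzas with made-up IDs, standing in for the two releases.
    old = "[Term]\nid: EX:0000001\nis_a: EX:0000002 ! old parent\n"
    new = "[Term]\nid: EX:0000001\nis_a: EX:0000003 ! new parent\n"
    for term, (removed, added) in changed_classes(old, new).items():
        print(term, "removed:", sorted(removed), "added:", sorted(added))
```

Classes present in only one of the two files are deliberately ignored here, since those are a separate question from changed axioms.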

@cmungall
Author

Note that a number of IDs seem to have disappeared from SO_refactored but are still present in MSO?

-id: SO:0000054 ! aneuploid
-id: SO:0000055 ! hyperploid
-id: SO:0000056 ! hypoploid
-id: SO:0000359 ! floxed
-id: SO:0000443 ! polymer attribute
-id: SO:0000628 ! chromosomal structural element
-id: SO:0000687 ! deletion junction
-id: SO:0000733 ! feature attribute
-id: SO:0000782 ! natural
-id: SO:0000784 ! foreign
-id: SO:0000814 ! rescue
-id: SO:0000817 ! wild type
-id: SO:0000831 ! gene member region
-id: SO:0000856 ! conserved
-id: SO:0000857 ! homologous
-id: SO:0000858 ! orthologous
-id: SO:0000859 ! paralogous
-id: SO:0000860 ! syntenic
-id: SO:0000976 ! cryptic
-id: SO:0001004 ! low complexity
-id: SO:0001079 ! polypeptide structural motif
-id: SO:0001234 ! mobile
-id: SO:0001409 ! biomaterial region
-id: SO:0001410 ! experimental feature
-id: SO:0001411 ! biological region
-id: SO:0001412 ! topologically defined region
-id: SO:0001761 ! variant quality
-id: SO:0001769 ! variant phenotype --> variant defined by phenotype
-id: SO:0001814 ! coding variant quality
-id: SO:0001815 ! synonymous
-id: SO:0001816 ! non synonymous
-id: SO:0001992 ! nonsynonymous variant
-id: SO:0100001 ! biochemical region of peptide
-id: SO:0100017 ! polypeptide conserved motif
-id: SO:1000160 ! unoriented insertional duplication --> insertional duplication of unspecified orientation
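
A list like the one above can be regenerated with a short script. The following is a minimal sketch under stated assumptions: both ontologies are available as OBO text, obsolete terms should not count as present, and a plain line-based reader is good enough (it is not a full OBO parser). The function names and the `EX:` IDs in the demo are hypothetical.

```python
def obo_term_ids(obo_text):
    """Return the set of non-obsolete [Term] ids found in OBO-format text."""
    ids = set()
    in_term = False
    current_id, obsolete = None, False
    # The trailing "[" sentinel flushes the final stanza.
    for raw in obo_text.splitlines() + ["["]:
        line = raw.strip()
        if line.startswith("["):
            if in_term and current_id and not obsolete:
                ids.add(current_id)
            in_term = (line == "[Term]")
            current_id, obsolete = None, False
        elif in_term and line.startswith("id:"):
            # Drop a trailing " ! label" comment if one is present.
            current_id = line[3:].split(" ! ")[0].strip()
        elif in_term and line.startswith("is_obsolete:"):
            obsolete = line.split(":", 1)[1].strip() == "true"
    return ids


def missing_ids(reference_text, derived_text):
    """Ids present in the reference ontology but absent from the derived one."""
    return sorted(obo_term_ids(reference_text) - obo_term_ids(derived_text))


if __name__ == "__main__":
    # Toy input with made-up IDs, standing in for the two OBO files.
    reference = "[Term]\nid: EX:0000010\n\n[Term]\nid: EX:0000011\n"
    derived = "[Term]\nid: EX:0000010\n"
    for term_id in missing_ids(reference, derived):
        print("-id:", term_id)
```

Swapping the two arguments gives the opposite direction, i.e. IDs that exist only in the derived file.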

@mikebada
Collaborator

Re allele: the logical axioms in the refactored MSO/SO were created to place the allele class properly in the new upper-level structuring. I'd argue that asserting allele as a subclass of variant_collection (as in the current public SO) is erroneous; even the original natural-language definition asserts that it's one of a set of coexisting sequence variants of a gene.

That being said, we do realize that there's more work still to be done, including automatically fixing IDs, synonyms, and natural-language definitions, which I'm discussing with @msinclair2. I'm also going to start another manual review this week to look for errors that need to be fixed by hand.

@msinclair2
Collaborator

All the ID annotations have been automatically fixed.

@mikebada
Collaborator

Re classes in the MSO but not in SO_refactored: most of those listed above are specifically dependent continuants (SDCs). As I understand BFO, dependent continuants (DCs) can't themselves bear DCs, so there's no need for these SDCs in the SO, since the SO sequence entity classes are generically dependent continuants (GDCs).

There are a few sequence entity classes that I recommended for obsoletion because I thought they were difficult to situate in the refactored upper-level structuring; I've discussed this with Karen and Michael. There may also be a few other classes that were inadvertently dropped.

@cmungall
Author

cmungall commented Feb 26, 2019 via email

@mikebada
Collaborator

Yes, I meant the qualities and realizable entities. I talked about these with Karen a while ago; as I recall, she said that they were mostly created for the formal definitions and are not used by the sequence annotators. That being said, another item still on the to-do list is to make sure that all classes that have actually been used in annotations are accounted for.
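
As a rough illustration of that last check, here is a minimal sketch. It assumes the annotations are GFF3 files, where the third column holds the feature type as an SO term name or accession; other annotation sources would need their own reader. The function names, the `EX:` ID, and the toy data are all hypothetical.

```python
def gff3_feature_types(gff3_text):
    """Collect the distinct values of column 3 (type) from GFF3-format text."""
    types = set()
    for line in gff3_text.splitlines():
        # Skip blank lines, directives ("##...") and comments ("#...").
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) >= 3:
            types.add(columns[2])
    return types


def unaccounted_types(used_types, known_ids, known_names):
    """Types used in annotations that match neither an id nor a name in the ontology."""
    return sorted(t for t in used_types if t not in known_ids and t not in known_names)


if __name__ == "__main__":
    # Toy annotation line; the ontology sets would come from SO_refactored.
    gff = "##gff-version 3\nchr1\tsrc\tmade_up_type\t5\t9\t.\t+\t.\tID=x\n"
    print(unaccounted_types(gff3_feature_types(gff), {"EX:0000002"}, {"gene"}))
```

The sets of known IDs and names would be built from the derived ontology, so any type reported here is used in annotations but has no matching class.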


3 participants