Definition of isoform (MSO:3000321) #1

Open
nataled opened this issue Mar 15, 2018 · 5 comments

@nataled

nataled commented Mar 15, 2018

The definition states, at the end, that an isoform may differ from other variants owing to post-translational modification. Is this intended to mean exclusively proteolytic cleavages, or does it include covalent attachments? Either way, I think it would be good to bring some clarity to terms like 'isoform' which have been used to mean just about anything that differs in some way from anything else. Is that term used to mean anything other than splice variants (mRNA and the resulting protein) these days?

@msinclair2
Collaborator

Good point; it is important that definitions reflect current usage and bring implicit distinctions to light. Personally, I have also heard the word isoform used mainly for splice variants. Polymorphism should probably be its own class to reflect the different usage. I'm not sure about post-translational modifications.

@mikebada
Collaborator

This is an excellent ontological question, and I'm also unsure of how broadly it should be defined. The Human Protein Atlas, in discussing the human isoform proteome, says: "The structural space of the human proteome is large and diverse due to the presence of various protein variants (isoforms), including post-translational modifications, splice variants, proteolytic products, genetic variations and somatic recombination." This seems to be pretty close to anything that differs in some way from anything else, as you say. Note that we also have a variant class, essentially carried over from the current SO:sequence_variant class, so we'll have to decide if we want to merge these or differentiate them.

@nataled
Author

nataled commented Mar 17, 2018

I can point you to this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4114032/ which describes a term that, at least for proteins, has that all-inclusive meaning. In PRO we have 'categories' of terms that give an indication of the origin of differences. Terms belonging to the 'gene' category differ from their siblings based on the gene that encodes them. Terms belonging to the 'sequence' category differ based on sequence differences even if the proteins are translation products of the same gene. Within that category we distinguish between 'isoforms' (which we consider solely as deriving from distinct splice variants) and 'sequence variants' (which would be derived from distinct alleles, for example). Finally, we have the 'modification' category, in which sibling classes can come from the exact same gene and allele thereof and even the same splice variant (if you consider that to be based on exons), but differ in some other post-translational modification (which includes covalent attachments and cleavage events). That just gives an idea of what my thinking is in terms of providing some clarity.
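The category scheme described above can be sketched roughly in code. This is only an illustration of the idea, not PRO's actual data model: the `ProteinForm` record, its fields, and the `difference_category` function are hypothetical names, and the order in which the checks are applied is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ProteinForm:
    """Hypothetical record; the field names are illustrative, not PRO's schema."""
    gene: str
    allele: str
    splice_variant: str
    modifications: FrozenSet[str] = frozenset()


def difference_category(a: ProteinForm, b: ProteinForm) -> str:
    """Name the category of difference between two protein forms.

    Checks run from the broadest origin of difference to the narrowest.
    Testing the splice variant before the allele is an arbitrary choice here.
    """
    if a.gene != b.gene:
        return "gene"
    if a.splice_variant != b.splice_variant:
        return "sequence: isoform"
    if a.allele != b.allele:
        return "sequence: sequence variant"
    if a.modifications != b.modifications:
        return "modification"
    return "identical"


# Made-up example forms, not real PRO entries.
reference = ProteinForm("GENE1", "allele-A", "splice-1")
spliced = ProteinForm("GENE1", "allele-A", "splice-2")
allelic = ProteinForm("GENE1", "allele-B", "splice-1")
modified = ProteinForm("GENE1", "allele-A", "splice-1", frozenset({"phospho-S10"}))
other_gene = ProteinForm("GENE2", "allele-A", "splice-1")
```

Under this reading, 'isoform' is reserved for the splice-variant case only, which is the narrower usage proposed in this thread.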

@cmungall

The most straightforward approach is for MSO to be analogous to PRO, and to model this as a metaclass (in PRO these are represented as subsets, but really they are metaclasses). But the challenge here is the inability to answer questions such as 'how many distinct {reference proteins, structural isoforms} are there in the human genome' in a straightforward way. Perhaps these are actually GDC questions, in which case SO would not parallel MSO here.

@mikebada
Collaborator

@cmungall I'm surprised to hear you say that metaclasses would be the most straightforward approach. Why not just create different subclasses of isoforms/variants? I use metaclasses for my work, and they're useful for me, but they don't seem to be straightforward for less-ontologically-minded folks.

Also, I'm not sure if I'm seeing everything, but the biolink-model document doesn't seem to really address the ontology of isoforms/variants briefly discussed above.
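To make the two modelling options in this thread concrete, here is a minimal sketch that assumes a toy ontology held in plain dictionaries. The term identifiers (`EX:1` and so on), class names, and subset tags are all invented for illustration and do not come from MSO, SO, or PRO. Option A records the kinds as ordinary subclasses; option B keeps the hierarchy flat and attaches a subset-style tag to each term.

```python
from collections import Counter

# Option A: kinds of variant are ordinary subclasses (child -> parent).
SUBCLASS_OF = {
    "splice_isoform": "variant_form",
    "sequence_variant": "variant_form",
    "modified_form": "variant_form",
    "EX:1": "splice_isoform",
    "EX:2": "splice_isoform",
    "EX:3": "modified_form",
}


def descendants(parent: str) -> set:
    """Return every term that is a direct or indirect subclass of `parent`."""
    children = [term for term, p in SUBCLASS_OF.items() if p == parent]
    found = set(children)
    for child in children:
        found |= descendants(child)
    return found


# Option B: the kind is a tag attached to each term, outside the hierarchy.
SUBSET_TAG = {
    "EX:1": "isoform",
    "EX:2": "isoform",
    "EX:3": "modification",
}

# Counting under option A walks the hierarchy; under option B it groups by tag.
isoforms_by_subclass = descendants("splice_isoform")
counts_by_tag = Counter(SUBSET_TAG.values())
```

In this toy case both options give the same counts; the difference lies in whether the kind is part of the class hierarchy that ordinary reasoners and queries traverse, or an annotation that tools have to know to look for.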


4 participants