Annotate all transcripts #195

joaoe · 2016-10-04T14:46:58Z

One of the differences between VEP and varcode is that VEP will annotate ALL transcripts in its database, including pseudogenes. Varcode limits itself to transcripts for which Transcript.is_protein_coding returns True. See also openvax/pyensembl#169.
As such, with VEP it's up to the developer/user to decide which transcripts are useful and filter accordingly.

I'd like there to be a way to tell varcode which biotypes should be accepted. One possibility would be an optional callback ('transcript_filter') passed to predict_variant_effects() that returns True or False depending on whether a transcript should be annotated. The developer/user would then implement their own filtering logic, perhaps even filtering transcripts by ID. That way, uninteresting transcripts can be skipped (saving time and CPU cycles) and non-coding transcripts of interest can be returned.
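To make the proposal concrete, here is a minimal sketch of how such a callback could behave. It does not use varcode itself: the `Transcript` class below is a hypothetical stand-in with only an ID and a biotype, and `select_transcripts` is an invented helper name, not an existing varcode function. The only thing taken from the discussion is the idea of an optional `transcript_filter` that defaults to the current protein-coding check.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Transcript:
    """Hypothetical stand-in for a transcript object (not the pyensembl class)."""
    transcript_id: str
    biotype: str

    @property
    def is_protein_coding(self) -> bool:
        # Mirrors the check described in this issue: biotype == "protein_coding".
        return self.biotype == "protein_coding"


def select_transcripts(
    transcripts: Iterable[Transcript],
    transcript_filter: Optional[Callable[[Transcript], bool]] = None,
) -> List[Transcript]:
    """Return the transcripts that should be annotated.

    With no callback, fall back to the current behavior (protein coding only).
    """
    if transcript_filter is None:
        transcript_filter = lambda t: t.is_protein_coding
    return [t for t in transcripts if transcript_filter(t)]


# Made-up transcripts for illustration only.
candidates = [
    Transcript("T1", "protein_coding"),
    Transcript("T2", "processed_pseudogene"),
    Transcript("T3", "IG_V_gene"),
]

# Default: unchanged behavior.
default_ids = [t.transcript_id for t in select_transcripts(candidates)]

# User-supplied logic: filter by ID, regardless of biotype.
wanted = {"T2", "T3"}
custom_ids = [
    t.transcript_id
    for t in select_transcripts(candidates, lambda t: t.transcript_id in wanted)
]
```

The callback keeps the filtering policy entirely in user code, so the library would not need to decide which biotypes are meaningful.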

Another challenge is that VEP also annotates incomplete transcripts. Supporting this might be more laborious, though. Perhaps something for a different task.

Thoughts?


iskandr · 2016-10-05T18:26:06Z

What kinds of annotations are you interested in for non-coding transcripts? We've been fairly narrowly focused on coding effects so I don't know what you can say about a non-coding transcript.

joaoe · 2016-10-05T18:47:01Z

I'm interested in comparing tools and getting them to perform as similarly as possible. I'm not that interested in non-coding transcripts myself, though other people might be. But by checking for biotype = "protein_coding" you're skipping a bunch of coding biotypes. If there were a generic API for people to pick whichever transcripts they want, then I guess varcode would become more useful.

iskandr · 2016-10-05T19:03:12Z

If a transcript is already annotated as triggering NMD due to an early stop codon, is it useful to predict some other effect in its protein sequence (e.g. single amino acid substitution)? It might be but I can't currently think of the use-case.

I can try adding a parameter for a set of biotypes on which we perform predictions but it's not clear to me that those predictions will always be meaningful.

joaoe · 2016-10-05T20:15:07Z

Everything you said is quite valid, but that's not the issue. The issue is just letting the user pick and choose which transcripts/biotypes they want, while keeping the current behavior as the default. For example, right now IG* and TR* transcripts are also ignored.

For instance https://github.com/joaoe/varcode/commit/fe02769f199f9e6c6d2a6e8075786cd2a19d2f89
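Independently of the linked commit (which is not reproduced here), the "set of biotypes with the current behavior as default" idea can be sketched roughly as follows. Everything in this block is an assumption for illustration: `filter_by_biotype` and `DEFAULT_BIOTYPES` are invented names, transcripts are reduced to plain `(transcript_id, biotype)` pairs, and the IDs are made up. The biotype strings follow Ensembl-style naming for the IG*/TR* groups mentioned above.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

# Current behavior described in this issue: only "protein_coding" is accepted.
DEFAULT_BIOTYPES: FrozenSet[str] = frozenset({"protein_coding"})


def filter_by_biotype(
    transcripts: Iterable[Tuple[str, str]],
    biotypes: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return IDs of transcripts whose biotype is in the accepted set.

    `transcripts` holds (transcript_id, biotype) pairs. Passing no `biotypes`
    keeps the default, so existing callers would see no change.
    """
    accepted = DEFAULT_BIOTYPES if biotypes is None else frozenset(biotypes)
    return [tid for tid, biotype in transcripts if biotype in accepted]


# Made-up records for illustration only.
records = [
    ("T1", "protein_coding"),
    ("T2", "IG_V_gene"),
    ("T3", "TR_C_gene"),
    ("T4", "processed_pseudogene"),
]

default_result = filter_by_biotype(records)
extended_result = filter_by_biotype(
    records, DEFAULT_BIOTYPES | {"IG_V_gene", "TR_C_gene"}
)
```

A fixed set is simpler than a callback but less flexible, since it cannot filter by transcript ID.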

joaoe mentioned this issue Oct 5, 2016

Support all protein coding biotypes openvax/pyensembl#169

Open
