Steps on the road from RNAseq to variant protein sequence #3

Closed
iskandr opened this issue Feb 4, 2016 · 1 comment

@iskandr
Contributor

iskandr commented Feb 4, 2016

Across the handful of algorithms I've sketched out and discussed with @armish and @ryan-williams, there are broadly two steps:

  1. Gather variant RNAseq reads: Either use gapped alignment to the reference, or generate a set of phased candidate haplotype sequences around the variant (all combinations of nearby variants) and use k-mer lookups or an FM-index to find matching reads.
  2. Assemble partial transcript sequence(s): In the "naive" version @armish is working on, we get a sequence by taking the most common nucleotide at each offset from the variant. We could also build an overlap graph of the filtered reads and assemble multiple candidate sequences. Each sequence should be accompanied by an abundance estimate.
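A rough Python sketch of steps 1 and 2, assuming SNVs only, reads already in transcript orientation, and exact k-mer matches. All function names here (`candidate_haplotypes`, `variant_kmer_index`, `gather_variant_reads`, `naive_consensus`) are hypothetical, not code from this repo. A plain dict stands in for a real k-mer index or FM-index, and the consensus function only illustrates the "most common nucleotide at each offset" idea; it is not @armish's implementation.

```python
from collections import Counter, defaultdict
from itertools import product


def candidate_haplotypes(ref_window, variant, nearby):
    """Every phasing of nearby SNVs around the target variant.

    `variant` and each entry of `nearby` are (offset_in_window, alt_base)
    substitutions; indels are out of scope for this sketch.
    """
    haplotypes = []
    for choices in product([False, True], repeat=len(nearby)):
        seq = list(ref_window)
        seq[variant[0]] = variant[1]  # every candidate carries the target variant
        for use_alt, (offset, alt) in zip(choices, nearby):
            if use_alt:
                seq[offset] = alt
        haplotypes.append("".join(seq))
    return haplotypes


def variant_kmer_index(haplotypes, variant_offset, k):
    """Map each k-mer that spans the variant to the variant's offset inside it."""
    index = {}
    for hap in haplotypes:
        first = max(0, variant_offset - k + 1)
        last = min(variant_offset, len(hap) - k)
        for start in range(first, last + 1):
            index.setdefault(hap[start:start + k], variant_offset - start)
    return index


def gather_variant_reads(reads, index, k):
    """Step 1: keep reads containing a variant-spanning k-mer.

    Returns (read, variant_position_in_read) pairs; the first matching k-mer wins.
    """
    hits = []
    for read in reads:
        for i in range(len(read) - k + 1):
            offset = index.get(read[i:i + k])
            if offset is not None:
                hits.append((read, i + offset))
                break
    return hits


def naive_consensus(hits):
    """Step 2 (naive): most common base at each offset relative to the variant.

    Returns (consensus, variant_index_in_consensus, supporting_read_count).
    """
    if not hits:
        return "", 0, 0
    columns = defaultdict(Counter)
    for read, variant_pos in hits:
        for i, base in enumerate(read):
            columns[i - variant_pos][base] += 1
    # Every read covers offset 0, so the offsets form one contiguous range.
    lo, hi = min(columns), max(columns)
    consensus = "".join(columns[o].most_common(1)[0][0] for o in range(lo, hi + 1))
    # The abundance estimate here is just the number of supporting reads.
    return consensus, -lo, len(hits)
```

Only k-mers that span the variant go into the index, so reads that overlap just the flanking reference sequence don't pass the filter. For example, with window `ACGTACGTAC`, target variant `(5, "T")`, one nearby SNV `(2, "A")` and k = 4, the reference-only read `ACGTACG` is dropped, while `GTATGTA` and `CATATGT` are kept with the variant at read positions 3 and 4.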

Third step (which may live in this package or in PGV):

To determine a partial protein sequence from an assembled sequence, we first need to place it in a reading frame. The easiest way I can think of to do this is to use the known reading frame of annotated transcripts that overlap the variant locus and match the assembled sequence before the variant. We can't expect a match after the variant, because of exon truncation or intron retention (i.e. variant splicing).
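One way this frame placement could look, as a sketch: find the pre-variant part of the assembled sequence in each annotated transcript's cDNA, use that transcript's CDS start to locate the first codon boundary, then translate until a stop codon. The `(cdna_sequence, cds_start)` input format and the function names are assumptions for illustration. The transcripts and CDS coordinates would presumably come from an annotation source such as Ensembl. The sketch ignores strand, takes the first exact match, and doesn't check that the match ends exactly at the variant.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + m]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for m, c in enumerate(BASES)
}


def translate(cdna):
    """Translate codon by codon, stopping at a stop codon or an incomplete codon."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = CODON_TABLE.get(cdna[i:i + 3], "X")  # "X" for codons containing N etc.
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def place_in_frame(assembled, variant_index, transcripts):
    """Translate `assembled` in the frame of the first transcript that matches it
    before the variant.

    `transcripts` is an iterable of (cdna_sequence, cds_start) pairs with a
    0-based CDS start. Only assembled[:variant_index] must match the transcript;
    sequence after the variant may differ (exon truncation, intron retention).
    """
    prefix = assembled[:variant_index]
    if not prefix:
        return None
    for cdna, cds_start in transcripts:
        t_start = cdna.find(prefix)
        if t_start == -1:
            continue
        if t_start <= cds_start:
            offset = cds_start - t_start  # skip bases upstream of the start codon
        else:
            offset = (cds_start - t_start) % 3  # first codon boundary in `assembled`
        return translate(assembled[offset:])
    return None
```

Requiring a match only before the variant is what lets this tolerate the splicing changes above, at the cost of possibly picking the wrong frame when several transcripts share that prefix but differ in annotated CDS start.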

@iskandr
Contributor Author

iskandr commented Mar 30, 2016

Abandoning full-blown assembly in favor of only doing local phasing within a single read length. Briefly discussed with @JPFinnigan the possibility of trying to do limited assembly of the sequence between mate pairs but I'll leave that for future work.
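A minimal sketch of what local phasing within a single read length might look like. Instead of assembling, take each variant read's bases in a fixed window around the variant and count identical windows. Each distinct window is then a haplotype phased by a single read, with its read count as a rough abundance. The `(read, variant_position_in_read)` input format and the `flank` parameter are hypothetical, and reads that don't cover the whole window are simply dropped.

```python
from collections import Counter


def local_phased_sequences(hits, flank):
    """Count distinct read-level sequences spanning the variant +/- `flank` bases.

    `hits` are (read_sequence, variant_position_in_read) pairs, e.g. from a
    k-mer or alignment-based read gatherer. Reads that don't cover the whole
    window are skipped, so every reported sequence is phased by a single read.
    """
    counts = Counter()
    for read, pos in hits:
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(read):
            continue
        counts[read[start:end]] += 1
    return counts.most_common()
```

A larger `flank` phases more nearby variants but keeps fewer reads, since fewer reads span the whole window. That trade-off is the one limited assembly between mate pairs would try to relax.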
