In the handful of algorithms I've sketched out and discussed with @armish and @ryan-williams, there are broadly two steps:
1. Gather variant RNAseq reads: either by gapped alignment to the reference, or by generating a set of phased candidate haplotype sequences around the variant (all combinations of nearby variants) and using k-mer lookups or an FM-index to find matching reads.
2. Assemble partial transcript sequence(s): in the "naive" version @armish is working on, we get a sequence by taking the most common nucleotide at each offset from the variant. We could also build an overlap graph of the filtered reads and assemble multiple candidate sequences. Each sequence should be accompanied by an abundance estimate.
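To make the "naive" assembly concrete, here is a minimal sketch of taking the most common nucleotide at each offset from the variant. The input format (pairs of read sequence and the index of the variant base within that read) and the function name `naive_consensus` are assumptions for illustration, not the actual implementation. The returned depth at the variant is only a crude stand-in for an abundance estimate.

```python
from collections import Counter, defaultdict


def naive_consensus(reads):
    """Build a consensus sequence around a variant by majority vote.

    reads: iterable of (sequence, variant_index) pairs, where variant_index
           is the position of the variant base inside that read
           (hypothetical input format; 0 <= variant_index < len(sequence)).

    Returns (consensus, variant_index_in_consensus, variant_depth).
    """
    # counts[offset][base] = number of reads with `base` at `offset`
    # relative to the variant (offset 0 is the variant itself).
    counts = defaultdict(Counter)
    for seq, variant_index in reads:
        for i, base in enumerate(seq):
            counts[i - variant_index][base] += 1
    if not counts:
        return "", 0, 0

    # Every read covers offset 0, so the observed offsets are contiguous.
    offsets = sorted(counts)
    consensus = "".join(counts[o].most_common(1)[0][0] for o in offsets)
    variant_depth = sum(counts[0].values())
    return consensus, -offsets[0], variant_depth


if __name__ == "__main__":
    example_reads = [("ACGTA", 2), ("CGTAC", 1), ("ACTTA", 2)]
    print(naive_consensus(example_reads))
```

A majority vote collapses everything into one sequence, so it cannot represent multiple isoforms or haplotypes; that is the gap the overlap-graph idea would address.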
Third step (which may live in this package or in PGV):
To determine a partial protein sequence from an assembled sequence, it needs to be placed in a reading frame. The easiest way I can think of doing this is to use the known reading frame of annotated transcripts that overlap the variant locus and match the assembled sequence before the variant. We can't expect a match after the variant due to exon truncation or intron retention (i.e. variant splicing).
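A rough sketch of that frame-borrowing idea follows. It assumes the annotated transcript is available as a plain mRNA string with a known CDS start index and a known index for the variant base; those parameters and the name `translate_in_frame` are hypothetical, and the sketch handles only same-strand, substitution-style variants. It requires an exact match of the assembled prefix, which a real version would probably relax.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_in_frame(assembled, variant_index, transcript, cds_start, variant_pos):
    """Translate `assembled` using the reading frame of an annotated transcript.

    assembled:     assembled nucleotide sequence around the variant
    variant_index: index of the variant base within `assembled`
    transcript:    annotated transcript sequence (mRNA, 5' to 3')
    cds_start:     index in `transcript` of the first base of the start codon
    variant_pos:   index in `transcript` of the variant base

    Returns the partial protein sequence, or None if the sequence before
    the variant does not match the transcript.
    """
    start = variant_pos - variant_index  # transcript index of assembled[0]
    if start < 0:
        return None
    # Only the prefix is compared: sequence after the variant may
    # legitimately differ because of variant splicing.
    if transcript[start:variant_pos] != assembled[:variant_index]:
        return None

    if start < cds_start:
        skip = cds_start - start        # drop 5' UTR bases
    else:
        skip = (cds_start - start) % 3  # advance to the next codon boundary

    protein = []
    for i in range(skip, len(assembled) - 2, 3):
        amino_acid = CODON_TABLE.get(assembled[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, with `transcript = "GGATGGCCAAGTTTGGA"`, `cds_start = 2`, and a variant at transcript index 9, the assembled sequence `"GGCCACGTTTGG"` (variant at index 5) is placed one base out of frame and translated from its second base.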
I'm abandoning full-blown assembly in favor of only doing local phasing within a single read length. I briefly discussed with @JPFinnigan the possibility of doing limited assembly of the sequence between mate pairs, but I'll leave that for future work.
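One way to read "local phasing within a single read length" is to count which combinations of alleles co-occur on individual reads that span all of the nearby variant loci. The sketch below assumes ungapped alignments given as (start coordinate, sequence) pairs; that input format and the name `phase_within_reads` are illustrative assumptions, not the project's actual code.

```python
from collections import Counter


def phase_within_reads(reads, variant_positions):
    """Count allele combinations observed on single reads.

    reads:             iterable of (start, sequence) pairs, where start is
                       the reference coordinate of the read's first base
                       (ungapped alignment assumed for simplicity)
    variant_positions: reference coordinates of the nearby variant loci

    Returns a Counter mapping a tuple of alleles (one per locus, in the
    order given) to the number of reads supporting that combination.
    Reads that do not span every locus are ignored.
    """
    haplotypes = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if all(start <= pos < end for pos in variant_positions):
            alleles = tuple(seq[pos - start] for pos in variant_positions)
            haplotypes[alleles] += 1
    return haplotypes
```

Because only reads covering every locus are counted, variants farther apart than a read length can never be phased this way, which is the trade-off against assembling across mate pairs.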