AlexanderLabWHOI/ehux_annotation_consensus
Multi-annotation consensus using dynamic programming

This repository contains a dynamic programming implementation that consolidates the results of multiple genome annotation methods into a single GTF file for a marine microbial eukaryote, the haptophyte Emiliania huxleyi (Gephyrocapsa huxleyi). For this project, the GTF inputs were generated from three sources: homology to an existing transcriptome, RNA-Seq read mapping, and the BRAKER3 pipeline.

The problem is to generate a single GTF file with the highest possible support from the three input evidence types. It can be approached by traversing a score matrix in which scores represent the degree of support from the input evidence and each state is one of intergenic, frame 1 exon, frame 2 exon, frame 3 exon, or intron. During the traversal, transitions between states can only occur at points supported by the input evidence, and the increase in score at each step equals the number of evidence types supporting that move. This strategy is very similar to the score matrix traversal used in Needleman-Wunsch alignment.
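One way to write this traversal down is as a recurrence. The notation is an assumption introduced here for illustration and does not come from the repository: $S(i, s)$ is the best score of any path ending in state $s$ at position $i$, $e_i(s)$ is the number of evidence types that label position $i$ with state $s$, and $A_i(s)$ is the set of previous states from which a move into $s$ at position $i$ is permitted (here taken to be $s$ itself plus any state $p$ for which at least one evidence type changes from $p$ to $s$ at that position).

$$
S(i, s) = e_i(s) + \max_{p \in A_i(s)} S(i-1, p), \qquad S(0, s) = e_0(s)
$$

The consensus path is then recovered by backtracking from the state with the highest score at the final position, following the maximizing $p$ at each step.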

Consider a simple example of a chromosome containing two genes. When traversing along the chromosome, a score matrix is created as soon as a gene start is found. For the first three positions, the algorithm stays in the frame 1 exon state, which is supported by all three evidence types and thus gives a score increase of 3 at each position. Transitioning to the intron state at position 4 is supported by two evidence types and thus gives a score increase of 2. The last two positions in the gene are both indicated as frame 1 exon by all three evidence types, so each again gives a score increase of 3. All three evidence types then support transitioning to an intergenic state, i.e. ending the gene. Once the gene has concluded, the algorithm backtracks from the highest score found to determine the optimal exon-intron transition path. For the second gene, the top evidence track is in a different frame than the other two evidence types and is therefore associated with a different, suboptimal path.
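The first gene in this example can be reproduced with a small Python sketch. This is not the repository's code: the function name `consensus_path`, the state labels, and the per-position track encoding are all hypothetical, and the sketch assumes that staying in the same state is always allowed while a change of state requires at least one evidence track to make the same change at that position.

```python
# Minimal sketch of the score-matrix traversal (hypothetical names throughout).
STATES = ("intergenic", "exon1", "exon2", "exon3", "intron")


def consensus_path(tracks):
    """Return (path, total_score) for per-position state labels from each evidence track."""
    n = len(tracks[0])
    if any(len(t) != n for t in tracks):
        raise ValueError("all evidence tracks must have the same length")

    def support(i, state):
        # Number of evidence tracks that label position i with this state.
        return sum(t[i] == state for t in tracks)

    def allowed(i, prev, cur):
        # Assumption: staying is always allowed; a change of state needs
        # at least one track that makes the same change at position i.
        return prev == cur or any(t[i - 1] == prev and t[i] == cur for t in tracks)

    score = [{s: support(0, s) for s in STATES}]
    back = [{}]
    for i in range(1, n):
        row, ptr = {}, {}
        for cur in STATES:
            best_prev, best = None, float("-inf")
            for prev in STATES:
                if allowed(i, prev, cur) and score[i - 1][prev] > best:
                    best_prev, best = prev, score[i - 1][prev]
            row[cur] = best + support(i, cur)
            ptr[cur] = best_prev
        score.append(row)
        back.append(ptr)

    # Backtrack from the highest-scoring state at the last position.
    state = max(STATES, key=lambda s: score[-1][s])
    total = score[-1][state]
    path = [state]
    for i in range(n - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    path.reverse()
    return path, total


# Three evidence tracks for the first gene, flanked by one intergenic position
# on each side. Two tracks place an intron at the fourth gene position.
N, E1, I = "intergenic", "exon1", "intron"
tracks = [
    [N, E1, E1, E1, I, E1, E1, N],
    [N, E1, E1, E1, I, E1, E1, N],
    [N, E1, E1, E1, E1, E1, E1, N],
]
path, total = consensus_path(tracks)
print(path)   # intergenic, three exon1, intron, two exon1, intergenic
print(total)  # 23
```

The total of 23 is the sum of the per-position increases: 3 for each flanking intergenic position, 3 for each of the five exon positions, and 2 for the intron. Remaining in the exon state through the fourth gene position would score 22, so the path containing the intron is preferred.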
