Many false positives seem to result from poor alignment of contigs to the genome, which in turn leads to bad annotation. For example:
k49_1134199 0 chr21 44220603 3 281S16M6810N5M2435N6M6199N13M50029N16M1I6M32I32M21I4M4911N3M16052N8M8047N10M2202N2I11M1I25M3I22M6D11M750N10M1163N4M1809N3M1264N7M24156N4M6825N5M3920N2M12899N7M4539N3M7819N8M10199N7M1D1M4453N8M5667N2M11730N3M55386N4M402730N2M6445N3I6M15389N1M13514N9M1I47M2N4I126M15D1279M * 00 CAGCGCTCCTGGCCCCCCGAAGTCCCAGAGCTGCTGACCCCCACCCCAGCTGCATCAGAGAGCCTGTCTGGGGCCAAGGTTGCCAGAGATTTCTGAAGACACAGCTTGTTCCTTGTTCTTGGCTGGTGGGTGCACAAGGACTTCTGGAAGGGATTTAGACGGGGCTGAGTGCTAGGATTAAAGTGGGGATGGGAGTACGGCAACAGAAAAACCTGGGAGCTAGCAATGCACCCAGCCCTTGACTGTGCCCTGGTGGACAGCCGAGCTGTGGCTCTAGCGTGAGCCAGTGCCTTCCTGTCCCTGCCAAGGGTGAGGCCAGAGTTGGCCCCGAGGCTAATGTTTCAGTGGGTGAGATTAGGTCGGCCGTACAGAGGCCGGTGGGCTCCCTGACATCCCTTCCAGGCAACCTGAAAGCACTGAAATAGCTTATGGCCCTGTGCCAGGGACCTTGGCCCAAGCTGCTGACCTCCAGGGTGGGGAGGGAGCTACCCCCAGGAGAAGAGTCACTCAGACAGCAGTATGAGCAAGCCAGCCAGCAGCTCCGTGCCTGCACCCAGCTCAGGGGAATCCCAGGGGGTTCAGATGCCCAGGAAGGAAAAGGGGACAGCGCTACTGCTATGGAATGAGACCACCACTTCTCCTGTTGTCCTTCCCAGCTTCTCCCCAACCTCCCCTTTTCCCTAGTTTATAAGACAGGAGAAAAGGGAGAAAGCAAAAAGCTGGAAAGAAACAGAAGTAAGATAAATAGCTAGACGACCTTGGCGCCACCACCTGGCCCTGGTGGTTAAAATGATAATAATATTAACCCCTGACCAAAACGACTGGTGTTATCTGTAAATCCCAGACATTGTGTGAGAAAGCACCGTAAAACTTTTTGTCCTATTAGCTGATGTGTGTAGCCCCCAGTCACGTTCCTCACGCTTACTTGATCTATTATGACCCTTTCACGTGGACCCCTTAGAGTTGTAAGCTCTTAAAAGGGCTAGGAATTTCTTTTTCGGGGAGCTCGGCTCTTAAGACGCAAGTCTGCTGACACTCCTGGCCAAATAAAGCCCTTCCTTCTTTAACCGAGTGTCTGAGGAATTCTGTCTGCGGCTTGTCCGGCTACAACGGTGCTGGAGCCCAGACTCTCAGGGAAAGGAACCCGAGCCGTCAGAAAACCATCTGATTCCAGGCTGGGGCAAGGGACATGGAGATGGGCCTGCAGCATCATGTTGCTCCAGAAAGCAAGAAAGTGCTCAGAACGGTAGAACGGGGATGCATGGACAGGACACGCAGCCAGACCTAGCGGATTTGAGCATCTCGGGGAAGAAAGGACAGCCACAGATCATGCACTACTGAACAAAATAAAACTGTGGGTCACGCTGATGAGAGAGAGGCTGCAGAGAAGGAGAGACCCTTCCTTAGGTTGGCAGCCGTGAGTGGCAGGCGGGGACCAGCACGGCACCAATCTGCAGCCATCGCAGTGATGGCGGCTTCAGGCGGGGACCTCCGCGGATGCTGAGCCTGCGGGTGCGATTTGATGAGGGCAGAACCTCACCAGCCCACAGTGGCTGCGAGGGGATCATGCAGCGGGATGGGGAGGCCGGGGGGATGCCGTCTCAGCAGAGCCGTCCACGCTGACCTCATCAAGACTGGGACGGGGCCACAGCAGTGCCTCTCATGGGCACTTAGGACACCGTCACTGAGGGGC
TCCTGCCAAAGCACACCTGAGTCCAGGCAGAGGAAACTCCAGACAAGACCCCCGAGGGTCATGCTACAAAGCTGCTCTCCTGACTTCCTCAGAAACGCCCAAGGACAGGAAAGACAAAGAAAGCTGAGGACTTGTCCAGATTCAAGAAGCCCAAGGAGACGGCTGAGCGTAGGGCGAGCCTGGGTGAGGAGATTCAGAGCGTTAGACGGCTGAGCGCAGTGTGTGAACCTGGGTTAGGAGATTTGGGGCCTGAGATGGCTGAGTGCAGGGTGAGCCTGAGTGAGGAGATTCTGAGCCTGAGACAGCTGAGCACAGGGTGAGCCTGGGTGACAAAATCCACCAGGAAAATATGCTCACGAAGACATCATTGGGACAACCAATAAAATATGCGT * MD:Z:35AG4G1T8AC19C1C1CG3CC1GCT2CC2A7G24T4TCT3A2CCTC2GCT1A1T1T6C1T2TGAGGG2C1^GGGACA1CA1G48G17^G4A1G43C2C3C3TT7A1CC33A14T29C3T18C16C1A6T5TT^ATTATTATTATTAAC13T19T11A3A6TT1C8C3T2G1C8CA4A10A5C3A3G2CT6C1CA9T23C313C14A784 NH:i:1HI:i:1 NM:i:175 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU
This contig should instead align to chr21:43268915-43270392 and chr2:231884096-231893280.
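To make the problem easier to spot across many contigs, here is a minimal sketch that summarises a CIGAR string. It is not part of the pipeline: the function names and the thresholds in `looks_fragmented` are hypothetical and would need tuning. The example input is only the first few operations of the CIGAR above.

```python
import re
from collections import Counter
from statistics import median

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_cigar(cigar):
    """Return basic statistics for a SAM CIGAR string."""
    ops = CIGAR_OP.findall(cigar)
    # Reject strings that the pattern could not fully consume.
    if not ops or "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"invalid CIGAR: {cigar!r}")
    totals = Counter()
    for length, op in ops:
        totals[op] += int(length)
    match_blocks = [int(n) for n, op in ops if op == "M"]
    return {
        "soft_clipped": totals["S"],
        "aligned_bases": totals["M"],
        "n_introns": sum(1 for _, op in ops if op == "N"),
        "intronic_span": totals["N"],
        "median_block": median(match_blocks) if match_blocks else 0,
    }


def looks_fragmented(summary, min_introns=3, max_median_block=20):
    """Hypothetical heuristic: many introns separating very short aligned blocks."""
    return (summary["n_introns"] >= min_introns
            and summary["median_block"] < max_median_block)


# First few operations of the CIGAR from the record above.
prefix = "281S16M6810N5M2435N6M6199N13M50029N16M"
stats = summarize_cigar(prefix)
print(stats, looks_fragmented(stats))
```

Even on this short prefix, 56 aligned bases are spread over more than 65 kb of reference with 281 bases soft-clipped, which is the pattern that seems to produce the bad annotations.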
Replacing the `align_contigs_against_genome` stage, which currently runs GMAP, with minimap2 could be a simple improvement, although it would require a fair amount of validation work. Keen on your thoughts @mcmero
Thanks Nadia, I think replacing GMAP with minimap2 could work nicely to improve the contig alignments. Happy to implement in a separate branch. As you say, validation would be a lot of work, so would need to discuss further.
align_contigs_against_genome = {
    def sample_name = branch.name
    output.dir = sample_name
    produce('aligned_contigs_against_genome.sam') {
        exec """
            $gmap -D $gmap_refdir -d $gmap_genome -f samse -t $threads -x $min_gap --max-intronlength-ends=500000 -n 0 $input.fasta > $output
        """, "align_contigs_against_genome"
    }
}
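As a rough sketch of what a minimap2-based replacement might invoke, the snippet below builds the command line using minimap2's spliced-alignment preset (`-x splice`) with SAM output (`-a`). Whether that preset and its defaults suit assembled contigs is an assumption that would need validating; the helper name and the file names `genome.fa` and `contigs.fasta` are hypothetical placeholders.

```python
import shlex


def minimap2_contig_command(genome_fasta, contigs_fasta, threads=8):
    """Build a minimap2 argument list for spliced contig-to-genome alignment.

    -a          write SAM output
    -x splice   spliced-alignment preset
    -t          number of threads
    """
    return [
        "minimap2", "-a", "-x", "splice",
        "-t", str(threads),
        genome_fasta, contigs_fasta,
    ]


# Hypothetical inputs; the output name matches the existing stage's product.
cmd = minimap2_contig_command("genome.fa", "contigs.fasta", threads=8)
print(shlex.join(cmd) + " > aligned_contigs_against_genome.sam")
```

In the Bpipe stage, the equivalent command would take the place of the `$gmap ...` line inside `exec`. Note that minimap2 reports split alignments as supplementary SAM records, so downstream steps that expect GMAP's `samse` output may need adjusting.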