Replace gmap with minimap2 in align_contigs_against_genome #37

Open
nadiadavidson opened this issue Aug 10, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@nadiadavidson
Member

Many false positives seem to result from poor alignment of contigs to the genome, which in turn leads to bad annotation. For example:
k49_1134199 0 chr21 44220603 3 281S16M6810N5M2435N6M6199N13M50029N16M1I6M32I32M21I4M4911N3M16052N8M8047N10M2202N2I11M1I25M3I22M6D11M750N10M1163N4M1809N3M1264N7M24156N4M6825N5M3920N2M12899N7M4539N3M7819N8M10199N7M1D1M4453N8M5667N2M11730N3M55386N4M402730N2M6445N3I6M15389N1M13514N9M1I47M2N4I126M15D1279M * 00 CAGCGCTCCTGGCCCCCCGAAGTCCCAGAGCTGCTGACCCCCACCCCAGCTGCATCAGAGAGCCTGTCTGGGGCCAAGGTTGCCAGAGATTTCTGAAGACACAGCTTGTTCCTTGTTCTTGGCTGGTGGGTGCACAAGGACTTCTGGAAGGGATTTAGACGGGGCTGAGTGCTAGGATTAAAGTGGGGATGGGAGTACGGCAACAGAAAAACCTGGGAGCTAGCAATGCACCCAGCCCTTGACTGTGCCCTGGTGGACAGCCGAGCTGTGGCTCTAGCGTGAGCCAGTGCCTTCCTGTCCCTGCCAAGGGTGAGGCCAGAGTTGGCCCCGAGGCTAATGTTTCAGTGGGTGAGATTAGGTCGGCCGTACAGAGGCCGGTGGGCTCCCTGACATCCCTTCCAGGCAACCTGAAAGCACTGAAATAGCTTATGGCCCTGTGCCAGGGACCTTGGCCCAAGCTGCTGACCTCCAGGGTGGGGAGGGAGCTACCCCCAGGAGAAGAGTCACTCAGACAGCAGTATGAGCAAGCCAGCCAGCAGCTCCGTGCCTGCACCCAGCTCAGGGGAATCCCAGGGGGTTCAGATGCCCAGGAAGGAAAAGGGGACAGCGCTACTGCTATGGAATGAGACCACCACTTCTCCTGTTGTCCTTCCCAGCTTCTCCCCAACCTCCCCTTTTCCCTAGTTTATAAGACAGGAGAAAAGGGAGAAAGCAAAAAGCTGGAAAGAAACAGAAGTAAGATAAATAGCTAGACGACCTTGGCGCCACCACCTGGCCCTGGTGGTTAAAATGATAATAATATTAACCCCTGACCAAAACGACTGGTGTTATCTGTAAATCCCAGACATTGTGTGAGAAAGCACCGTAAAACTTTTTGTCCTATTAGCTGATGTGTGTAGCCCCCAGTCACGTTCCTCACGCTTACTTGATCTATTATGACCCTTTCACGTGGACCCCTTAGAGTTGTAAGCTCTTAAAAGGGCTAGGAATTTCTTTTTCGGGGAGCTCGGCTCTTAAGACGCAAGTCTGCTGACACTCCTGGCCAAATAAAGCCCTTCCTTCTTTAACCGAGTGTCTGAGGAATTCTGTCTGCGGCTTGTCCGGCTACAACGGTGCTGGAGCCCAGACTCTCAGGGAAAGGAACCCGAGCCGTCAGAAAACCATCTGATTCCAGGCTGGGGCAAGGGACATGGAGATGGGCCTGCAGCATCATGTTGCTCCAGAAAGCAAGAAAGTGCTCAGAACGGTAGAACGGGGATGCATGGACAGGACACGCAGCCAGACCTAGCGGATTTGAGCATCTCGGGGAAGAAAGGACAGCCACAGATCATGCACTACTGAACAAAATAAAACTGTGGGTCACGCTGATGAGAGAGAGGCTGCAGAGAAGGAGAGACCCTTCCTTAGGTTGGCAGCCGTGAGTGGCAGGCGGGGACCAGCACGGCACCAATCTGCAGCCATCGCAGTGATGGCGGCTTCAGGCGGGGACCTCCGCGGATGCTGAGCCTGCGGGTGCGATTTGATGAGGGCAGAACCTCACCAGCCCACAGTGGCTGCGAGGGGATCATGCAGCGGGATGGGGAGGCCGGGGGGATGCCGTCTCAGCAGAGCCGTCCACGCTGACCTCATCAAGACTGGGACGGGGCCACAGCAGTGCCTCTCATGGGCACTTAGGACACCGTCACTGAGGGGC
TCCTGCCAAAGCACACCTGAGTCCAGGCAGAGGAAACTCCAGACAAGACCCCCGAGGGTCATGCTACAAAGCTGCTCTCCTGACTTCCTCAGAAACGCCCAAGGACAGGAAAGACAAAGAAAGCTGAGGACTTGTCCAGATTCAAGAAGCCCAAGGAGACGGCTGAGCGTAGGGCGAGCCTGGGTGAGGAGATTCAGAGCGTTAGACGGCTGAGCGCAGTGTGTGAACCTGGGTTAGGAGATTTGGGGCCTGAGATGGCTGAGTGCAGGGTGAGCCTGAGTGAGGAGATTCTGAGCCTGAGACAGCTGAGCACAGGGTGAGCCTGGGTGACAAAATCCACCAGGAAAATATGCTCACGAAGACATCATTGGGACAACCAATAAAATATGCGT * MD:Z:35AG4G1T8AC19C1C1CG3CC1GCT2CC2A7G24T4TCT3A2CCTC2GCT1A1T1T6C1T2TGAGGG2C1^GGGACA1CA1G48G17^G4A1G43C2C3C3TT7A1CC33A14T29C3T18C16C1A6T5TT^ATTATTATTATTAAC13T19T11A3A6TT1C8C3T2G1C8CA4A10A5C3A3G2CT6C1CA9T23C313C14A784 NH:i:1HI:i:1 NM:i:175 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU

The contig should instead align to chr21:43268915-43270392 and chr2:231884096-231893280.

Replacing the following stage with minimap2 could be a simple improvement (though it would require a fair amount of validation work). Keen to hear your thoughts, @mcmero.

align_contigs_against_genome = {
    def sample_name = branch.name
    output.dir = sample_name
    produce('aligned_contigs_against_genome.sam') {
        exec """
            $gmap -D $gmap_refdir -d $gmap_genome -f samse -t $threads -x $min_gap --max-intronlength-ends=500000 -n 0 $input.fasta > $output
        """, "align_contigs_against_genome"
    }
}
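
For discussion, a minimap2 version of the stage might look roughly like the sketch below. This is untested and only a starting point: `$minimap2` and `$genome_fasta` are hypothetical pipeline variables that do not exist in the current config, and the `splice` preset is an assumption that would need checking on real contigs. The flags used (`-a` for SAM output, `-x splice` for spliced alignment, `-t` for threads) are standard minimap2 options.

```groovy
// Sketch only: $minimap2 and $genome_fasta are assumed variables, not existing config entries.
align_contigs_against_genome = {
    def sample_name = branch.name
    output.dir = sample_name
    produce('aligned_contigs_against_genome.sam') {
        exec """
            $minimap2 -a -x splice -t $threads $genome_fasta $input.fasta > $output
        """, "align_contigs_against_genome"
    }
}
```

The GMAP options `-x $min_gap`, `--max-intronlength-ends=500000` and `-n 0` have no one-to-one equivalents here, so how chimeric contigs and long introns are reported would be part of the validation.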

@nadiadavidson nadiadavidson added the enhancement New feature or request label Aug 10, 2023
@mcmero
Collaborator

mcmero commented Aug 21, 2023

Thanks Nadia, I think replacing GMAP with minimap2 could work nicely to improve the contig alignments. Happy to implement in a separate branch. As you say, validation would be a lot of work, so would need to discuss further.

@nadiadavidson
Member Author

Thanks Marek, we'll have a go on a small dataset and then chat with you more if it looks promising. No need to make a separate branch at this stage.
