Force global (a.k.a. end-to-end) alignment #203

Open
JingaJenga opened this issue Nov 3, 2023 · 10 comments

Comments

@JingaJenga

Hello!

I'm using wfmash to align the two assembled haplotype sequences of a genomic region to each other. The alignment I get is good, but it does not extend all the way to the ends of the two sequences. This is especially strange because I know the sequences match at their ends (a 42 kbp perfect match). Is there any way to force wfmash to align end-to-end?

Cheers,
-- Josh

@ekg
Collaborator

ekg commented Nov 4, 2023 via email

@ekg
Collaborator

ekg commented Nov 4, 2023 via email

@JingaJenga
Author

Thanks for your quick response!

I had already discovered the -N flag, and it sort of helps, but not fully. Without it I get two separate alignments which, even combined, don't cover the whole sequence. With it, the alignments are merged.

How would I construct a full-length mapping in PAF format? Would I run wfmash once with --approx-map, then manually edit the PAF file's CIGAR string before running wfmash again with -i?
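
As a rough illustration of what a "full-length mapping" record could look like, the sketch below builds a single PAF line that spans both sequences from start to end, using lengths read from a `samtools faidx` index. The twelve columns follow the standard PAF layout. The function names, the sequence names `hapA` and `hapB`, and the placeholder values for the match count and mapping quality are all assumptions made for this example; whether wfmash accepts such a line through -i without additional tags has not been confirmed in this thread.

```python
from typing import Dict


def read_fai_lengths(fai_text: str) -> Dict[str, int]:
    """Parse the text of a samtools faidx index into {sequence name: length}."""
    lengths = {}
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        # The first two tab-separated columns of a .fai file are name and length.
        name, length = line.split("\t")[:2]
        lengths[name] = int(length)
    return lengths


def full_length_paf(query: str, target: str, lengths: Dict[str, int],
                    strand: str = "+") -> str:
    """Build one PAF line mapping the whole query onto the whole target.

    Columns: qname, qlen, qstart, qend, strand, tname, tlen, tstart, tend,
    matches, block length, mapping quality.
    """
    qlen, tlen = lengths[query], lengths[target]
    fields = [
        query, qlen, 0, qlen,   # query spans [0, qlen)
        strand,
        target, tlen, 0, tlen,  # target spans [0, tlen)
        0,                      # number of matches: placeholder (unknown before alignment)
        max(qlen, tlen),        # block length: placeholder estimate
        255,                    # mapping quality: 255 means "missing" in PAF
    ]
    return "\t".join(str(f) for f in fields)


# Hypothetical usage with made-up sequence names and lengths:
example_fai = "hapA\t800000\t6\t60\t61\nhapB\t790000\t813346\t60\t61\n"
print(full_length_paf("hapA", "hapB", read_fai_lengths(example_fai)))
```

A record like this carries only coordinates, not a CIGAR string, so the base-level alignment would still have to come from the aligner.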

@JingaJenga
Author

Here's an example of a failure. See the attached files (you can remove the .txt extension; I added it so that GitHub would accept the uploads). The FASTA file has 2 sequences, which are ~800 kbp each and which are identical for the first 42 kbp. When I align them as follows:

samtools faidx wfmash_fail.fasta
wfmash -N -s 5000 -l 25000 -p 90 -n 1 -k 19 -H 0.001 -X -t 32 wfmash_fail.fasta

I get the attached PAF file, in which the alignment starts at position 1408 on both sequences. It seems like, even if wfmash were not explicitly aiming for an end-to-end match, it should extend the alignment to the start of the sequences.

Cheers,
-- Josh

wfmash_fail.fasta.txt
wfmash_fail.paf.txt
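
For anyone wanting to check their own output for the same symptom, here is a minimal sketch that reads one PAF record and reports how many bases are left unaligned at each end of the query and target. It relies only on the standard PAF coordinate columns. The function names are invented for this example, and the sample line uses made-up lengths and statistics; only the start position 1408 comes from the report above.

```python
from typing import Dict


def unaligned_flanks(paf_line: str) -> Dict[str, int]:
    """Return the number of unaligned bases at each end of query and target."""
    f = paf_line.rstrip("\n").split("\t")
    qlen, qstart, qend = int(f[1]), int(f[2]), int(f[3])
    tlen, tstart, tend = int(f[6]), int(f[7]), int(f[8])
    return {
        "query_head": qstart,         # bases before the alignment starts
        "query_tail": qlen - qend,    # bases after the alignment ends
        "target_head": tstart,
        "target_tail": tlen - tend,
    }


def is_end_to_end(paf_line: str) -> bool:
    """True only if the record covers both sequences completely."""
    return all(v == 0 for v in unaligned_flanks(paf_line).values())


# Made-up record: lengths and match counts are illustrative, not from the attachment.
sample = "seq1\t800000\t1408\t800000\t+\tseq2\t800000\t1408\t800000\t790000\t798592\t60"
print(unaligned_flanks(sample))
print(is_end_to_end(sample))
```

PAF coordinates are 0-based with an exclusive end, so a record is end-to-end when each start is 0 and each end equals the sequence length.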

@JingaJenga
Author

By the way, I'm using wfmash v0.10.3-3-g8ba3c53, if that helps.

@ekg
Collaborator

ekg commented Nov 4, 2023 via email

@JingaJenga
Author

Setting -p 70 worked, thank you! It looks like the lower I set -p, the earlier the alignment starts:

-p 90 -> 1408
-p 80 -> 256
-p 70 -> 0

This is great, but I don't understand how it could follow logically - is this just my lack of understanding of the mashmap algorithm?
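
The comparison above could be repeated on other inputs with something like the following sketch. It rebuilds the command line posted earlier in this thread with a varying -p value and extracts the smallest query start from the resulting PAF. It assumes that wfmash writes its PAF to standard output and that the order of the flags does not matter; the helper names are invented for this example, and `sweep` only works where wfmash is installed.

```python
import subprocess
from typing import Dict, List, Optional, Sequence

# Flags copied from the command posted earlier in this thread, minus -p.
BASE_FLAGS = ["-N", "-s", "5000", "-l", "25000", "-n", "1",
              "-k", "19", "-H", "0.001", "-X", "-t", "32"]


def wfmash_command(fasta: str, identity: int) -> List[str]:
    """Build the wfmash argument list for a given -p (percent identity) value."""
    return ["wfmash", *BASE_FLAGS, "-p", str(identity), fasta]


def first_alignment_start(paf_text: str) -> Optional[int]:
    """Smallest query start (PAF column 3) over all records, or None if empty."""
    starts = [int(line.split("\t")[2])
              for line in paf_text.splitlines() if line.strip()]
    return min(starts) if starts else None


def sweep(fasta: str, identities: Sequence[int] = (90, 80, 70)) -> Dict[int, Optional[int]]:
    """Run wfmash once per -p value and record where the alignment starts."""
    results = {}
    for p in identities:
        # Assumption: the PAF output arrives on stdout.
        done = subprocess.run(wfmash_command(fasta, p),
                              capture_output=True, text=True, check=True)
        results[p] = first_alignment_start(done.stdout)
    return results


# Example call (requires wfmash and an indexed FASTA):
# print(sweep("wfmash_fail.fasta"))
```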

@JingaJenga
Author

Returning to my original question: how can I guarantee that the alignment will be end-to-end? It seems that reducing -p to 70 (or even as low as 50) still fails to produce end-to-end alignments in some other situations.

@biomonika

I have the same question -- would love to have an option to force end-to-end alignments.

@ekg
Collaborator

ekg commented Apr 1, 2024 via email
