Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merges neighbouring I and D ops into one op within pileup. #1552

Merged
merged 2 commits into from
Jan 24, 2023

Conversation

jkbonfield
Copy link
Contributor

This means 4M1D1D1D3M is reported as 4M3D3M instead as far as mpileup is concerned (although samtools view will report as-is) and importantly p->indel=-3 for the first 1D and the 2nd 1D has p->indel=0 (with p->is_del=1, the same as it would for the 2nd base in a 3D cigar op).

Previously samtools mpileup would produce incorrect looking output for the 1D1D scenario. Fixing this in sam.c means not only is samtools mpileup now looking better, but any tool using the mpileup API will be getting consistent results.

Note that samtools mpileup already resolved the ...1I1I1I... case, but it did this within the samools bam_plcmd.c code itself. Hence while the pileup API works, it left p->indel=1 instead of p->indel=3 for this situation. So we also resolve that in a similar fashion.

Note 2P1I1I is reported as p->indel=2 (a 2bp indel) even though bam_plp_insertion would return e.g. +4**AC, as we're reporting the number of bases inserted in this sequence rather than the padded alignment size.

Fixes samtools/samtools#139, or at least the remaining part of the puzzle. Most had previously been fixed already back in 2014.

jkbonfield and others added 2 commits January 17, 2023 17:19
This means 4M1D1D1D3M is reported as 4M3D3M instead, and importantly
"p->indel=-3" for the first 1D and the 2nd 1D has "p->indel=0" (with
p->is_del=1, the same as it would for the 2nd base in a 3D cigar op).

Previously samtools mpileup would produce incorrect looking output for
the 1D1D scenario.  Fixing this in sam.c means not only is samtools
mpileup now looking better, but any tool using the mpileup API will be
getting consistent results.

Note that samtools mpileup already resolved the ...1I1I1I... case, but
it did this within the samools bam_plcmd.c code itself.  Hence while the
pileup API works, it left p->indel=1 instead of p->indel=3 for this
situation.  So we also resolve that in a similar fashion.

Note 2P1I1I is reported as p->indel=2 (a 2bp indel) even though
bam_plp_insertion would return e.g. +4**AC, as we're reporting the
number of bases inserted in this sequence rather than the padded
alignment size.

Fixes samtools/samtools#139, or at least the remaining part of the
puzzle.  Most had previously been fixed already back in 2014.
@daviesrob daviesrob merged commit 1a222ef into samtools:develop Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mpileup on 1I1I vs 2I cigar ops
2 participants