forked from samtools/htslib
Commit
Make mpileup's overlap removal choose a random sequence.
Currently it always chooses the second sequence (except when the base calls differ). In most library strategies this amounts to a random strand and a random coordinate, but some targeted sequencing methods have a very strong strand bias (first is + strand, second is - strand) or positional bias (e.g. PCR amplicons). Given that SNPs near the ends of sequences can give rise to poor BAQ scores, both position and strand bias are detrimental.

This change makes it select either read 'a' or 'b' based on a hash of the read name. Unlike a traditional random number generator, this gives consistent behaviour regardless of how many sequences have gone before.

An example from SynDip region 1:185M-200M:

No overlap removal:
              SNP Q>0 / Filtered
    SNP   TP    18830 / 18803
    SNP   FP      264 /   238
    SNP   GT       56 /    53
    SNP   FN      459 /   486
    InDel TP     2788 /  2697
    InDel FP     1022 /    86
    InDel GT      353 /   345
    InDel FN      596 /   687

Old removal strategy:
              SNP Q>0 / Filtered
    SNP   TP    18841 / 18813
    SNP   FP      270 /   243
    SNP   GT       56 /    54
    SNP   FN      448 /   476
    InDel TP     2754 /  2663
    InDel FP      985 /    83
    InDel GT      413 /   404
    InDel FN      630 /   721

This PR:
              SNP Q>0 / Filtered
    SNP   TP    18841 / 18814
    SNP   FP      272 /   242
    SNP   GT       55 /    53
    SNP   FN      448 /   475
    InDel TP     2765 /  2679
    InDel FP      996 /    85
    InDel GT      382 /   375
    InDel FN      619 /   705

The difference in CPU cost on "bcftools mpileup | bcftools call" between the latter two tests was 0.4% (which may also just be random fluctuation).

Compared with the old removal strategy, this is a marginal improvement for SNPs and, oddly, a significant improvement for indels. (It is still behind no overlap removal for indels, but I'm unsure of the veracity of small indels in that truth set.)

Fixes samtools/bcftools#1459
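To illustrate the read-name hashing idea from the commit message, here is a minimal sketch. It is not the code from this commit: the function names qname_hash and pick_overlap_read are hypothetical, and FNV-1a is an assumed stand-in for whatever hash the actual change uses. The point it shows is that the choice depends only on the read name, so the same pair always resolves the same way no matter how many reads came before it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 32-bit FNV-1a over a NUL-terminated read name.
 * Assumption: chosen for illustration only; the real patch may use a
 * different hash function.
 */
static uint32_t qname_hash(const char *qname) {
    uint32_t h = 2166136261u;               /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)qname; *p; p++) {
        h ^= *p;
        h *= 16777619u;                     /* FNV prime */
    }
    return h;
}

/*
 * Decide which read of an overlapping pair to keep.
 * Returns 0 for read 'a' and 1 for read 'b'.
 *
 * The top bit is used rather than the bottom one: in FNV-1a the lowest
 * bit is only the parity of the input bytes' low bits, so it mixes poorly.
 */
static int pick_overlap_read(const char *qname) {
    assert(qname != NULL);
    return (int)(qname_hash(qname) >> 31);
}
```

Both reads of a pair share a name, so calling pick_overlap_read() on either one gives the same answer, which is what lets the decision be made without keeping any random number generator state.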
1 parent c3ba302, commit 662227a
Showing 2 changed files with 77 additions and 46 deletions.