covmed overestimates coverage? #19
Comments
Yeah, I've noticed this as well. I'll have a look today. Picard and bedtools do actual coverage calculations across the whole BAM (I'm pretty sure, anyway), while covmed estimates from a sample, but it should still be able to produce a pretty good estimate.
@hdashnow would you give one of the attached binaries a try? (I have to gzip them to attach here, so you'll have to unzip and chmod +x.) You can now do:
do this by scaling the coverage estimate by the proportion of reads that are dup|qcfail|secondary|unmapped. see #19
and a caveat is that goleft is likely to be inaccurate for exome or targeted sequencing, but I'll improve that a bit more in the future.
Good idea adding that filter. It made the estimates slightly smaller, e.g. 33.04 instead of 33.4. Still nowhere near Picard.
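For readers following along, the filter discussed above can be sketched roughly as follows. This is not the goleft source; it assumes the raw estimate is multiplied by the fraction of sampled reads that carry none of the dup, qcfail, secondary, or unmapped flags. The function names `usable_fraction` and `scale_coverage` are made up for illustration; the flag bit values come from the SAM specification.

```python
# SAM flag bits (values defined by the SAM specification).
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QCFAIL = 0x200
FLAG_DUP = 0x400
EXCLUDED = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_QCFAIL | FLAG_DUP


def usable_fraction(flags):
    """Fraction of sampled reads with none of the excluded flag bits set."""
    flags = list(flags)
    if not flags:
        raise ValueError("no reads sampled")
    usable = sum(1 for f in flags if (f & EXCLUDED) == 0)
    return usable / len(flags)


def scale_coverage(raw_estimate, flags):
    """Hypothetical helper: scale a raw estimate by the usable-read fraction."""
    return raw_estimate * usable_fraction(flags)
```

Under this assumption, the change from 33.4 to 33.04 reported in this thread corresponds to a factor of about 0.989, i.e. roughly 1% of reads being filtered, which is far too small to explain the gap to Picard's 27.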
I've noticed that covmed estimates higher median coverage than other tools. For example, for one particular whole genome, covmed estimates 33.4, while Picard CollectWgsMetrics estimates 27.
I've performed similar calculations on exomes, where I get a median coverage of 199.71 with covmed (using the region argument) compared with 189 using bedtools (taking the median of the per-base counts over the target region). I've found consistently higher results from covmed compared with Picard and bedtools across a number of exomes and genomes. The size of the difference is variable.
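To make the bedtools-side calculation concrete, here is a minimal sketch of "median of counts per base over the target region". It assumes the per-base depths are already available in memory (for example, parsed from a per-base depth report) and that the target intervals are half-open, BED-style, and non-overlapping. The function name and data layout are illustrative only.

```python
from statistics import median


def median_target_depth(depth, regions):
    """Median per-base depth over target regions.

    depth:   dict mapping chromosome -> list of per-base depths (0-based).
    regions: iterable of (chrom, start, end), half-open like BED intervals.
    """
    values = []
    for chrom, start, end in regions:
        values.extend(depth[chrom][start:end])
    if not values:
        raise ValueError("no target bases")
    return median(values)


# Toy data, not taken from any real sample.
toy_depth = {"chr1": [0, 0, 10, 12, 14, 0, 0, 20, 22, 0]}
toy_regions = [("chr1", 2, 5), ("chr1", 7, 9)]
toy_median = median_target_depth(toy_depth, toy_regions)
```

Because every target base contributes exactly one value, zero-coverage bases inside the targets pull this median down, which an estimate based on sampled reads may not capture.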
I wonder if you have any idea why this is occurring?
One possibility that springs to mind for exomes in particular is that reads outside the target region could be counted and so cause it to overestimate the coverage.
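A toy calculation shows the size of effect this hypothesis would produce. It is not how covmed works internally; it only assumes a naive mean-coverage formula (reads times read length divided by target size), and all numbers are made up for illustration.

```python
def mean_coverage(n_reads, read_length, target_bases):
    """Naive mean coverage: total sequenced bases divided by target size."""
    return n_reads * read_length / target_bases


# Made-up numbers, for illustration only.
on_target_reads = 1_000_000
off_target_reads = 50_000
read_length = 100
target_bases = 1_000_000

on_target_only = mean_coverage(on_target_reads, read_length, target_bases)
with_off_target = mean_coverage(
    on_target_reads + off_target_reads, read_length, target_bases
)
```

In this toy model, counting 5% extra off-target reads inflates the estimate by the same 5% (100.0 versus 105.0).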