covmed overestimates coverage? #19

Open
hdashnow opened this issue Jan 24, 2017 · 4 comments

Comments

@hdashnow

I've noticed that covmed estimates higher median coverage than other tools. For example, for one particular whole genome, covmed estimates 33.4, while Picard CollectWgsMetrics estimates 27.
I've done similar calculations on exomes, where I get a median coverage of 199.71 with covmed (using the region argument) compared with 189 using bedtools (taking the median of the per-base counts over the target region). covmed has given consistently higher results than Picard and bedtools across a number of exomes and genomes. The size of the difference varies.
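
For reference, the bedtools-style calculation described above can be sketched as follows. This is a minimal illustration, not the exact command or script used here: it assumes per-base depth has already been written out as tab-separated rows with the depth in the last column (as in output resembling `bedtools coverage -d`), and the function name `median_depth` and the example rows are hypothetical.

```python
import statistics


def median_depth(lines):
    """Return the median of the per-base depths found in the last column.

    Assumes each non-empty line is tab-separated and ends with an
    integer depth for a single base of the target region.
    """
    depths = [
        int(line.rstrip("\n").split("\t")[-1])
        for line in lines
        if line.strip()
    ]
    return statistics.median(depths)


# Hypothetical rows: chrom, start, end, position within region, depth
example = [
    "chr1\t100\t105\t1\t180",
    "chr1\t100\t105\t2\t189",
    "chr1\t100\t105\t3\t190",
    "chr1\t100\t105\t4\t0",
    "chr1\t100\t105\t5\t201",
]
print(median_depth(example))
```

Because every base in the target contributes one value, including bases with zero depth, this is an exact median over the region rather than an estimate from a sample of reads.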

I wonder if you have any idea why this is occurring?

One possibility that springs to mind for exomes in particular is that reads outside the target region could be counted and so cause it to overestimate the coverage.
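
To make that hypothesis concrete, here is a small sketch of how off-target reads would inflate an estimate if coverage were derived from a read count rather than from per-base depth. Whether covmed actually works this way is an assumption, not something confirmed in this thread; the function `read_count_coverage` and all numbers are hypothetical.

```python
def read_count_coverage(n_reads, read_length, region_bases):
    """Approximate coverage as total sequenced bases divided by region size."""
    return n_reads * read_length / region_bases


# Hypothetical exome-like numbers, chosen only for illustration.
on_target_reads = 1_000_000
off_target_reads = 50_000   # reads that map outside the target regions
read_length = 100
target_bases = 500_000

# Estimate using only reads that fall inside the target.
on_target_only = read_count_coverage(on_target_reads, read_length, target_bases)

# Estimate if off-target reads are counted against the same target size.
with_off_target = read_count_coverage(
    on_target_reads + off_target_reads, read_length, target_bases
)

print(on_target_only, with_off_target)
```

In this toy case, counting 5% extra off-target reads against the same target size raises the estimate by 5%, which is the kind of upward bias suggested above.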

@brentp
Owner

brentp commented Jan 24, 2017

Yeah, I've noticed this as well. I'll have a look today. Picard and bedtools are doing actual coverage calculations across the whole BAM (I'm pretty sure, anyway), while covmed is estimating based on a sample, but it should still be able to give a pretty good estimate.

@brentp
Owner

brentp commented Jan 24, 2017

@hdashnow would you give one of the attached binaries a try? (I have to gzip them to attach them here, so you'll have to gunzip and chmod +x.)
This should give a more accurate estimate, but I'd like to see how it performs for your cases.

You can now do: goleft covmed *.bam
so it's easier to run on a group of BAMs.
goleft_osx.gz
goleft_linux64.gz

brentp added a commit that referenced this issue Jan 24, 2017
do this by scaling the coverage estimate by the proportion of reads
that are dup|qcfail|secondary|unmapped.

see #19
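
The adjustment described in the commit message could look roughly like the sketch below. This is one reading of that message, not the goleft source: it assumes the raw estimate is scaled down by the fraction of sampled reads that carry none of the duplicate, QC-fail, secondary, or unmapped flags. The flag bit values come from the SAM specification; the function names and the sampled flag values are hypothetical.

```python
# SAM flag bits, as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
QCFAIL = 0x200
DUPLICATE = 0x400
EXCLUDED = UNMAPPED | SECONDARY | QCFAIL | DUPLICATE


def usable_fraction(flags):
    """Fraction of sampled reads with none of the excluded flag bits set."""
    flags = list(flags)
    if not flags:
        raise ValueError("no reads sampled")
    excluded = sum(1 for flag in flags if flag & EXCLUDED)
    return 1 - excluded / len(flags)


def scale_coverage(raw_estimate, flags):
    """Scale a raw coverage estimate by the usable fraction of reads."""
    return raw_estimate * usable_fraction(flags)


# Hypothetical sample of read flags: 1123 is a duplicate, 4 is unmapped,
# 355 is a secondary alignment, and 611 failed QC; the rest are usable.
sampled_flags = [99, 147, 1123, 4, 99, 147, 355, 99, 147, 611]
print(usable_fraction(sampled_flags))
print(scale_coverage(30.0, sampled_flags))
```

With this kind of scaling, the size of the correction depends entirely on how many reads carry those flags, so a sample with few duplicates or unmapped reads would shift the estimate only slightly.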
@brentp
Owner

brentp commented Jan 24, 2017

A caveat is that goleft is likely to be inaccurate for exome or targeted data, but I'll improve that a bit more in the future.

@hdashnow
Author

hdashnow commented Jan 24, 2017

Good idea adding that filter. It made the estimates slightly smaller, e.g. 33.04 instead of 33.4, but that is still nowhere near Picard's 27.
