deepTools like (gatk) DepthOfCoverage tool #828

Closed
steffenheyne opened this issue Apr 2, 2019 · 9 comments

@steffenheyne
Collaborator

steffenheyne commented Apr 2, 2019

It would be nice if one could get per-base (per target region) coverage information from deepTools, in the spirit of GATK DepthOfCoverage, samtools depth, or bedtools coverage, for example as a "multiBamCoverage" tool.

bedtools coverage is missing filtering options.
bedtools multicov only reports per region.
samtools depth is slow for many BAM files/regions.
GATK has licensing issues during installation and is also not fast.
deepTools so far only has region-based tools; bamCoverage always covers the full genome.

The only fast alternative is sambamba depth base/region, which works like samtools depth but is multi-threaded; however, it provides no summary statistics.

It should allow for the usual DeepTools read/mapping filtering options.
It should also print out 0-coverage bases.

Summary statistics per base per region would be another nice feature.
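
As a rough illustration of the request above, here is a minimal Python sketch of per-base depth over one target region, with 0-coverage bases reported and a small per-region summary. It is not deepTools code: the function names `per_base_coverage` and `summarize` are hypothetical, reads are modelled as plain half-open `(start, end)` intervals, and read/mapping filtering is assumed to have happened upstream.

```python
from statistics import mean, median


def per_base_coverage(region_start, region_end, reads):
    """Depth for every base in [region_start, region_end), zeros included.

    `reads` is an iterable of half-open (start, end) alignment intervals
    (a simplified stand-in for already-filtered BAM records).
    """
    length = region_end - region_start
    diff = [0] * (length + 1)  # difference array: +1 at read start, -1 at read end
    for start, end in reads:
        lo = max(start, region_start)  # clip the read to the region
        hi = min(end, region_end)
        if lo >= hi:
            continue  # read does not overlap the region
        diff[lo - region_start] += 1
        diff[hi - region_start] -= 1
    depths = []
    running = 0
    for delta in diff[:length]:
        running += delta  # prefix sum turns the deltas into depths
        depths.append(running)
    return depths


def summarize(depths):
    """Per-region summary statistics over the per-base depths."""
    return {
        "mean": mean(depths),
        "median": median(depths),
        "min": min(depths),
        "max": max(depths),
    }


# Toy data: two overlapping reads inside the region, one read outside it.
reads = [(100, 105), (103, 108), (120, 125)]
depths = per_base_coverage(100, 110, reads)
summary = summarize(depths)
```

The last two bases of the toy region are uncovered, and they still appear in `depths` as zeros, which is the behaviour asked for above.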

@dpryan79
Collaborator

dpryan79 commented Apr 8, 2019

Should this accept multiple equivalents of -ct, like GATK does? That is, --threshold 5 10 15 20, to get outputs at 0:21:5 in Python parlance?

@steffenheyne
Collaborator Author

Yeah, either some reasonable defaults, or the user can supply the -ct thresholds.

@steffenheyne
Collaborator Author

Would it be possible to correct/check for mate overlaps for the per-base coverages or the average region coverages?

@dpryan79
Collaborator

dpryan79 commented Apr 8, 2019

Unlikely

@dpryan79
Collaborator

dpryan79 commented Apr 8, 2019

One would need to extend the reads and filter out one mate.
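
To make the "extend the reads and filter out one mate" idea concrete, here is a minimal sketch under simplifying assumptions. Alignments are modelled as plain dicts with hypothetical keys (`start`, `end`, `is_first_mate`, `template_length`), where `template_length` is signed in the way the SAM TLEN field is. This is only an illustration of the idea, not how deepTools handles it internally.

```python
def extend_first_mates(alignments):
    """Return one half-open interval per fragment.

    Mate 2 is dropped and mate 1 is extended to the full fragment, so bases
    where the two mates overlap are counted once rather than twice. Note that
    any unsequenced gap between the mates is also counted as covered.
    """
    fragments = []
    for aln in alignments:
        if not aln["is_first_mate"]:
            continue  # filter out mate 2 so each fragment contributes once
        tlen = aln["template_length"]
        if tlen > 0:
            # mate 1 is the leftmost mate: extend to the right
            fragments.append((aln["start"], aln["start"] + tlen))
        elif tlen < 0:
            # mate 1 is the rightmost mate: extend to the left
            fragments.append((aln["end"] + tlen, aln["end"]))
        else:
            # no usable template length: fall back to the read itself
            fragments.append((aln["start"], aln["end"]))
    return fragments


# Toy pair: mates overlap on [130, 150); naive counting would give depth 2 there.
pair = [
    {"start": 100, "end": 150, "is_first_mate": True, "template_length": 80},
    {"start": 130, "end": 180, "is_first_mate": False, "template_length": -80},
]
fragments = extend_first_mates(pair)
```

With the toy pair, the result is a single interval spanning the whole fragment, so every base in it gets depth 1.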

@dpryan79
Collaborator

At the moment I have this writing something like the following to a file:

Sample   Threshold  Percent
RNAseq   0          100.00
H3K4Me3  0          100.00
RNAseq   5          0.94
H3K4Me3  5          0.08
RNAseq   10         0.66
H3K4Me3  10         0.06
RNAseq   20         0.45
H3K4Me3  20         0.03
RNAseq   30         0.36
H3K4Me3  30         0.03
RNAseq   50         0.27
H3K4Me3  50         0.01

That's basically a tidy dataframe, so it can be easily plotted later in R. The thresholds can be specified with -ct (-ct 0 -ct 5 -ct 10 -ct 20 -ct 30 -ct 50 in this case).
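
For readers who want to see how a table in this tidy layout could be produced, here is a minimal sketch. The helper names `threshold_table` and `format_table`, the sample names, and the depth values are all made up for illustration; "Percent" is taken to mean the percentage of positions with coverage at or above the threshold.

```python
def threshold_table(sample_depths, thresholds):
    """Build tidy (sample, threshold, percent) rows.

    `sample_depths` maps a sample name to a non-empty list of depths, one per
    base or region. Percent is the share of entries with depth >= threshold.
    """
    rows = []
    for threshold in thresholds:
        for sample, depths in sample_depths.items():
            covered = sum(1 for d in depths if d >= threshold)
            rows.append((sample, threshold, 100.0 * covered / len(depths)))
    return rows


def format_table(rows):
    """Render the rows as tab-separated text with a header line."""
    lines = ["Sample\tThreshold\tPercent"]
    lines += [f"{sample}\t{threshold}\t{percent:.2f}" for sample, threshold, percent in rows]
    return "\n".join(lines)


# Toy depths, not real data.
sample_depths = {"sampleA": [0, 0, 5, 12], "sampleB": [0, 3, 7, 30]}
rows = threshold_table(sample_depths, [0, 5, 10])
print(format_table(rows))
```

One row per sample and threshold keeps the output in long format, which is what makes it straightforward to plot later.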

@dpryan79
Collaborator

dpryan79 commented May 16, 2019

Note that this is "bases/regions with at least the given coverage", which isn't exactly the same as a real depth of coverage.

@dpryan79
Collaborator

I'll see if I can convert the regions to a set of 1 base positions, in which case the output would be exact and correct even when not using bins.
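
A toy example of why single-base positions give an exact answer while bins do not: averaging depth over a bin can hide uncovered bases inside it. The sketch below uses hypothetical helpers (`region_to_positions`, `bin_means`, `fraction_at_least`) and made-up depths, and assumes a bin is scored by its mean depth; it is not the deepTools implementation.

```python
def region_to_positions(chrom, start, end):
    """Split a half-open region into 1-base regions."""
    return [(chrom, pos, pos + 1) for pos in range(start, end)]


def bin_means(depths, bin_size):
    """Mean depth of consecutive bins of `bin_size` bases."""
    means = []
    for i in range(0, len(depths), bin_size):
        chunk = depths[i:i + bin_size]
        means.append(sum(chunk) / len(chunk))
    return means


def fraction_at_least(values, threshold):
    """Fraction of entries with value >= threshold."""
    return sum(1 for v in values if v >= threshold) / len(values)


positions = region_to_positions("chr1", 100, 103)

# Toy depths for 8 bases: the first bin has three uncovered bases and one deep one.
depths = [0, 0, 0, 20, 10, 10, 10, 10]
per_base = fraction_at_least(depths, 5)               # 5 of 8 bases reach depth 5
binned = fraction_at_least(bin_means(depths, 4), 5)   # both bins have mean >= 5
```

Here the binned view reports every bin as passing the threshold, while the per-base view shows that three bases have no coverage at all.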

@dpryan79
Collaborator

This is now implemented for the 3.3.0 release.
