Run Picard tools and collate multiple metrics files.
`picardmetrics` runs 10 Picard tools on a BAM file:
- SortSam
- MarkDuplicates
- CollectMultipleMetrics
- CollectRnaSeqMetrics
- CollectGcBiasMetrics
- EstimateLibraryComplexity
You can find additional scripts in the scripts/ folder:

- `make_refFlat` creates a `refFlat` file with (human) Gencode v19 gene annotations.
- `make_rRNA_intervals` creates an `intervals_list` file with all human ribosomal RNA genes.
- `plot_picardmetrics.R` shows how to read and plot the metrics.
```
$ picardmetrics
Usage: picardmetrics COMMAND

    run        Run the Picard tools on a given BAM file.
    collate    Collate metrics files for multiple BAM files.

$ picardmetrics run
Usage: picardmetrics run [-f FILE] [-r] <file.bam>
    -f FILE    The configuration file. (Default: ~/.picardmetricsrc)
    -r         The BAM file has RNA-seq reads. (Default: false)

$ picardmetrics collate
Usage: picardmetrics collate PREFIX <file.bam> [<file.bam> ...]
```
```
# Download the code.
git clone git@github.com:slowkow/picardmetrics.git

# Install the script to your preferred location.
cd picardmetrics
cp picardmetrics ~/bin/

# Copy and edit the configuration file to match your system.
cp picardmetricsrc ~/.picardmetricsrc
vim ~/.picardmetricsrc
```
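The configuration file is a small shell fragment that gets sourced before the tools run. The variable names below are placeholders invented for illustration, not the tool's actual settings; consult the `picardmetricsrc` shipped in the repository for the real keys your version expects.

```shell
# Hypothetical ~/.picardmetricsrc sketch. The variable names and
# paths here are illustrative placeholders only -- check the
# repository's picardmetricsrc for the real settings.

# Where the Picard jar lives on this machine (assumed path).
PICARD_JAR=/usr/local/share/picard/picard.jar

# Reference files consumed by the metrics tools (assumed paths).
REFERENCE_SEQUENCE=/data/refs/hg19.fa
```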
You also need to install these dependencies:
I've included two BAM files, each with 10,000 mapped reads, to illustrate the usage of `picardmetrics`. Please see the data/ folder.
Here are three examples of how you can run the program:

- Run `picardmetrics` sequentially (in a for loop) on multiple BAM files.
- Run in parallel with GNU parallel, using multiple processors or multiple servers.
- Run in parallel with an LSF queue, distributing jobs to multiple servers.
Run the Picard tools on the provided example BAM files:
```
$ for f in data/project1/sample?/sample?.bam; do picardmetrics run -r $f; done
```
Collate the generated metrics files:
```
$ picardmetrics collate data/project1 data/project1/sample?/sample?.sorted.bam
```
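The collated output is plain tab-delimited text, so standard Unix tools can slice it. The sketch below fabricates a miniature two-column stand-in for a metrics table (the column names are invented for illustration, not actual picardmetrics output) and pulls out one metric per sample:

```shell
# Fabricate a tiny stand-in for a collated metrics table.
# Column names are illustrative, not real picardmetrics output.
printf 'SAMPLE\tPERCENT_DUPLICATION\nsample1\t0.05\nsample2\t0.12\n' > all-metrics.tsv

# Print each sample with its duplication rate, skipping the header row.
awk -F'\t' 'NR > 1 { print $1, $2 }' all-metrics.tsv
# Prints:
# sample1 0.05
# sample2 0.12
```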
Run 2 jobs in parallel:
```
$ parallel -j2 picardmetrics run -r {} ::: data/project1/sample?/sample?.bam
```
If you have many files, or if you want to run jobs on multiple servers, it's a good idea to put the full paths in a text file. Here, we have ssh access to `server1` and `server2`. We're launching 16 jobs on `server1` and 8 jobs on `server2`. You'll have to make sure that `picardmetrics` is in your `PATH` on all servers.
```
$ ls /full/path/to/data/project1/sample*/sample*.bam > bams.txt
$ parallel -S 16/server1,8/server2 picardmetrics run -r {} :::: bams.txt
```
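`find` can generate the same list without the glob, and it emits one path per line even for deeply nested layouts. The sketch below builds a throwaway directory tree purely to demonstrate the pattern (the /tmp paths are arbitrary):

```shell
# Build a throwaway tree that mimics the project layout (demo only).
mkdir -p /tmp/demo/project1/sample1 /tmp/demo/project1/sample2
touch /tmp/demo/project1/sample1/sample1.bam /tmp/demo/project1/sample2/sample2.bam

# find prints absolute paths, one per line, ready for bams.txt.
find /tmp/demo/project1 -name '*.bam' | sort > /tmp/demo/bams.txt
cat /tmp/demo/bams.txt
# Prints:
# /tmp/demo/project1/sample1/sample1.bam
# /tmp/demo/project1/sample2/sample2.bam
```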
I recommend you install and use asub to submit jobs easily. This command will submit a job for each BAM file to the `myqueue` LSF queue.
```
$ cat bams.txt | xargs -i echo picardmetrics run -r {} | asub -j picardmetrics -q myqueue
```
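Because the pipeline only echoes commands before handing them to asub, you can preview exactly what would be submitted by dropping the `asub` stage. A self-contained sketch (the BAM paths are made up):

```shell
# Preview the commands that would be submitted, using made-up paths.
# xargs -I {} substitutes each input line into the echoed command.
printf '%s\n' /path/a.bam /path/b.bam |
  xargs -I {} echo picardmetrics run -r {}
# Prints:
# picardmetrics run -r /path/a.bam
# picardmetrics run -r /path/b.bam
```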