Run Picard tools and collate multiple metrics files.
`picardmetrics` runs 10 Picard tools on a BAM file:
- SortSam
- MarkDuplicates
- CollectMultipleMetrics
- CollectRnaSeqMetrics
- CollectGcBiasMetrics
- EstimateLibraryComplexity
You can find additional scripts in the scripts/ folder:

- `make_refFlat` creates a `refFlat` file with (human) Gencode v19 gene annotations.
- `make_rRNA_intervals` creates an `intervals_list` file with all human ribosomal RNA genes.
- `plot_picardmetrics.R` shows how to read and plot the metrics.
```
$ picardmetrics
Usage: picardmetrics COMMAND

    run        Run the Picard tools on a given BAM file.
    collate    Collate metrics files for multiple BAM files.

$ picardmetrics run
Usage: picardmetrics run [-f FILE] [-r] <file.bam>
    -f FILE    The configuration file. (Default: ~/.picardmetricsrc)
    -r         The BAM file has RNA-seq reads. (Default: false)

$ picardmetrics collate
Usage: picardmetrics collate PREFIX <file.bam> [<file.bam> ...]
```
```
# Download the code.
git clone git@github.com:slowkow/picardmetrics.git

# Install the script to your preferred location.
cd picardmetrics
cp picardmetrics ~/bin/

# Copy and edit the configuration file to match your system.
cp picardmetricsrc ~/.picardmetricsrc
vim ~/.picardmetricsrc
```
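The configuration file is a small shell fragment that gets sourced before the tools run. The variable names below are placeholders invented for illustration, not the tool's actual settings; consult the `picardmetricsrc` shipped in the repository for the real keys your version expects.

```shell
# Hypothetical ~/.picardmetricsrc sketch. The variable names and
# paths here are illustrative placeholders only -- check the
# repository's picardmetricsrc for the real settings.

# Where the Picard jar lives on this machine (assumed path).
PICARD_JAR=/usr/local/share/picard/picard.jar

# Reference files consumed by the metrics tools (assumed paths).
REFERENCE_SEQUENCE=/data/refs/hg19.fa
```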
You also need to install these dependencies:
I've included two BAM files, each with 10,000 mapped reads, to illustrate the usage of `picardmetrics`. Please see the data/ folder.
Here are three examples of how you can run the program:

- Run `picardmetrics` sequentially (in a for loop) on multiple BAM files.
- Run in parallel with GNU parallel, using multiple processors or multiple servers.
- Run in parallel with an LSF queue, distributing jobs to multiple servers.
Run the Picard tools on the provided example BAM files:
```
$ for f in data/project1/sample?/sample?.bam; do picardmetrics run -r $f; done
```
Collate the generated metrics files:
```
$ picardmetrics collate data/project1 data/project1/sample?/sample?.sorted.bam
```
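The collated output is plain tab-delimited text, so standard Unix tools can slice it. The sketch below fabricates a miniature two-column stand-in for a metrics table (the column names are invented for illustration, not actual picardmetrics output) and pulls out one metric per sample:

```shell
# Fabricate a tiny stand-in for a collated metrics table.
# Column names are illustrative, not real picardmetrics output.
printf 'SAMPLE\tPERCENT_DUPLICATION\nsample1\t0.05\nsample2\t0.12\n' > all-metrics.tsv

# Print each sample with its duplication rate, skipping the header row.
awk -F'\t' 'NR > 1 { print $1, $2 }' all-metrics.tsv
# Prints:
# sample1 0.05
# sample2 0.12
```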
Run 2 jobs in parallel:
```
$ parallel -j2 picardmetrics run -r {} ::: data/project1/sample?/sample?.bam
```
If you have many files, or if you want to run jobs on multiple servers, it's a good idea to put the full paths in a text file. Here, we have ssh access to `server1` and `server2`. We're launching 16 jobs on `server1` and 8 jobs on `server2`. You'll have to make sure that `picardmetrics` is in your `PATH` on all servers.
```
$ ls /full/path/to/data/project1/sample*/sample*.bam > bams.txt
$ parallel -S 16/server1,8/server2 picardmetrics run -r {} :::: bams.txt
```
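`find` can generate the same list without the glob, and it emits one path per line even for deeply nested layouts. The sketch below builds a throwaway directory tree purely to demonstrate the pattern (the /tmp paths are arbitrary):

```shell
# Build a throwaway tree that mimics the project layout (demo only).
mkdir -p /tmp/demo/project1/sample1 /tmp/demo/project1/sample2
touch /tmp/demo/project1/sample1/sample1.bam /tmp/demo/project1/sample2/sample2.bam

# find prints absolute paths, one per line, ready for bams.txt.
find /tmp/demo/project1 -name '*.bam' | sort > /tmp/demo/bams.txt
cat /tmp/demo/bams.txt
# Prints:
# /tmp/demo/project1/sample1/sample1.bam
# /tmp/demo/project1/sample2/sample2.bam
```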
I recommend you install and use asub to submit jobs easily. This command will submit a job for each BAM file to the `myqueue` LSF queue.
```
$ cat bams.txt | xargs -i echo picardmetrics run -r {} | asub -j picardmetrics -q myqueue
```
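Because the pipeline only echoes commands before handing them to asub, you can preview exactly what would be submitted by dropping the `asub` stage. A self-contained sketch (the BAM paths are made up):

```shell
# Preview the commands that would be submitted, using made-up paths.
# xargs -I {} substitutes each input line into the echoed command.
printf '%s\n' /path/a.bam /path/b.bam |
  xargs -I {} echo picardmetrics run -r {}
# Prints:
# picardmetrics run -r /path/a.bam
# picardmetrics run -r /path/b.bam
```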