harmjanwestra/picardmetrics

 
 


picardmetrics

Run Picard tools and collate multiple metrics files.

Summary

picardmetrics runs 10 Picard tools on a BAM file.

You can find additional scripts in the scripts/ folder:

  • make_refFlat creates a refFlat file with (human) Gencode v19 gene annotations.

  • make_rRNA_intervals creates an intervals_list file with all human ribosomal RNA genes.

  • plot_picardmetrics.R shows how to read and plot the metrics.

Commands

$ picardmetrics
Usage: picardmetrics COMMAND
  run         Run the Picard tools on a given BAM file.
  collate     Collate metrics files for multiple BAM files.

$ picardmetrics run
Usage: picardmetrics run [-f FILE] [-r] <file.bam>
  -f FILE   The configuration file. (Default: ~/.picardmetricsrc)
  -r        The BAM file has RNA-seq reads. (Default: false)

$ picardmetrics collate
Usage: picardmetrics collate PREFIX <file.bam> [<file.bam> ...]

Installation

# Download the code.
git clone git@github.com:slowkow/picardmetrics.git

# Install the script to your preferred location.
cd picardmetrics
cp picardmetrics ~/bin/

# Copy and edit the configuration file to match your system.
cp picardmetricsrc ~/.picardmetricsrc
vim ~/.picardmetricsrc
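
The copy step above only helps if ~/bin is on your PATH. As a small sketch (not part of picardmetrics; the function name path_contains is made up for illustration), you could check this before running anything:

```shell
# Hypothetical helper (not part of picardmetrics): succeed if directory $1
# is listed in the colon-separated PATH-style string $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn if ~/bin is missing from the current PATH.
if ! path_contains "$HOME/bin" "$PATH"; then
  echo "Add this to your shell startup file: export PATH=\"\$HOME/bin:\$PATH\"" >&2
fi
```

Wrapping both strings in colons lets the match work for the first and last PATH entries without special cases.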

You also need to install the dependencies that picardmetrics calls, including Picard itself.

Examples

I've included two BAM files, each with 10,000 mapped reads, to illustrate the usage of picardmetrics. Please see the data/ folder.

Here are three examples of how you can run the program:

  1. Run picardmetrics sequentially (in a for loop) on multiple BAM files.

  2. Run in parallel with GNU parallel, using multiple processors or multiple servers.

  3. Run in parallel with an LSF queue, distributing jobs to multiple servers.

Example 1: Sequential

Run the Picard tools on the provided example BAM files:

$ for f in data/project1/sample?/sample?.bam; do picardmetrics run -r "$f"; done

Collate the generated metrics files:

$ picardmetrics collate data/project1 data/project1/sample?/sample?.sorted.bam
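
With more than a handful of files, it can help to keep going after a failure and see which files failed before collating. The following is a sketch under assumptions, not something shipped with picardmetrics: the function name run_all and the PICARDMETRICS override variable are both hypothetical.

```shell
# Hypothetical wrapper (not part of picardmetrics): call
# "picardmetrics run -r" on each BAM file given as an argument, continue
# after a failure, and return non-zero if any file failed.
# PICARDMETRICS is an assumed override so the loop can be tried with a stub.
run_all() {
  failed=0
  for bam in "$@"; do
    # An unmatched glob is passed through literally, so check the file exists.
    if [ ! -f "$bam" ]; then
      echo "skipping missing file: $bam" >&2
      failed=$((failed + 1))
      continue
    fi
    if ! "${PICARDMETRICS:-picardmetrics}" run -r "$bam"; then
      echo "picardmetrics failed on: $bam" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed file(s) failed" >&2
  [ "$failed" -eq 0 ]
}
```

You could then run `run_all data/project1/sample?/sample?.bam` and collate only if it succeeds.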

Example 2: GNU parallel

Run 2 jobs in parallel:

$ parallel -j2 picardmetrics run -r {} ::: data/project1/sample?/sample?.bam

If you have many files, or if you want to run jobs on multiple servers, it's a good idea to put the full paths in a text file.

Here, we have ssh access to server1 and server2. We're launching 16 jobs on server1 and 8 jobs on server2. You'll have to make sure that picardmetrics is in your PATH on all servers.

$ ls /full/path/to/data/project1/sample*/sample*.bam > bams.txt
$ parallel -S 16/server1,8/server2 picardmetrics run -r {} :::: bams.txt
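
Remote jobs resolve each path on the server where they run, so a relative or mistyped entry in bams.txt only shows up as a failed job. As a rough sketch (the function name check_bam_list is hypothetical, and it assumes the servers see the same paths as the machine where you run the check), you could validate the list first:

```shell
# Hypothetical check (not part of picardmetrics): read a list of BAM paths,
# one per line, and report entries that are not full paths or that do not
# exist on this machine. Returns non-zero if any entry is bad.
check_bam_list() {
  bad=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue   # ignore blank lines
    case "$path" in
      /*) ;;                     # full path, keep checking
      *) echo "not a full path: $path" >&2; bad=$((bad + 1)); continue ;;
    esac
    if [ ! -f "$path" ]; then
      echo "file not found: $path" >&2
      bad=$((bad + 1))
    fi
  done < "$1"
  [ "$bad" -eq 0 ]
}
```

For example, `check_bam_list bams.txt && parallel ...` launches the jobs only when every entry passes.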

Example 3: LSF

I recommend you install and use asub to submit jobs easily. This command will submit a job for each BAM file to the myqueue LSF queue.

$ cat bams.txt | xargs -I {} echo picardmetrics run -r {} | asub -j picardmetrics -q myqueue

