udo-stenzel edited this page Sep 10, 2012 · 4 revisions

This is BWA from http://bio-bwa.sourceforge.net/, extended with a unified workflow that is much easier to use and can make use of a networked compute cluster.

In short, the idea is to store everything in BAM files, especially unaligned(!) reads. Now in the general case, to get from an unaligned BAM file to an aligned one, you'd have to run something along the lines of:

bwa aln path_to_genome -b0 input.bam > 0.sai
bwa aln path_to_genome -b1 input.bam > 1.sai
bwa aln path_to_genome -b2 input.bam > 2.sai
bwa samse path_to_genome 0.sai input.bam | samtools view -bS - > se.bam
bwa sampe path_to_genome 1.sai 2.sai input.bam input.bam | samtools view -bS - > pe.bam
samtools cat se.bam pe.bam > out.bam
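The bookkeeping in these six steps can be made explicit by generating them for a given genome, input, and output. This is only a sketch: the function name `manual_steps` is hypothetical and not part of BWA, and it merely prints the commands shown above so they can be inspected, without running anything.

```shell
#!/bin/sh
# Print the six manual steps for one input file (hypothetical helper, prints only).
# Usage: manual_steps GENOME INPUT_BAM OUTPUT_BAM
manual_steps() {
    genome=$1 input=$2 output=$3
    # One "bwa aln" pass per read class: -b reads BAM input, and
    # -0 / -1 / -2 select single-end, first-of-pair, second-of-pair reads.
    for n in 0 1 2; do
        printf 'bwa aln %s -b%s %s > %s.sai\n' "$genome" "$n" "$input" "$n"
    done
    # Single-end and paired-end alignments are converted to BAM separately.
    printf 'bwa samse %s 0.sai %s | samtools view -bS - > se.bam\n' "$genome" "$input"
    printf 'bwa sampe %s 1.sai 2.sai %s %s | samtools view -bS - > pe.bam\n' "$genome" "$input" "$input"
    # The two partial results are concatenated into the final file.
    printf 'samtools cat se.bam pe.bam > %s\n' "$output"
}

manual_steps path_to_genome input.bam out.bam
```

Five intermediate files (three `.sai` files and two partial BAM files) are created for a single input, before any splitting across machines.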

If you want to run on more than one machine, you also have to split the input and manage lots of files, which is no fun. Instead, you can now run:

bwa bam2bam -g path_to_genome -f output.bam input.bam -p 6969

As a bonus, samse and sampe run in multiple threads; input, computation, and output overlap; and you can add more workers to the cooperating network by running the following on any machine:

bwa worker numthreads hostname 6969
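To start workers on several machines, the worker command can be wrapped in a loop. The following is a minimal sketch and not part of this BWA version: the function name `worker_commands`, the host names, and the use of `ssh` are illustrative assumptions, and it assumes that `hostname` in the worker command names the machine running `bwa bam2bam`. It only prints the commands, so they can be checked before being run.

```shell
#!/bin/sh
# Print one ssh command per worker host (hypothetical helper, prints only).
# Usage: worker_commands MASTER PORT THREADS HOST...
worker_commands() {
    master=$1 port=$2 threads=$3
    shift 3
    for host in "$@"; do
        # Worker invocation in the form given above:
        #   bwa worker numthreads hostname port
        printf 'ssh %s bwa worker %s %s %s\n' "$host" "$threads" "$master" "$port"
    done
}

# Example with made-up host names; the master is assumed to run
# "bwa bam2bam ... -p 6969".
worker_commands master.example.org 6969 4 node01 node02
```

Piping the output to `sh` would run the commands one after another; since each worker presumably keeps running until the job ends, they would in practice need to be started in the background or through a cluster scheduler.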

The current version runs fine and is used in production, despite some limitations.
