
Coverage report using sambamba

Hassan Foroughi edited this page Jul 12, 2018 · 3 revisions

The coverage report is generated with the sambamba depth command. Sambamba is installed in the Cancer-Core conda environment in BALSAMIC. The input BED file must be sorted by coordinate; otherwise sambamba may report zero coverage for the unsorted regions.
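Because unsorted regions can silently come back with zero coverage, it helps to sort the BED file before calling sambamba. Below is a minimal Python sketch, not part of BALSAMIC. The function name `sort_bed_lines` and its `contig_order` parameter are hypothetical. By default it sorts chromosomes lexicographically, then start, then end. This is similar to `LC_ALL=C sort -k1,1 -k2,2n -k3,3n`. One assumption, not confirmed by this page: if your BAM header lists contigs in a different order, passing that order via `contig_order` may be the safer choice.

```python
from typing import Iterable, List, Optional, Sequence


def sort_bed_lines(lines: Iterable[str],
                   contig_order: Optional[Sequence[str]] = None) -> List[str]:
    """Sort BED records by chromosome, then start, then end (hypothetical helper).

    If contig_order is given (assumed to be e.g. the @SQ order of the BAM
    header), chromosomes follow that order; otherwise they are compared as
    plain strings.
    """
    records = []
    for line in lines:
        # Skip blank lines, comments and UCSC track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        text = line.rstrip("\n")
        fields = text.split("\t")
        # BED columns: chrom, start (0-based), end, ... (extra columns kept as-is).
        records.append((fields[0], int(fields[1]), int(fields[2]), text))

    rank = {name: i for i, name in enumerate(contig_order or [])}

    def key(rec):
        chrom, start, end, _ = rec
        # Contigs missing from contig_order (or all contigs, when no order is
        # given) share the same rank and fall back to string order.
        return (rank.get(chrom, len(rank)), chrom, start, end)

    return [rec[3] for rec in sorted(records, key=key)]


if __name__ == "__main__":
    bed = ["chr2\t100\t200\n", "chr1\t500\t600\n", "chr1\t50\t60\n"]
    for record in sort_bed_lines(bed):
        print(record)
```

Sorting on integer start and end values (rather than on the raw text) avoids the classic mistake where `chr1 1000` sorts before `chr1 200`.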

For coverage analysis and reporting, the following parameters are used for Sambamba:

  1. --fix-mate-overlaps: When the two reads of a pair overlap, the shared bases would otherwise be counted twice. This option corrects for that. It was requested in https://github.com/biod/sambamba/issues/204 and later added to sambamba.

  2. --min-base-quality: Most reads are already high quality, and base quality is usually checked again during base recalibration, variant calling, etc. Because coverage is computed before those steps, we apply a minimum base quality score of 10.

  3. --filter string: Alongside --min-base-quality, sambamba depth applies a default filter string with basic conditions such as excluding duplicates. Here we add a few more conditions to make the counts more accurate. These filtered values are useful later, when calculating the actual coverage at each base of SNVs and INDELs. The filter string is: 'not (unmapped or mate_is_unmapped) and not duplicate and not failed_quality_control'. It means that each read must be mapped, must have a mapped mate, must not be a duplicate and must not have failed quality control.
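The parameters above can be put together as follows. This is a hedged Python sketch, not the BALSAMIC rule itself. `build_depth_command` and `passes_filter` are hypothetical names. Several details are assumptions: the choice of the `region` subcommand, the long option names `--regions` and `--output-file`, and the file names in the example. Check `sambamba depth region --help` for your installed version. `passes_filter` only illustrates what the filter string means in terms of SAM FLAG bits: unmapped 0x4, mate unmapped 0x8, failed QC 0x200 and duplicate 0x400. Sambamba evaluates the real filter internally.

```python
import shlex
from typing import List

# SAM FLAG bits referenced by the filter string (SAM format specification).
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400

FILTER = ("not (unmapped or mate_is_unmapped) and not duplicate "
          "and not failed_quality_control")


def build_depth_command(bam: str, bed: str, out: str,
                        min_base_quality: int = 10) -> List[str]:
    """Assemble a `sambamba depth region` argument list (hypothetical helper)."""
    return [
        "sambamba", "depth", "region",
        "--regions", bed,                 # coordinate-sorted BED file
        "--fix-mate-overlaps",            # count overlapping mate bases once
        "--min-base-quality", str(min_base_quality),
        "--filter", FILTER,
        "--output-file", out,
        bam,
    ]


def passes_filter(flag: int) -> bool:
    """Illustrate FILTER on a SAM FLAG value (not sambamba's own code)."""
    excluded = FLAG_UNMAPPED | FLAG_MATE_UNMAPPED | FLAG_QC_FAIL | FLAG_DUPLICATE
    return (flag & excluded) == 0


if __name__ == "__main__":
    cmd = build_depth_command("sample.bam", "targets.sorted.bed", "coverage.tsv")
    # Print a shell-quoted version; subprocess.run(cmd, check=True) would run it.
    print(shlex.join(cmd))
```

Passing the command as a list (rather than one shell string) keeps the quoted filter expression intact without extra escaping.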

Resources:

1. Sambamba filter expression

2. SAM file tags

3. BWA tags

4. SAM file format guide