
sage v3.0

Released by charlesshale on 09 Apr 05:59

Functional:

  • variants can belong to multiple local phase sets, allowing partial phasing to be represented; phasing is extended to the full read length, allowing phasing across up to 150 bases, or further where extrapolation is possible
  • improved MNV and INDEL deduplication rules (see the README for details)
  • added strand bias soft filter
  • several small changes to read filtering and base-quality trimming to reduce false positives and increase precision, and to allow low-quality matching in the flanks
  • reading of mitochondrial DNA is switched off by default (see include_mt)
  • changed panel minTumorVAF from 0.015 to 0.02
  • use soft-clipped bases in read events penalty
  • added an extra normal alt-support condition to the max germline VAF filter
  • filter candidate reads with low MQ after penalties
  • increased hard min tumor qual from 30 to 50
  • added hard min tumor AF filter
  • call the same variant with a different read context if that context has a different core, 3+ supporting reads, and more than 25% of the top candidate's support
  • use normal raw BQ in germline VAF test for non-indels
  • removed right-alignment by microhomology, since this is now handled by Pave
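As a rough illustration of the multiple-read-context rule above, the following Python sketch selects which read contexts to call for a single variant. It is not SAGE's implementation (SAGE is written in Java). The `ReadContextCandidate` type, the use of read count as the measure of support, and the reading of "diff core" as "differs from the top candidate's core" are all assumptions.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the release note; how they are applied is an assumption.
MIN_READS = 3                # "3+ reads"
MIN_FRACTION_OF_TOP = 0.25   # "exceeds 25% of top candidate"


@dataclass(frozen=True)
class ReadContextCandidate:
    """Hypothetical record for one read context observed for a variant."""
    core: str        # core bases of the read context
    read_count: int  # reads supporting this context (assumed measure of support)


def select_read_contexts(candidates: List[ReadContextCandidate]) -> List[ReadContextCandidate]:
    """Return the read contexts to call for one variant, strongest first."""
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda c: c.read_count, reverse=True)
    top = ranked[0]
    selected = [top]
    for cand in ranked[1:]:
        # Assumed reading: the core must differ from the top candidate's core,
        # the context needs 3+ reads, and strictly more than 25% of the top's reads.
        if (cand.core != top.core
                and cand.read_count >= MIN_READS
                and cand.read_count > MIN_FRACTION_OF_TOP * top.read_count):
            selected.append(cand)
    return selected
```

Under these assumptions, if the top read context has 20 reads, a second context with a different core needs at least 6 reads (more than 5) to be called as well; a context sharing the top candidate's core is never called separately, however many reads it has.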

Technical:

  • reduced memory footprint by 30-50%
  • improved performance in poorly mappable high depth regions
  • removed the local realignment set (LRS) and phased inframe INDEL (PII) fields from the VCF, as they are no longer used
  • use global local phase set IDs instead of per-chromosome IDs
  • log warning rather than exception when BAM read positions are out of order

Config:

  • resource_dir (optional) - path to a directory containing all resource files; if set, specify only the file names for ref_genome, hotspots, panel_bed and high_confidence_bed
  • ensembl_data_dir - path to the Ensembl data cache, now required
  • assembly -> ref_genome_version - 37 (default) or 38
  • write_bqr_data - write the BQR file only if this flag is present
  • bqr_plot -> write_bqr_plot
  • load_bqr_files - optionally reload previously written BQR data, either to avoid recomputing it or when running on a slice of a BAM or on specific_regions
  • include_mt - read mitochondrial DNA only if this flag is present
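The resource_dir option can be pictured as a simple path-resolution step. This Python sketch assumes the config is a plain dictionary and that each bare file name is joined onto resource_dir; the function name and example file names are hypothetical, and SAGE's own handling may differ (for example, in how it treats values that are already full paths).

```python
import os
from typing import Dict

# The four inputs that take bare file names when resource_dir is set (per the release note).
RESOURCE_FILE_KEYS = ("ref_genome", "hotspots", "panel_bed", "high_confidence_bed")


def resolve_resource_paths(config: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of config with resource file names prefixed by resource_dir, if set."""
    resolved = dict(config)
    resource_dir = config.get("resource_dir")
    if resource_dir:
        for key in RESOURCE_FILE_KEYS:
            if key in resolved:
                resolved[key] = os.path.join(resource_dir, resolved[key])
    return resolved


# Hypothetical config: file names only, plus the renamed ref_genome_version option.
config = {
    "resource_dir": "/data/resources",
    "ref_genome": "ref.fasta",
    "hotspots": "hotspots.vcf.gz",
    "ref_genome_version": "38",
}
print(resolve_resource_paths(config))
```

Options outside the four listed keys, such as ref_genome_version, pass through unchanged; without resource_dir, each file option is used exactly as given.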

Debugging and performance:

  • config: chr -> specific_chr
  • config: specific_regions - limit SAGE to a list of regions, separated by ';', each in the form chromosome:positionStart:positionEnd
  • config: perf_warn_time - log a warning if any region (i.e. a 100K-base partition by default) takes more than the specified number of seconds to complete
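A minimal Python sketch of how a specific_regions value could be parsed and divided into partitions of the kind perf_warn_time reports on. The `Region` type, the function names, 1-based inclusive coordinates, and fixed 100,000-base partitions (reading "100K partition" as 100,000 bases) are assumptions for illustration, not SAGE's actual code.

```python
from typing import Iterator, List, NamedTuple

PARTITION_SIZE = 100_000  # assumed default region size ("100K partition by default")


class Region(NamedTuple):
    chromosome: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def parse_specific_regions(value: str) -> List[Region]:
    """Parse 'chromosome:positionStart:positionEnd' entries separated by ';'."""
    regions = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        parts = item.split(":")
        if len(parts) != 3:
            raise ValueError(f"expected chromosome:positionStart:positionEnd, got {item!r}")
        chromosome, start, end = parts[0], int(parts[1]), int(parts[2])
        if start > end:
            raise ValueError(f"start is after end in {item!r}")
        regions.append(Region(chromosome, start, end))
    return regions


def partition(region: Region, size: int = PARTITION_SIZE) -> Iterator[Region]:
    """Split a region into consecutive chunks of at most `size` bases."""
    for start in range(region.start, region.end + 1, size):
        yield Region(region.chromosome, start, min(start + size - 1, region.end))


for region in parse_specific_regions("1:1000:2000;17:7500000:7750000"):
    for part in partition(region):
        print(part)
```

With these assumptions, the 250,001-base region on chromosome 17 yields three partitions, the last one shorter than 100,000 bases; a per-partition timer compared against perf_warn_time would then flag any slow chunk.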