
Generating a unified VCF

Hassan Foroughi edited this page Jul 26, 2018 · 1 revision

One of the major challenges in creating a unified VCF is that variant callers do not fully agree on the list of variants they identify, in this case somatic variants. There are published resources on this matter, and the differences can be attributed to each caller's internal algorithm, methodology, and read filters.

VCFs from different sources need to be standardized before they can be merged properly. The following parameters were chosen to achieve this:

  1. Variant ID: The identified variants are assigned a standard ID: chrom_pos_refAllele_altAllele.
  2. Variant INFO, FORMAT, FILTER tags: Filter status (PASS, low_depth, etc.), INFO/Caller (Vardict, Mutect2, Strelka), INFO/DP or FORMAT/DP (combined reads reported by the variant caller), FORMAT/AD for both samples (normal and tumor), and somatic status (if not reported by the variant caller, variants with a PASS filter are considered somatic).
  3. Sample names: "Tumor" and "Normal" in the same order.
  4. GT tag: The FORMAT/GT tag is missing from Strelka's output; see https://github.com/Illumina/strelka/issues/16
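
The first three points above can be illustrated with a minimal Python sketch that works on raw tab-separated VCF lines. It is only an illustration under stated assumptions, not the code used to produce the unified VCF: the function names (`make_variant_id`, `standardize_record`, `rename_samples`) are hypothetical, records are assumed to be biallelic (multi-allelic sites split beforehand), and the caller name is assumed to be stored as `Caller=<name>` in the INFO column.

```python
def make_variant_id(chrom, pos, ref, alt):
    """Build the chrom_pos_refAllele_altAllele identifier."""
    return f"{chrom}_{pos}_{ref}_{alt}"


def standardize_record(line, caller):
    """Set the ID column and tag a VCF data line with its caller.

    Assumes a biallelic record with the eight fixed VCF columns present.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _, ref, alt = fields[:5]
    fields[2] = make_variant_id(chrom, pos, ref, alt)

    # Append the caller to INFO; "." means the INFO column is empty.
    tag = f"Caller={caller}"
    info = fields[7]
    fields[7] = tag if info in (".", "") else f"{info};{tag}"
    return "\t".join(fields)


def rename_samples(header_line, normal_name, tumor_name):
    """Rename the sample columns of the #CHROM header line.

    Only names are changed; column order is left as it is in the input.
    """
    fields = header_line.rstrip("\n").split("\t")
    mapping = {normal_name: "Normal", tumor_name: "Tumor"}
    # Sample columns start after the nine fixed columns (CHROM..FORMAT).
    fields[9:] = [mapping.get(name, name) for name in fields[9:]]
    return "\t".join(fields)


if __name__ == "__main__":
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR"
    record = "chr1\t100\t.\tA\tT\t.\tPASS\tDP=30\tDP:AD\t10:10,0\t20:12,8"
    print(rename_samples(header, "NORMAL", "TUMOR"))
    print(standardize_record(record, "Mutect2"))
```

Because `rename_samples` does not reorder columns, inputs whose tumor and normal columns appear in different orders would still need to be reordered before merging. The missing FORMAT/GT tag in Strelka's output is not handled here.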