bench

Quick start

Run this command where base is your 'truth set' SVs and comp is the comparison set of SVs.

truvari bench -b base_calls.vcf -c comp_calls.vcf -o output_dir/

Matching Parameters

Picking matching parameters can be more of an art than a science. It really depends on the precision of your callers and the tolerance you wish to allow them such that it is a fair comparison.

For example, depth of coverage callers (such as CNVnator) will have very 'fuzzy' boundaries, and don't report the exact deleted sequence but only varying regions. So thresholds of pctseq=0, pctsize=.5, pctovl=.5, refdist=1000 may seem fair.

BioGraph and many long-read callers report precise breakpoints and full alternate allele sequences. When benchmarking those results, we want to ensure our accuracy by using the stricter default thresholds.

If you're still having trouble picking thresholds, it may be beneficial to do a few runs of Truvari bench over different values. Start with the strict defaults and gradually increase the leniency. From there, you can look at the performance metrics and manually inspect differences between the runs to find out what level you find acceptable. Truvari is meant to be flexible for comparison. More importantly, Truvari helps one clearly report the thresholds used for reproducibility.

Here is a rundown of each matching parameter.

Parameter	Default	Definition
refdist	500	Maximum distance comparison calls must be within from base call's start/end
pctseq	0.7	Edit distance ratio between the REF/ALT haplotype sequences of base and comparison call. See "Comparing Sequences of Variants" below.
pctsize	0.7	Ratio of min(base_size, comp_size)/max(base_size, comp_size)
pctovl	0.0	Ratio of two calls' (overlapping bases)/(longest span)
typeignore	False	Types don't need to match to compare calls.

Below are matching parameter diagrams to illustrate (approximately) how they work.

 █ = Deletion ^ = Insertion

--refdist REFDIST (500)
  Max reference location distance

    ACTGATCATGAT
     |--████--|    
          █████      
  
  Calls are within reference distance of 2

--pctsize PCTSIZE (0.7)
  Min pct allele size similarity

    ACTGATCATGA    sizes
      █████     -> 5bp
        ████    -> 4bp

  variants have 0.8 size similarity


--pctovl PCTOVL (0.0)
  Min pct reciprocal overlap

    ACTGATCATGA  ranges
      █████      [2,7)
        ████     [4,8)

  variants have 0.6 reciprocial overlap


--pctseq PCTSEQ (0.7)
  Min percent allele sequence similarity

    A-CTG-ACTG
     ^   ^       haplotypes
     |   └ACTG -> CTGACTGA
     └CTGA     -> CTGACTGA

  haplotypes have 100% sequence similarity

Outputs

Truvari bench writes the following files to the --output directory.

File	Description
tp-base.vcf.gz	True positive calls form the base VCF
tp-comp.vcf.gz	True positive calls from the comparison VCF
fp.vcf.gz	False positive calls from comparison
fn.vcf.gz	False negative calls from base
summary.json	Json output of performance stats
params.json	Json output of parameters used
candidate.refine.bed	Bed file of regions for `refine`
log.txt	Run's log

summary.json

Stats generated by benchmarking are written to summary.json.

Metric	Definition
TP-base	Number of matching calls from the base vcf
TP-comp	Number of matching calls from the comp vcf
FP	Number of non-matching calls from the comp vcf
FN	Number of non-matching calls from the base vcf
precision	TP-comp / (TP-comp + FP)
recall	TP-base / (TP-base + FN)
f1	2 * ((recall * precision) / (recall + precision))
base cnt	Number of calls in the base vcf
comp cnt	Number of calls in the comp vcf
TP-comp_TP-gt	TP-comp with genotype match
TP-comp_FP-gt	TP-comp without genotype match
TP-base_TP-gt	TP-base with genotype match
TP-base_FP-gt	TP-base without genotype match
gt_concordance	TP-comp_TP-gt / (TP-comp_TP-gt + TP-comp_FP-gt)
gt_matrix	Base GT x Comp GT Matrix of all Base calls' best, TP match
weighted	Metrics weighed by variant sequence/size similarity

The gt_matrix is a table. For example:

"gt_matrix": {
    "(0, 1)": {
        "(0, 1)": 500,
        "(1, 1)": 10
    },
    "(1, 1)": {
        "(1, 1)": 800,
        "(0, 1)": 20
    }
}

Represents ->

comp    (0,1)   (1,1)
base
(0,1)   500     10
(1,1)   20      800

Added annotations

The output vcfs are annotated with INFO fields and then sorted, compressed, and indexed inside of the output directory.

Anno	Definition
TruScore	Truvari score for similarity of match. `((pctseq + pctsize + pctovl) / 3 * 100)`
PctSeqSimilarity	Pct sequence similarity between this variant and its closest match
PctSizeSimilarity	Pct size similarity between this variant and its closest match
PctRecOverlap	Percent reciprocal overlap percent of the two calls
StartDistance	Distance of the base call's start from comparison call's start
EndDistance	Distance of the base call's end from comparison call's end
SizeDiff	Difference in size of base and comp calls
GTMatch	Base/comp calls' Genotypes match
MatchId	Id to help tie base/comp calls together {chunkid}.{baseid}.{compid} See MatchIds wiki for details.

Refining bench output

As described in the refine wiki, a limitation of Truvari bench is 1-to-1 variant comparison. However, truvari refine can harmonize the variants to give them more consistent representations. A bed file named candidate.refine.bed is created by truvari bench and holds a set of regions which may benefit from refinement. To use it, simply run

truvari bench -b base.vcf.gz -c comp.vcf.gz -o result/
truvari refine --regions result/candidate.refine.bed \
               --reference reference.fasta \
               --recount --use-region-coords \
               result/

See refine wiki for details.

Comparing Sequences of Variants

Truvari has implemented two approaches to compare variant sequences. The default comparison is called 'unroll'. Optionally, a --reference can be provided and Truvari will use the reference context of a pair of variants for comparison.

Unroll

The method of giving a pair of calls the same reference context can be achieved using an 'unroll' method. For a formal description, see this gist.

The main idea is that in order to move variants upstream/downstream, the reference sequence flanking the variant will need to be moved downstream/upstream respectively. Or, to say this another way, we can think of the alternate sequences as being circular instead of linear. This means that in order to move the variant e.g. 1bp downstream for an INS, we could remove the first base from the ALT and append it to the end. So in the 'ab' example used to describe "Reference context" below, we only need to unroll the insertion at a position by the distance between it and another variant e.g. the INS ab at POS 2 becomes identical to the INS ba at POS 1 by rolling 2-1 = 1 bases from the start to the end.

This unroll method has a number of benefits and a couple of considerations, including:

not needing a --reference for comparison, which saves I/O time
increasing the number of correctly matching SVs
decreasing the number of 'suspect' matches in smaller size regimes
providing a simpler pattern between PctSizeSimilarity and PctSeqSimilarity

Reference context

For the reference context method, consider a hypothetical tandem repeat expansion of the reference sequence 'AB'. Here, we'll represent the 'insertion' sequence as lower-case 'ab', though it should be considered equivalent to 'AB'. Three equally valid descriptions of this variant would be:

#POS INS  Haplotype
  0  ab   abAB
  1  ba   AbaB
  2  ab   ABab

Therefore, to compare the sequence similarity, Truvari builds the haplotypes over the range of a pair of calls' min(starts):max(ends) before making the the sequence change introduced by the variants. In python, this line looks like:

hap1_seq = ref.get_seq(a1_chrom, start + 1, a1_start).seq + a1_seq + ref.get_seq(a1_chrom, a1_end + 1, end).seq

Where a1_seq1 is the longer of the REF or ALT allele.

SVs without sequences

If the base or comp vcfs do not have sequence resolved calls (symbolic alleles such as <DEL>), simply set --pctseq=0 to turn off sequence comparison. If --pctseq != 0 and a symbolic SV is encountered, a warning will be raised and the variant will not be compared.

Controlling the number of matches

How many matches a variant is allowed to participate in is controlled by the --pick parameter. The available pickers are single, ac, and multi.

single (the default option) allows each variant to participate in up to one match.
ac uses the genotype allele count to control how many matches a variant can have. This means a homozygous alternate variant can participate in two matches (its GT is 1/1 so AC=2). A heterozygous variant can only participate in one match (GT 0/1, AC=1). And, a homozygous reference variant cannot be matched. Note that missing genotypes are considered reference alleles and do not add to the AC e.g. (GT ./1, AC=1).
multi variants can participate in all matches available.

As an example, imagine we have three variants in a pVCF with two samples we want to compare.

CHROM POS      ID  REF ALT           base comp
chr20 17785968 ID1 A   ACGCGCGCGCG   1/1  1/0
chr20 17785968 ID2 A   ACGCGCGCGCGCG 0/0  0/1
chr20 17785969 ID3 C   CGCGCGCGCGCGC 0/0  1/1

To compare samples inside the same vcf, we would use the command:

truvari bench -b input.vcf.gz -c input.vcf.gz -o output/ --bSample base --cSample comp --no-ref a

This VCF makes different results depending on the --pick parameter

Parameter	ID1 State	ID2 State	ID3 State
single	TP	FP	FP
ac	TP	TP	FP
multi	TP	TP	TP

--dup-to-ins

Most SV benchmarks only report DEL and INS SVTYPEs. The flag --dup-to-ins will interpret SVs with SVTYPE == DUP to SVTYPE == INS. Note that DUPs generally aren't sequence resolved (i.e. the ALT isn't a sequence) like INS. Therefore, --dup-to-ins typically should be used without sequence comparison via --pctseq 0

Size filtering

--sizemax is the maximum size of a base or comparison call to be considered.

--sizemin is the minimum size of a base call to be considered.

--sizefilt is the minimum size of a comparison call that will be matched to base calls. It can be less than sizemin for edge case variants.

For example: Imagine sizemin is set at 50 and sizefilt at 30, and a 50bp base call is 98% similar to a 49bp comparison call at the same position.

These two calls could be considered matching. However, if we removed comparison calls less than sizemin, we'd incorrectly classify the 50bp base call as a false negative. Instead, we allow comparison calls between [sizefilt,sizemin) to find matches.

This has the side effect of artificially inflating specificity. For example, if that same 49bp call described above were below the similarity threshold, it would not be classified as a FP since it is below the sizemin threshold. So we're giving the call a better chance to be useful and less chance to be detrimental to final statistics.

Include Bed & VCF Header Contigs

If an --includebed is provided, only base and comp calls contained within the defined regions are used for comparison. This is similar to pre-filtering your base/comp calls using:

(zgrep "#" my_calls.vcf.gz && bedtools intersect -u -a my_calls.vcf.gz -b include.bed) | bgzip > filtered.vcf.gz

with the exception that Truvari requires the start and the end to be contained in the same includebed region whereas bedtools intersect does not.

If an --includebed is not provided, the comparison is restricted to only the contigs present in the base VCF header. Therefore, any comparison calls on contigs not in the base calls will not be counted toward summary statistics and will not be present in any output vcfs.

Extending an Include Bed

The option --extend extends the regions of interest (set in --includebed argument) by the given number of bases on each side, allowing base variants to match comparison variants that are just outside of the original region. If a comparison variant is in the extended regions it can potentially match a base variant that is in the original regions turning it to TP. Comparison variants in the extended regions that don't have a match are not counted as FP. This strategy is similar to the one implemented for size matching where only the base variants longer than sizemin (equal to 50 by default) are considered, but they are allowed to match shorter comparison variants sizefilt (30bp by default) or longer.

See this discussionfor details.

Methodology

Here is a high-level pseudocode description of the steps Truvari bench conducts to compare the two VCFs.

* zip the Base and Comp calls together in sorted order
* create chunks of all calls overlapping within ±`--chunksize` basepairs
* make a |BaseCall| x |CompCall| match matrix for each chunk
* build a Match for each call pair in the chunk - annotate as TP if >= all thresholds 
* if the chunk has no Base or Comp calls
** return them all as FNs/FPs 
* use `--pick` method to sort and annotate variants with their best match

Provide feedback

Saved searches

Use saved searches to filter your results more quickly