DeepVariant 0.8.0
With the v0.8.0 release, we introduce a new DeepVariant model for PacBio CCS data. This model can be run in the same manner as the Illumina WGS and WES models. For more details, see our manuscript with PacBio and our blog post.
This release also includes general improvements to DeepVariant and the Illumina WGS and WES models. These include:
- New script that lets users run DeepVariant in one command. See Quick Start.
- Improved accuracy for NovaSeq samples, especially PCR-Free ones, achieved by adding NovaSeq samples to the training data. See DeepVariant training data.
- Improved accuracy for low-coverage data (30x and below), achieved by training on a broader mix of downsampled data. See DeepVariant training data.
- Overall speed improvements that reduce runtime by ~24% on the WGS case study:
  - Speed improvements in querying SAM files and doing calculations with Reads and Ranges.
  - Fewer unnecessary copies when constructing De Bruijn graphs.
  - Less memory usage when writing BED, FASTQ, GFF, SAM, and VCF files.
  - Speed improvements in postprocess_variants when creating gVCFs, achieved by combining writing and merging for both VCF and gVCF.
- Improved support for CRAM files, allowing the use of a provided reference file instead of the embedded reference. See the use_ref_for_cram flag below.
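As a rough illustration of the one-command workflow mentioned above, the sketch below assembles a run_deepvariant command line in Python without executing it. The install path /opt/deepvariant/bin/run_deepvariant, the input and output file names, and the helper build_run_deepvariant_cmd are assumptions made for this example; check the Quick Start for the exact invocation that applies to your setup.

```python
import shlex

# Model types assumed for this release: Illumina WGS, Illumina WES, PacBio CCS.
MODEL_TYPES = ("WGS", "WES", "PACBIO")


def build_run_deepvariant_cmd(model_type, ref, reads, output_vcf, num_shards=1):
    """Return an argv list for a single-command DeepVariant run (hypothetical helper)."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type!r}")
    return [
        "/opt/deepvariant/bin/run_deepvariant",  # assumed path inside the Docker image
        f"--model_type={model_type}",
        f"--ref={ref}",
        f"--reads={reads}",
        f"--output_vcf={output_vcf}",
        f"--num_shards={num_shards}",
    ]


# Placeholder file names; substitute real paths before running anything.
cmd = build_run_deepvariant_cmd(
    "PACBIO", "ref.fasta", "ccs.bam", "out.vcf.gz", num_shards=4
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to subprocess.run once the paths point at real files; building it as a list rather than a single string avoids shell-quoting problems with unusual file names.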
New optional flags:
- make_examples.py
  - use_ref_for_cram: Default is False (using the embedded reference in the CRAM file). If set to True, --ref will be used as the reference instead. See the CRAM support section for more details.
  - parse_sam_aux_fields and use_original_quality_scores: Option to read base quality scores from the OQ tag. To use this option, set both flags to true. The standard GATK process includes a recalibration stage in which base quality scores are recalibrated using dedicated software. DeepVariant is slightly more accurate when the original scores are used. The original scores are usually stored in the BAM file under the optional OQ tag. This feature reads quality scores from the OQ tag instead of the QUAL field.
  - min_base_quality: Allows users to try different thresholds for the minimum base quality score.
  - min_mapping_quality: Allows users to try different thresholds for the minimum mapping quality score.
- call_variants.py
  - config_string: Allows users to specify the estimator session configuration through a flag when running on CPU and GPU, thanks to the contribution of @A-Tsai from ATGENOMIX in #159.
  - num_mappers: Allows users to modify the number of dataset mappers through a flag, thanks to the contribution of @fo40225 from National Taiwan University Hospital in #152.
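To make the OQ option more concrete, here is a minimal pure-Python sketch of the difference between reading base qualities from the QUAL field and from the OQ tag of a SAM record. It is not DeepVariant's implementation: the helpers base_qualities and count_usable_bases, the sample record, the threshold of 10, and the fallback to QUAL when no OQ tag is present are all assumptions made for illustration.

```python
def base_qualities(sam_line, use_original_quality_scores=False):
    """Decode Phred+33 base qualities from one tab-separated SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    qual = fields[10]  # QUAL is the 11th mandatory SAM column
    if use_original_quality_scores:
        # Optional fields follow the 11 mandatory columns as TAG:TYPE:VALUE.
        for aux in fields[11:]:
            if aux.startswith("OQ:Z:"):
                qual = aux[len("OQ:Z:"):]
                break
        # Assumption: if no OQ tag is found, keep the QUAL values.
    return [ord(ch) - 33 for ch in qual]


def count_usable_bases(quals, min_base_quality):
    """Count bases whose quality meets the threshold (illustrative only)."""
    return sum(q >= min_base_quality for q in quals)


# Hypothetical record: recalibrated scores in QUAL, original scores in OQ.
record = "\t".join(
    ["read1", "0", "chr1", "100", "60", "4M", "*", "0", "0",
     "ACGT", "IIII", "OQ:Z:5+I#"]
)

recalibrated = base_qualities(record)
original = base_qualities(record, use_original_quality_scores=True)
print(recalibrated)  # [40, 40, 40, 40]
print(original)      # [20, 10, 40, 2]
print(count_usable_bases(original, min_base_quality=10))  # 3
```

In this made-up record, switching to the OQ scores changes which bases pass a given threshold, which is why the quality source and min_base_quality are worth tuning together.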