4.1.5.0

@droazen droazen released this 28 Feb 23:01
· 642 commits to master since this release
7f9b849

Download release: gatk-4.1.5.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/

Highlights of the 4.1.5.0 release:

  • A new, improved version of the --linked-de-bruijn-graph mode for HaplotypeCaller and Mutect2 that has better sensitivity compared to the previous linked DeBruijn graph implementation (#6394)

  • A new version of GenomicsDB that fixes many frequently-reported issues

  • LeftAlignIndels now works for multiple indels

  • VariantAnnotator and Concordance are now out of beta

  • A significant number of bug fixes to major tools like GenotypeGVCFs and SelectVariants

Full list of changes:

  • HaplotypeCaller

    • New, improved version of the --linked-de-bruijn-graph mode for HaplotypeCaller and Mutect2 that has better sensitivity compared to the previous linked DeBruijn graph implementation (#6394)
      • Running HaplotypeCaller in this mode reduces the number of erroneous haplotypes discovered, which can improve genotyping, phasing, and runtime.
      • Changed the haplotype recovery step to check that it covers all paths through the graph even if there are poorly supported paths in the JunctionTrees. Added the argument --disable-artificial-haplotype-recovery to disable this behavior.
      • Added the ability to expand graph kmer size after haplotype recovery in the event that there was a failure due to overcomplicated assembly graphs.
      • Added code to squeeze extra sensitivity out of the junction trees by tolerating SNP errors when threading the junction trees themselves
    • Realigning to best haplotype handles indels better (#6461)
    • Fixed issue #5434 on inconsistent selection of reads for the PL, AD, and DP calculations. (#6055)
    • Fixed bug where SNP and indel pseudocounts were swapped in the AlleleFrequencyCalculator (#6401)
    • The qual used in HaplotypeCaller's isActive() method now matches that of GenotypeGVCFs. That is, they both now use the new qual. (#6343)
    • Skip non-nucleotide alleles in force-calling mode, fixing bug (#6405)
    • Fixed the hidden/experimental --error-correct-reads argument to actually correct the bases and qualities (#6366)
    • Removed the deprecated and obsolete --use-new-qual-calculator argument (#6398)
    • Refactored code related to windows and padding for assembly and genotyping, with slight changes to HMM padding for indels (#6358)
  • Mutect2

    • Improved SomaticClusteringModel (#6337)
    • Sped up Mutect2 reference confidence model with fast likelihoods model (#6457)
    • Modified Fragment creation for Mutect2 to not fail for supplementary reads (#6327)
    • Uniqify PG IDs in FilterAlignmentArtifacts (#6304)
    • Fixed error in RealignmentEngine due to converting from exclusive to inclusive interval ends (#6404)
    • Added an error message for no callable sites in Mutect2 (#6445)
    • Changed filter reporting in Mutect2 (#6288)
    • Fixed force-calling mode in M2 mito WDL (#6359)
    • Pass the reference to the realignment filter in the Mutect2 WDL (#6360)
    • Deleted the old orientation bias filter (#6408)
    • Made callable sites a Long to avoid integer overflow (#6303)
  • GenomicsDB

    • Move to GenomicsDB 1.2.0 (#6305)
      • Fixes an issue with GenomicsDBImport erroring out due to duplicate fields in the Info, Format, and/or Filter fields. (#6158)
      • Fixes an issue with GenomicsDBImport not completing for mixed ploidy samples (#6275)
      • This version uses a 64-bit htslib to work around overflow issues when computed annotation sizes exceed the 32-bit integer space
  • Joint Calling

    • GenotypeGVCFs: improved checking for upstream deletions in the GenotypingEngine (#6429)
      • Fixes rare cases where GenotypeGVCFs could emit a variant with a spanned allele (*) and a genotype that references the spanned allele, but fail to emit the upstream spanning variant.
    • GenotypeGVCFs: Don't call the NON_REF allele in genotypes or ADs (#6437)
    • Parse combined AS_QUALapprox values from older reblocked GVCFs properly (#6442)
    • Added a force output sites argument to GenotypeGVCFs (#6263)
    • Remove extraneous alleles in GenotypeGVCFs force-output mode (#6406)
  • CNV Calling

    • Copy temporary files early in gcnvkernel to avoid inadvertent temporary directory cleanup. (#6297)
    • Enabled streaming of counts.tsv/counts.tsv.gz files in gCNV CLIs. (#6266)
    • Fixed shard index in PostprocessGermlineCNVCalls log message. (#6313)
    • gCNV VCF cleanup (#6352)
    • Index output VCFs for gCNV postprocessing (#6330)
  • Notable Enhancements

    • VariantAnnotator is now out of beta (#6402)
    • Concordance is out of beta (#6397)
    • LeftAlignIndels now works for multiple indels (#6427)
    • FilterVariantTranches can now handle cases where there are only SNPs or only indels, and not both (#6411)
    • Added new read filters for NotProperlyPaired and for MateDistant (#6295)
    • Made the .git directory optional during build (#6450)
  • Bug Fixes

    • Handle zero-weight Gaussians correctly in VariantRecalibrator (#6425)
    • Fixed the --invalidate-previous-filters argument in VariantFiltration to work as intended (i.e., roll back all variants to unfiltered status) (#6412)
    • Fixed a bug where SelectVariants could run extremely slowly on many-allelic somatic samples (#6446)
    • Make sure SelectVariants outputs variants in the correct order (assuming the input VCF is correctly sorted) (#6444)
    • Fixed an NPE crash in VariantEval when run with no intervals/reference (#6283)
    • Fixed an NPE crash in FastaReferenceMaker (#6435)
    • Fixed an out-of-bounds error in CountNs annotation (#6355)
    • Fixed a bug in the hardClipCigar function that caused incorrect CIGAR calculation (#6280)
    • AnalyzeSaturationMutagenesis: fixed bug in codon calling for in-frame inserts (#6332)
  • Miscellaneous Changes

    • Collect split read and paired end evidence files for GATK-SV pipeline (#6356)
    • Add "PASS" filter line for ApplyVQSR and FilterMutectCalls (#6436)
    • Added engine functionality for accessing the user defined intervals without merging them (#5887)
    • Trim intervals loaded from interval files. (#6375)
    • Propagate read group filters in ReadGroupBlackListReadFilter. (#6300)
    • Modified ANDed read filter output message for readability (#6315)
    • Clearly label the number of reads processed in the BaseRecalibrator log output (#6447)
    • Clearly label the CountReads tool output (#6449)
    • Improved the error messages for missing contigs in the reference (#6469)
    • Avoid a copy and reverse operation in CigarUtils.isGood() (#6439)
    • Fixed GenotypeAlleleCount's toString() method (#6376)
    • Minor Funcotator WDL updates. (#6326)
    • Added a getPairOrientation() method to GATKRead (#6420)
    • Merged GATKProtectedVariantContextUtils methods into other classes (#6409)
    • Deleted a lot of unused VCF constants (#6361)
    • Deleted some unused genotyping code (#6354)
    • Fixed incoherent unit test cases in allele subsetting utils (#6448)
    • Add Python script executor error message for SIGKILL exit code 137. (#6414)
    • Pip install pinned numpy. (#6413)
    • Do not install R on travis, and only run the R tests on the Docker. (#6454)
    • Fixes for IndexFeatureFile error reporting. (#6367)
    • Temporarily remove dead Berkeley mirror to unblock builds. (#6422)
    • Disable CNNVariantPipelineTest.testTrainingReadModel until failures are resolved. (#6331)
    • Delete unused JsonSerializer (#6415)
    • Delete empty file SparkToggleCommandLineProgram.java. (#6311)
  • Documentation

    • Clarify the definition of the NON_REF allele (#6431)
    • Clarify behavior of SplitIntervals for lists of adjacent intervals (#6423)
    • Update docs to reflect the fact that TandemRepeat works with HaplotypeCaller (#5943)
    • Update LeftAlignIndels documentation (#6177)
    • Update hyperlink to new GATK forum page in the README (#6381)
    • Add minValue/minRecommended value to ApplyBQSRArgumentCollection (#6438)
    • Small README fixes (#6451)
    • Fix some GATK doc issues (#6318)
    • Update copyright date in LICENSE.TXT (#6383)
  • Dependencies

    • Updated HTSJDK to 2.21.2 (#6462)
    • Updated Picard to 2.21.9 (#6462)
    • Updated Disq to 0.3.5 (#6323)
    • Updated GenomicsDB to 1.2.0 (#6305)
    • Updated TestNG to 7.0.0 (#5787)
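
Several entries above concern 32-bit overflow (the Mutect2 callable-sites count in #6303 and the 64-bit htslib in GenomicsDB 1.2.0). The sketch below only illustrates the general failure mode in Java. It is not GATK's code: the class name `CallableSiteCount`, both methods, and the per-shard counts are hypothetical and chosen purely for illustration.

```java
public class CallableSiteCount {

    // Hypothetical helper: accumulates counts in a 32-bit int.
    // Java int arithmetic wraps silently once the sum exceeds Integer.MAX_VALUE.
    static int sumAsInt(int[] perShardCounts) {
        int total = 0;
        for (int count : perShardCounts) {
            total += count;
        }
        return total;
    }

    // Same accumulation with a 64-bit long, which has ample headroom
    // for genome-scale site counts.
    static long sumAsLong(int[] perShardCounts) {
        long total = 0L;
        for (int count : perShardCounts) {
            total += count; // each int is widened to long before the addition
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up counts: each fits in an int, but their sum does not.
        int[] counts = {2_000_000_000, 2_000_000_000};
        System.out.println("int total:  " + sumAsInt(counts));
        System.out.println("long total: " + sumAsLong(counts));
    }
}
```

With these made-up inputs the int total wraps to a negative number while the long total is 4000000000, which is why a running count that can exceed roughly 2.1 billion is safer held in a long.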