4.1.5.0
Download release: gatk-4.1.5.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/
Highlights of the 4.1.5.0 release:
- A new, improved version of the --linked-de-bruijn-graph mode for HaplotypeCaller and Mutect2 that has better sensitivity compared to the previous linked DeBruijn graph implementation (#6394)
- A new version of GenomicsDB that fixes many frequently-reported issues
- LeftAlignIndels now works for multiple indels
- VariantAnnotator and Concordance are now out of beta
- A significant number of bug fixes to major tools like GenotypeGVCFs and SelectVariants
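
As an illustration of how the linked de Bruijn graph mode might be switched on, here is a minimal sketch that only assembles a HaplotypeCaller command line and prints it. It assumes the `gatk` launcher is on the PATH; the helper name `build_haplotypecaller_command` and the file names are hypothetical, and nothing is executed.

```python
import shlex
from typing import List


def build_haplotypecaller_command(
    reference: str,
    input_bam: str,
    output_vcf: str,
    linked_de_bruijn_graph: bool = True,
) -> List[str]:
    """Assemble a HaplotypeCaller argument list (hypothetical helper).

    The command is only built, never run.
    """
    command = [
        "gatk", "HaplotypeCaller",
        "-R", reference,    # reference FASTA
        "-I", input_bam,    # input reads
        "-O", output_vcf,   # output variant calls
    ]
    if linked_de_bruijn_graph:
        # Flag named in the 4.1.5.0 highlights.
        command.append("--linked-de-bruijn-graph")
    return command


if __name__ == "__main__":
    # File names below are placeholders, not files shipped with GATK.
    cmd = build_haplotypecaller_command(
        "reference.fasta", "sample.bam", "sample.vcf.gz"
    )
    print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string keeps paths containing spaces intact if the list is later handed to `subprocess.run`.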
Full list of changes:
- HaplotypeCaller
  - New, improved version of the --linked-de-bruijn-graph mode for HaplotypeCaller and Mutect2 that has better sensitivity compared to the previous linked DeBruijn graph implementation (#6394)
    - Running HaplotypeCaller in this mode will reduce the number of erroneous haplotypes discovered which can improve genotyping, phasing, and runtime.
    - Changed the haplotype recovery step to check that it covers all paths through the graph even if there are poorly supported paths in the JunctionTrees. Added the argument --disable-artificial-haplotype-recovery to disable this behavior.
    - Added the ability to expand graph kmer size after haplotype recovery in the event that there was a failure due to overcomplicated assembly graphs.
    - Added code to squeeze extra sensitivity out of the junction trees by tolerating SNP errors when threading the junction trees themselves
  - Realigning to best haplotype handles indels better (#6461)
  - Fixed issue #5434 on inconsistent selection of reads for the PL, AD, and DP calculations. (#6055)
  - Fixed bug where SNP and indel pseudocounts were swapped in the AlleleFrequencyCalculator (#6401)
  - The qual used in HaplotypeCaller's isActive() method now matches that of GenotypeGVCFs. That is, they both now use the new qual. (#6343)
  - Skip non-nucleotide alleles in force-calling mode, fixing bug (#6405)
  - Fixed the hidden/experimental --error-correct-reads argument to actually correct the bases and qualities (#6366)
  - Removed the deprecated and obsolete --use-new-qual-calculator argument (#6398)
  - Refactored code related to windows and padding for assembly and genotyping, with slight changes to HMM padding for indels (#6358)
- Mutect2
  - Improved SomaticClusteringModel (#6337)
  - Sped up Mutect2 reference confidence model with fast likelihoods model (#6457)
  - Modified Fragment creation for Mutect2 to not fail for supplementary reads (#6327)
  - Uniqify PG IDs in FilterAlignmentArtifacts (#6304)
  - Fixed error in RealignmentEngine due to converting from exclusive to inclusive interval ends (#6404)
  - Added an error message for no callable sites in Mutect2 (#6445)
  - Changed filter reporting in Mutect2 (#6288)
  - Fixed force-calling mode in M2 mito WDL (#6359)
  - Pass the reference to the realignment filter in the Mutect2 WDL (#6360)
  - Deleted the old orientation bias filter (#6408)
  - Made callable sites a Long to avoid integer overflow (#6303)
- GenomicsDB
  - Move to GenomicsDB 1.2.0 (#6305)
    - Fixes an issue with GenomicsDBImport erroring out due to duplicate fields in the Info, Format, and/or Filter fields. (#6158)
    - Fixes an issue with GenomicsDBImport not completing for mixed ploidy samples (#6275)
    - This version uses a 64-bit htslib to work around overflow issues when computed annotation sizes exceed the 32-bit integer space
- Joint Calling
  - GenotypeGVCFs: improved checking for upstream deletions in the GenotypingEngine (#6429)
    - Fixes rare cases where GenotypeGVCFs could emit a variant with a spanned allele (*), and a genotype that references the spanned allele, but fail to emit the upstream spanning variant.
  - GenotypeGVCFs: Don't call the NON_REF allele in genotypes or ADs (#6437)
  - Parse combined AS_QUALapprox values from older reblocked GVCFs properly (#6442)
  - Added a force output sites argument to GenotypeGVCFs (#6263)
  - Remove extraneous alleles in GenotypeGVCFs force-output mode (#6406)
- CNV Calling
  - Copy temporary files early in gcnvkernel to avoid inadvertent temporary directory cleanup. (#6297)
  - Enabled streaming of counts.tsv/counts.tsv.gz files in gCNV CLIs. (#6266)
  - Fixed shard index in PostprocessGermlineCNVCalls log message. (#6313)
  - gCNV VCF cleanup (#6352)
  - Index output VCFs for gCNV postprocessing (#6330)
- Notable Enhancements
  - VariantAnnotator is now out of beta (#6402)
  - Concordance is out of beta (#6397)
  - LeftAlignIndels now works for multiple indels (#6427)
  - FilterVariantTranches can now handle cases where there are only SNPs or only indels, and not both (#6411)
  - Added new read filters for NotProperlyPaired and for MateDistant (#6295)
  - Made the .git directory optional during build (#6450)
- Bug Fixes
  - Handle zero-weight Gaussians correctly in VariantRecalibrator (#6425)
  - Fixed the --invalidate-previous-filters argument in VariantFiltration to work as intended (i.e., roll back all variants to unfiltered status) (#6412)
  - Fixed a bug where SelectVariants takes forever on many-allelic somatic samples (#6446)
  - Make sure SelectVariants outputs variants in correct order (assuming input VCF is correctly sorted) (#6444)
  - Fixed an NPE crash in VariantEval when run with no intervals/reference (#6283)
  - Fixed an NPE crash in FastaReferenceMaker (#6435)
  - Fixed an out-of-bounds error in CountNs annotation (#6355)
  - Fixed a bug in hardClipCigar function that caused incorrect cigar calculation (#6280)
  - AnalyzeSaturationMutagenesis: fixed bug in codon calling for in-frame inserts (#6332)
- Miscellaneous Changes
  - Collect split read and paired end evidence files for GATK-SV pipeline (#6356)
  - Add "PASS" filter line for ApplyVQSR and FilterMutectCalls (#6436)
  - Added engine functionality for accessing the user defined intervals without merging them (#5887)
  - Trim intervals loaded from interval files. (#6375)
  - Propagate read group filters in ReadGroupBlackListReadFilter. (#6300)
  - Modified ANDed read filter output message for readability (#6315)
  - Clearly label the number of reads processed in the BaseRecalibrator log output (#6447)
  - Clearly label the CountReads tool output (#6449)
  - Improved the error messages for missing contigs in the reference (#6469)
  - Avoid a copy and reverse operation in CigarUtils.isGood() (#6439)
  - Fixed GenotypeAlleleCount's toString() method (#6376)
  - Minor Funcotator WDL updates. (#6326)
  - Added a getPairOrientation() method to GATKRead (#6420)
  - Merged GATKProtectedVariantContextUtils methods into other classes (#6409)
  - Deleted a lot of unused VCF constants (#6361)
  - Deleted some unused genotyping code (#6354)
  - Fixed incoherent unit test cases in allele subsetting utils (#6448)
  - Add Python script executor error message for SIGKILL exit code 137. (#6414)
  - Pip install pinned numpy. (#6413)
  - Do not install R on travis, and only run the R tests on the Docker. (#6454)
  - Fixes for IndexFeatureFile error reporting. (#6367)
  - Temporarily remove dead Berkeley mirror to unblock builds. (#6422)
  - Disable CNNVariantPipelineTest.testTrainingReadModel until failures are resolved. (#6331)
  - Delete unused JsonSerializer (#6415)
  - Delete empty file SparkToggleCommandLineProgram.java. (#6311)
- Documentation
  - Clarify the definition of the NON_REF allele (#6431)
  - Clarify behavior of SplitIntervals for lists of adjacent intervals (#6423)
  - Update docs to reflect the fact that TandemRepeat works with HaplotypeCaller (#5943)
  - Update LeftAlignIndels documentation (#6177)
  - Update hyperlink to new GATK forum page in the README (#6381)
  - Add minValue/minRecommended value to ApplyBQSRArgumentCollection (#6438)
  - Small README fixes (#6451)
  - Fix some GATK doc issues (#6318)
  - Update copyright date in LICENSE.TXT (#6383)
- Dependencies