
Mutect2 error getNumTandemRepeatUnits String index out of range #6516

Closed
nalcala opened this issue Mar 23, 2020 · 6 comments

Comments

@nalcala

nalcala commented Mar 23, 2020

Bug Report

Affected tool(s) or class(es)

Mutect2, multi-sample (2 samples) in tumor-only mode

Affected version(s)

  • 4.1.5.0 (works fine on 4.1.4.1 and 4.1.4.0)

Description

Among my cohort of ~100 samples, Mutect2 calling with the hg38+alt+decoy reference genome (as provided in the GATK bundle) fails for one sample at a very specific location (chrUn_KI270748v1:61595-61748) with an index out of range error. Slightly reducing the range avoids the error (e.g., calling on chrUn_KI270748v1:61596-61748), so it looks like an issue with estimating the number of repeat units.
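The suspected step, counting repeat units, can be sketched roughly as follows. This is a hypothetical, simplified illustration, not GATK's `TandemRepeat` code: `TandemRepeatSketch.countTandemRepeats` and its parameters are invented names. It counts back-to-back copies of a repeat unit starting at an offset into the reference bases and rejects an out-of-range offset up front, instead of letting an index error surface deeper in the call.

```java
// Hypothetical, simplified sketch (not GATK's TandemRepeat implementation):
// count consecutive copies of a repeat unit starting at a given offset.
final class TandemRepeatSketch {

    /**
     * Returns the number of back-to-back copies of {@code unit} in {@code bases}
     * starting at {@code offset}. Fails fast with a descriptive message for an
     * empty unit or an out-of-range offset.
     */
    static int countTandemRepeats(String bases, String unit, int offset) {
        if (unit.isEmpty()) {
            throw new IllegalArgumentException("repeat unit must be non-empty");
        }
        if (offset < 0 || offset > bases.length()) {
            throw new IllegalArgumentException(
                "offset " + offset + " outside [0, " + bases.length() + "]");
        }
        int count = 0;
        int pos = offset;
        // startsWith(prefix, toffset) returns false (rather than throwing) when
        // fewer bases remain than the length of the unit.
        while (bases.startsWith(unit, pos)) {
            count++;
            pos += unit.length();
        }
        return count;
    }

    public static void main(String[] args) {
        String ref = "GTACACACACTT";
        System.out.println(countTandemRepeats(ref, "AC", 2)); // 4
        System.out.println(countTandemRepeats(ref, "AC", 0)); // 0
    }
}
```

Using `String.startsWith(prefix, offset)` avoids slicing the string on every step, so the only index that needs validating is the starting offset.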

This is not a critical location, but the same error could affect more important calls for other users. The log is the following:

Using GATK jar /home/alcalan/.conda/mutect2-cd161e2f51ff2240ce6390abc942bbdd/share/gatk4-4.1.5.0-1/gatk-package-4.1.5.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx15G -jar /home/alcalan/.conda/mutect2-cd161e2f51ff2240ce6390abc942bbdd/share/gatk4-4.1.5.0-1/gatk-package-4.1.5.0-local.jar Mutect2 -R /data/references/Homo_sapiens/GATK/hg38/Homo_sapiens_assembly38.fasta -I test1.bam -I test2.bam -O tests.vcf -L test_err.bed
10:34:24.578 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/alcalan/.conda/mutect2-cd161e2f51ff2240ce6390abc942bbdd/share/gatk4-4.1.5.0-1/gatk-package-4.1.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 23, 2020 10:34:24 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:34:24.819 INFO  Mutect2 - ------------------------------------------------------------
10:34:24.820 INFO  Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.5.0
10:34:24.820 INFO  Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
10:34:24.820 INFO  Mutect2 - Executing as alcalan@hn.pioneerx on Linux v3.10.0-1062.4.3.el7.x86_64 amd64
10:34:24.820 INFO  Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
10:34:24.820 INFO  Mutect2 - Start Date/Time: March 23, 2020 10:34:24 AM CET
10:34:24.820 INFO  Mutect2 - ------------------------------------------------------------
10:34:24.820 INFO  Mutect2 - ------------------------------------------------------------
10:34:24.821 INFO  Mutect2 - HTSJDK Version: 2.21.2
10:34:24.821 INFO  Mutect2 - Picard Version: 2.21.9
10:34:24.821 INFO  Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:34:24.821 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:34:24.821 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:34:24.822 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:34:24.822 INFO  Mutect2 - Deflater: IntelDeflater
10:34:24.822 INFO  Mutect2 - Inflater: IntelInflater
10:34:24.822 INFO  Mutect2 - GCS max retries/reopens: 20
10:34:24.822 INFO  Mutect2 - Requester pays: disabled
10:34:24.823 INFO  Mutect2 - Initializing engine
10:34:25.945 INFO  FeatureManager - Using codec BEDCodec to read file file:///scratch/alcalan/nextflow_work/e9/a28e7174a34d0d29fe9a0d8a506d46/test_err.bed
10:34:25.960 INFO  IntervalArgumentCollection - Processing 153 bp from intervals
10:34:25.987 INFO  Mutect2 - Done initializing engine
10:34:26.188 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/alcalan/.conda/mutect2-cd161e2f51ff2240ce6390abc942bbdd/share/gatk4-4.1.5.0-1/gatk-package-4.1.5.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
10:34:26.190 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/alcalan/.conda/mutect2-cd161e2f51ff2240ce6390abc942bbdd/share/gatk4-4.1.5.0-1/gatk-package-4.1.5.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
10:34:26.264 INFO  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
10:34:26.267 INFO  IntelPairHmm - Available threads: 8
10:34:26.267 INFO  IntelPairHmm - Requested threads: 4
10:34:26.267 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
10:34:26.375 INFO  ProgressMeter - Starting traversal
10:34:26.375 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
10:34:26.950 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
10:34:26.950 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
10:34:26.950 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.03 sec
10:34:26.951 INFO  Mutect2 - Shutting down engine
[March 23, 2020 10:34:26 AM CET] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=1214251008
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.lang.String.substring(String.java:1927)
	at org.broadinstitute.hellbender.tools.walkers.annotator.TandemRepeat.getNumTandemRepeatUnits(TandemRepeat.java:54)
	at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyRegionTrimmer.trim(AssemblyRegionTrimmer.java:175)
	at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:229)
	at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:299)
	at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
	at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
	at org.broadinstitute.hellbender.Main.main(Main.java:292)
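In the trace above, the exception is thrown by `String.substring` inside `TandemRepeat.getNumTandemRepeatUnits`, called from `AssemblyRegionTrimmer.trim`. On Java 8 and 11, `String index out of range: -1` is what `substring` reports when, for example, it is passed a start index of -1. The sketch below is a hypothetical illustration of that failure mode, not GATK's code: it assumes reference bases are sliced at an offset computed as variant start minus reference-window start, so a variant beginning one base before the window yields -1. All class and method names are invented.

```java
import java.util.Optional;

// Hypothetical illustration (not GATK code): a negative offset into a
// reference window surfaces as StringIndexOutOfBoundsException.
final class RepeatOffsetDemo {

    /** Offset of a variant within a window; both positions use the same coordinate system. */
    static int offsetInWindow(int variantStart, int windowStart) {
        return variantStart - windowStart;
    }

    /** Naive slice: throws StringIndexOutOfBoundsException if the variant starts before the window. */
    static String basesFromVariant(String windowBases, int variantStart, int windowStart) {
        return windowBases.substring(offsetInWindow(variantStart, windowStart));
    }

    /** Guarded slice: returns Optional.empty() instead of throwing when the offset is out of range. */
    static Optional<String> basesFromVariantSafe(String windowBases, int variantStart, int windowStart) {
        int offset = offsetInWindow(variantStart, windowStart);
        if (offset < 0 || offset > windowBases.length()) {
            return Optional.empty(); // caller decides whether to skip this variant
        }
        return Optional.of(windowBases.substring(offset));
    }

    public static void main(String[] args) {
        String window = "ACACACACGT"; // reference bases for positions 100..109
        System.out.println(basesFromVariantSafe(window, 102, 100)); // Optional[ACACACGT]
        System.out.println(basesFromVariantSafe(window, 99, 100));  // Optional.empty
        try {
            basesFromVariant(window, 99, 100);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Whether to skip the variant, clamp the offset, or widen the reference window is a design choice; this sketch does not reproduce the actual fix referenced later in the thread.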

Steps to reproduce

Here is a minimal, fully reproducible example using the files in https://github.com/nalcala/mutect2_issue:
gatk Mutect2 --java-options "-Xmx15G" -R GATK/hg38/Homo_sapiens_assembly38.fasta -I test1.bam -I test2.bam -O tests.vcf -L test_err.bed

Expected behavior

Mutect2 should call variants.

Actual behavior

Mutect2 crashes with the StringIndexOutOfBoundsException shown above.

--- Thanks a lot!

@isidroc

isidroc commented May 11, 2020

Same error here. Any news? Thanks

@MarleyCodes

Same error in HaplotypeCaller 4.1.7.0:

18:59:34.948 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 1.514351728
18:59:34.948 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 887.7367702070001
18:59:34.948 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 2117.34 sec
18:59:34.948 INFO HaplotypeCaller - Shutting down engine
[May 25, 2020 6:59:34 PM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 66.82 minutes.
Runtime.totalMemory()=69727158272
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1927)
at org.broadinstitute.hellbender.tools.walkers.annotator.TandemRepeat.getNumTandemRepeatUnits(TandemRepeat.java:54)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyRegionTrimmer.trim(AssemblyRegionTrimmer.java:175)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:552)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:210)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)

@ericblanc20

Same error here, affecting Mutect2 (GATK 4.1.7.0) in tumor-normal mode.

Thanks

@cai1991

cai1991 commented Jun 18, 2020

Same error in HaplotypeCaller 4.1.7.0:

10:05:48.875 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 25.625500792
10:05:48.875 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 11358.883564452
10:05:48.876 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 3761.34 sec
10:05:48.876 INFO HaplotypeCaller - Shutting down engine
[June 18, 2020 at 10:05:48 AM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 385.96 minutes.
Runtime.totalMemory()=14143193088
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.base/java.lang.String.substring(String.java:1837)
at org.broadinstitute.hellbender.tools.walkers.annotator.TandemRepeat.getNumTandemRepeatUnits(TandemRepeat.java:54)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyRegionTrimmer.trim(AssemblyRegionTrimmer.java:175)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:552)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:210)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)

Any news?

Thanks

@droazen
Contributor

droazen commented Jun 18, 2020

@cai1991 @ericblanc20 @MarleyCodes @isidroc @nalcala I believe this was fixed in a recently merged patch (#6583). The fix will go out as part of the next GATK release.

@fleharty @davidbenjamin Could one of you please confirm that #6583 fixes this issue? Thanks!

@davidbenjamin
Contributor

@droazen Confirmed. There was no error when re-running on the user's data with the latest release.


10 participants