User gets StackOverflowError when using Multi-interval in GenomicsDBImport with GATK 4.0.6.0 #4994

Closed
vdauwera opened this issue Jul 10, 2018 · 4 comments

@vdauwera

@kgururaj We got this issue report in the forum, could you please look into it? Thanks!

https://gatkforums.broadinstitute.org/gatk/discussion/12388/how-to-use-multi-interval-in-genomicsdbimport-with-gatk-4-0-6-0


I used GenomicsDBImport with an interval list file and got an error like the one below.
So what is the correct way to use multiple intervals in GenomicsDBImport?

gatk version: 4.0.6.0

Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -Xms4g -jar /mnt/gatk/gatk-4.0.6.0/gatk-package-4.0.6.0-local.jar GenomicsDBImport -L test.intervals --genomicsdb-workspace-path ../RAW_VCF/my_database -V file1 -V file2 -V file3

02:57:15.591 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/workshop/xinchen.pan/test/gatk/gatk-4.0.6.0/gatk-package-4.0.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
02:57:15.772 INFO GenomicsDBImport - ------------------------------------------------------------
02:57:15.772 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.6.0
02:57:15.772 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
02:57:15.772 INFO GenomicsDBImport - Executing as on Linux v3.10.0-514.6.1.el7.x86_64 amd64
02:57:15.772 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_121-b13
02:57:15.773 INFO GenomicsDBImport - Start Date/Time: July 10, 2018 2:57:15 AM EDT
02:57:15.773 INFO GenomicsDBImport - ------------------------------------------------------------
02:57:15.773 INFO GenomicsDBImport - ------------------------------------------------------------
02:57:15.773 INFO GenomicsDBImport - HTSJDK Version: 2.16.0
02:57:15.773 INFO GenomicsDBImport - Picard Version: 2.18.7
02:57:15.773 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
02:57:15.773 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
02:57:15.773 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
02:57:15.773 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
02:57:15.774 INFO GenomicsDBImport - Deflater: IntelDeflater
02:57:15.774 INFO GenomicsDBImport - Inflater: IntelInflater
02:57:15.774 INFO GenomicsDBImport - GCS max retries/reopens: 20
02:57:15.774 INFO GenomicsDBImport - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
02:57:15.774 INFO GenomicsDBImport - Initializing engine
02:57:18.389 INFO IntervalArgumentCollection - Processing 11228744 bp from intervals
02:57:18.437 INFO GenomicsDBImport - Done initializing engine
Created workspace ../RAW_VCF/my_database
02:57:18.583 INFO GenomicsDBImport - Vid Map JSON file will be written to ../RAW_VCF/my_database/vidmap.json
02:57:18.583 INFO GenomicsDBImport - Callset Map JSON file will be written to ../RAW_VCF/my_database/callset.json
02:57:18.583 INFO GenomicsDBImport - Complete VCF Header will be written to ../RAW_VCF/my_database/vcfheader.vcf
02:57:18.583 INFO GenomicsDBImport - Importing to array - ../RAW_VCF/my_database/genomicsdb_array
02:57:18.583 INFO ProgressMeter - Starting traversal
02:57:18.583 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
02:57:31.082 INFO GenomicsDBImport - Shutting down engine
[July 10, 2018 2:57:31 AM EDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.26 minutes.
Runtime.totalMemory()=4116185088
Exception in thread "main" java.lang.StackOverflowError
at com.intel.genomicsdb.model.ImportConfig.isThereChromosomeIntervalIntersection(ImportConfig.java:95)
at com.intel.genomicsdb.model.ImportConfig.isThereChromosomeIntervalIntersection(ImportConfig.java:104)
at com.intel.genomicsdb.model.ImportConfig.isThereChromosomeIntervalIntersection(ImportConfig.java:104)
at com.intel.genomicsdb.model.ImportConfig.isThereChromosomeIntervalIntersection(ImportConfig.java:104)
at com.intel.genomicsdb.model.ImportConfig.isThereChromosomeIntervalIntersection(ImportConfig.java:104)
at com.intel.genomicsdb.model.ImportConfig.isThereChromosomeIntervalIntersection(ImportConfig.java:104)
at com.intel.genomicsdb.model.ImportConfig.isThereChromosomeIntervalIntersection(ImportConfig.java:104)

This Issue was generated from your [forums]
[forums]: https://gatkforums.broadinstitute.org/gatk/discussion/12388/how-to-use-multi-interval-in-genomicsdbimport-with-gatk-4-0-6-0/p1

@droazen

droazen commented Jul 10, 2018

It looks like there's a method in GenomicsDB (ImportConfig.isThereChromosomeIntervalIntersection()) that uses recursion unnecessarily. Since it makes one recursive call per interval, Java could run out of stack space with a large enough interval list. I'd suggest converting the method to use iteration as a quick fix. @kgururaj can you comment?

We should find out how many intervals are in the user's list to confirm this theory.

@droazen

droazen commented Jul 10, 2018

It also looks like the method in question is O(n^2) when it could be O(n log n) if it sorted the interval list first...
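
For illustration, here is a minimal sketch of the approach suggested in the two comments above: sort the intervals first, then scan them once in a loop instead of recursing. This is not the GenomicsDB code. The class `IntervalOverlapSketch`, the `Interval` type, and the method `hasOverlap` are hypothetical names, and the sketch assumes closed intervals with `start <= end`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; this is not the GenomicsDB ImportConfig implementation.
final class IntervalOverlapSketch {

    // Hypothetical closed interval [start, end] on a named contig (assumes start <= end).
    static final class Interval {
        final String contig;
        final long start;
        final long end;

        Interval(String contig, long start, long end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }
    }

    // Returns true if any two intervals on the same contig share at least one position.
    static boolean hasOverlap(List<Interval> intervals) {
        // Copy before sorting so the caller's list is not reordered.
        List<Interval> sorted = new ArrayList<>(intervals);
        // O(n log n): order by contig name, then by start position.
        sorted.sort(Comparator
                .comparing((Interval i) -> i.contig)
                .thenComparingLong(i -> i.start));

        // O(n) scan in a plain loop, so stack depth does not grow with the list size.
        String currentContig = null;
        long maxEnd = Long.MIN_VALUE;
        for (Interval iv : sorted) {
            if (!iv.contig.equals(currentContig)) {
                // First interval seen on a new contig: reset the running end.
                currentContig = iv.contig;
                maxEnd = iv.end;
                continue;
            }
            if (iv.start <= maxEnd) {
                // Starts at or before the furthest end seen so far on this contig.
                return true;
            }
            maxEnd = Math.max(maxEnd, iv.end);
        }
        return false;
    }
}
```

Because the scan is a loop rather than one recursive call per interval, the stack depth stays constant however long the interval list is, and the sort dominates the cost at O(n log n).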

kgururaj added a commit to Intel-HLS/GenomicsDB that referenced this issue Jul 10, 2018
broadinstitute/gatk#4994

Sort partitions and then look for overlaps - eliminate recursion
kgururaj added a commit to Intel-HLS/GenomicsDB that referenced this issue Jul 11, 2018
broadinstitute/gatk#4994

Sort partitions and then look for overlaps - eliminate recursion
@cmnbroad

The stack overflow issue should now be fixed, but the original forum user reported having around 11k intervals, which I think is still probably too many to use at once. See #5066.

@droazen

droazen commented Oct 15, 2018

Closing -- this was patched.

@droazen droazen closed this as completed Oct 15, 2018