CNNScoreVariants crashes with java.lang.NullPointerException #7811

Open
GATKSupportTeam opened this issue Apr 25, 2022 · 5 comments

@GATKSupportTeam
Collaborator

This java.lang.NullPointerException appears to stem from an environment setup issue.

This request was created from a contribution made by Jordi Maggi on April 25, 2022 09:25 UTC.

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/5574426055963-CNNScoreVariants-crashes-with-java-lang-NullPointerException

--

Hi,

I created a conda environment and installed gatk4 through conda install -c bioconda gatk4. I have been using this environment to run all steps of the single sample germline variant calling best practices workflow (both gatk and picard). However, I have never been able to run CNNScoreVariants with this setup, as it always results in a java.lang.NullPointerException error. The only way I am able to run this step is by running it through the docker image you provide. That, however, is not ideal for our setup.

Any idea what I could try so that I can run it directly?

GATK version:

Using GATK jar /home/analyst/anaconda3/envs/snakemake_env/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar

Running:

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/analyst/anaconda3/envs/snakemake_env/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar --version

The Genome Analysis Toolkit (GATK) v4.2.5.0

HTSJDK Version: 2.24.1

Picard Version: 2.25.4

Exact command:

gatk CNNScoreVariants -I 73318_WES_hg19_recalibrated.sorted.bam -V 73318_80_IDTv1.vcf.gz -R /media/analyst/Data/Reference_data/hg19.fa -O /media/analyst/Data/73318_CNNScore_test.vcf.gz -tensor-type read_tensor > /media/analyst/Data/CNNScoreVariants.log

Entire console output:

Running:

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/analyst/anaconda3/envs/snakemake_env/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar CNNScoreVariants -I 73318_WES_hg19_recalibrated.sorted.bam -V 73318_80_IDTv1.vcf.gz -R /media/analyst/Data/Reference_data/hg19.fa -O /media/analyst/Data/73318_CNNScore_test.vcf.gz -tensor-type read_tensor

11:17:58.509 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/analyst/anaconda3/envs/snakemake_env/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 25, 2022 11:17:58 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
11:17:58.668 INFO  CNNScoreVariants - ------------------------------------------------------------
11:17:58.668 INFO  CNNScoreVariants - The Genome Analysis Toolkit (GATK) v4.2.5.0
11:17:58.669 INFO  CNNScoreVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
11:17:58.669 INFO  CNNScoreVariants - Executing as analyst@WGS on Linux v5.13.0-40-generic amd64
11:17:58.669 INFO  CNNScoreVariants - Java runtime: OpenJDK 64-Bit Server VM v10.0.2+13
11:17:58.669 INFO  CNNScoreVariants - Start Date/Time: April 25, 2022 at 11:17:58 AM CEST
11:17:58.669 INFO  CNNScoreVariants - ------------------------------------------------------------
11:17:58.669 INFO  CNNScoreVariants - ------------------------------------------------------------
11:17:58.670 INFO  CNNScoreVariants - HTSJDK Version: 2.24.1
11:17:58.670 INFO  CNNScoreVariants - Picard Version: 2.25.4
11:17:58.670 INFO  CNNScoreVariants - Built for Spark Version: 2.4.5
11:17:58.670 INFO  CNNScoreVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:17:58.670 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:17:58.670 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:17:58.670 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:17:58.670 INFO  CNNScoreVariants - Deflater: IntelDeflater
11:17:58.670 INFO  CNNScoreVariants - Inflater: IntelInflater
11:17:58.671 INFO  CNNScoreVariants - GCS max retries/reopens: 20
11:17:58.671 INFO  CNNScoreVariants - Requester pays: disabled
11:17:58.671 INFO  CNNScoreVariants - Initializing engine
WARNING: BAM index file /media/analyst/Data/WES/73318/73318_WES_hg19_recalibrated.sorted.bai is older than BAM /media/analyst/Data/WES/73318/73318_WES_hg19_recalibrated.sorted.bam
11:17:58.969 INFO  FeatureManager - Using codec VCFCodec to read file file:///media/analyst/Data/WES/73318/73318_80_IDTv1.vcf.gz
11:17:59.079 INFO  CNNScoreVariants - Done initializing engine
11:17:59.081 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/analyst/anaconda3/envs/snakemake_env/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
11:17:59.187 INFO  CNNScoreVariants - Done scoring variants with CNN.
11:17:59.187 INFO  CNNScoreVariants - Shutting down engine
[April 25, 2022 at 11:17:59 AM CEST] org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1895825408
java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.runtime.ProcessControllerAckResult.hasMessage(ProcessControllerAckResult.java:49)
    at org.broadinstitute.hellbender.utils.runtime.ProcessControllerAckResult.getDisplayMessage(ProcessControllerAckResult.java:69)
    at org.broadinstitute.hellbender.utils.runtime.StreamingProcessController.waitForAck(StreamingProcessController.java:229)
    at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.waitForAck(StreamingPythonScriptExecutor.java:216)
    at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.sendSynchronousCommand(StreamingPythonScriptExecutor.java:183)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.onTraversalStart(CNNScoreVariants.java:313)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1083)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

(created from Zendesk ticket #282399)

@cmnbroad
Collaborator

The underlying issue here is that the GATK conda environment isn't established, since bioconda doesn't appear to configure it. The NPE itself is fixed by #7816.

In this particular case, some of the requirements appear to be satisfied, since the code gets past the initial check for whether the GATK Python code is available. However, the actual CNN code then can't be loaded.
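To see which of the required Python modules the interpreter can actually find, a quick check along these lines may help narrow down a partially configured environment. This is a minimal sketch, not part of GATK: it assumes `gatktool` and `vqsr_cnn` are the names of the GATK Python packages (as in the GATK source tree) and that `tensorflow` and `keras` are the CNN's main dependencies; adjust the list to match your `gatkcondaenv.yml`. Run it with the interpreter GATK will launch (assumed here to be the `python` on your `PATH`).

```python
"""Report which top-level modules the current Python interpreter can find.

Sketch only: MODULES is an assumed list, not an official GATK requirement list.
"""
import importlib.util

# Hypothetical list: GATK's Python packages plus the CNN's deep-learning deps.
MODULES = ["gatktool", "vqsr_cnn", "tensorflow", "keras"]


def check_modules(names):
    """Map each top-level module name to True if it can be found, else False."""
    # For a top-level name, find_spec searches sys.path without importing the
    # module and returns None when it cannot be found.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(MODULES).items():
        print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

A module being found does not guarantee that it imports cleanly (an incompatible TensorFlow build, for example, can still fail at import time), so if everything reports OK, trying the import directly in the same interpreter is a sensible next step.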

@felixm3

felixm3 commented Oct 11, 2022

@cmnbroad @GATKSupportTeam I'm getting the same issue.

Do you have any recommendations on how to proceed?

Thanks in advance.


Using GATK jar /home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar CNNScoreVariants --version
Using GATK jar /home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar CNNScoreVariants -R /home/fmbuga/tools/hg38/hg38.fa -V /home/fmbuga/gatk4_gcp_wgs/06_vcf_raw/SRR16299720_dedup_AORRG_recal_raw.vcf -O ./08_vcf_1dCNN/SRR16299720_dedup_AORRG_recal_raw_1dCNN_scored.vcf
05:39:39.149 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
05:39:39.304 INFO  CNNScoreVariants - ------------------------------------------------------------
05:39:39.305 INFO  CNNScoreVariants - The Genome Analysis Toolkit (GATK) v4.2.6.1
05:39:39.305 INFO  CNNScoreVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
05:39:39.305 INFO  CNNScoreVariants - Executing as fmbuga@node05.cluster on Linux v3.10.0-1062.18.1.el7.x86_64 amd64
05:39:39.305 INFO  CNNScoreVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
05:39:39.305 INFO  CNNScoreVariants - Start Date/Time: October 9, 2022 5:39:39 AM PDT
05:39:39.305 INFO  CNNScoreVariants - ------------------------------------------------------------
05:39:39.306 INFO  CNNScoreVariants - ------------------------------------------------------------
05:39:39.306 INFO  CNNScoreVariants - HTSJDK Version: 2.24.1
05:39:39.306 INFO  CNNScoreVariants - Picard Version: 2.27.1
05:39:39.306 INFO  CNNScoreVariants - Built for Spark Version: 2.4.5
05:39:39.307 INFO  CNNScoreVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
05:39:39.307 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
05:39:39.307 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
05:39:39.307 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
05:39:39.307 INFO  CNNScoreVariants - Deflater: IntelDeflater
05:39:39.307 INFO  CNNScoreVariants - Inflater: IntelInflater
05:39:39.307 INFO  CNNScoreVariants - GCS max retries/reopens: 20
05:39:39.307 INFO  CNNScoreVariants - Requester pays: disabled
05:39:39.307 INFO  CNNScoreVariants - Initializing engine
05:39:39.905 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/fmbuga/gatk4_gcp_wgs/06_vcf_raw/SRR16299720_dedup_AORRG_recal_raw.vcf
05:39:40.108 INFO  CNNScoreVariants - Done initializing engine
05:39:40.109 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
05:39:40.429 INFO  CNNScoreVariants - Done scoring variants with CNN.
05:39:40.429 INFO  CNNScoreVariants - Shutting down engine
[October 9, 2022 5:39:40 AM PDT] org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=1903165440
java.lang.NullPointerException
	at org.broadinstitute.hellbender.utils.runtime.ProcessControllerAckResult.hasMessage(ProcessControllerAckResult.java:49)
	at org.broadinstitute.hellbender.utils.runtime.ProcessControllerAckResult.getDisplayMessage(ProcessControllerAckResult.java:69)
	at org.broadinstitute.hellbender.utils.runtime.StreamingProcessController.waitForAck(StreamingProcessController.java:235)
	at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.waitForAck(StreamingPythonScriptExecutor.java:216)
	at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.sendSynchronousCommand(StreamingPythonScriptExecutor.java:183)
	at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.onTraversalStart(CNNScoreVariants.java:313)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1083)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
	at org.broadinstitute.hellbender.Main.main(Main.java:289)


@cmnbroad
Collaborator

@felixm3 The bioconda installation doesn't actually configure the GATK conda environment: it installs GATK, but not the Python dependencies required by CNNScoreVariants. You need to set up the GATK conda environment as described in the Python Dependencies section of the README.md file: https://github.com/broadinstitute/gatk#readme.
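Before rerunning the tool, it may help to confirm that the environment from the README is the one actually active in the shell that launches GATK. The sketch below assumes the environment was created under the name `gatk`; it reads `CONDA_DEFAULT_ENV`, which `conda activate` sets, and resolves the first `python` on `PATH` with `shutil.which`. The helper `active_env_report` and the constant `EXPECTED_ENV` are hypothetical names for illustration.

```python
import os
import shutil
import sys

EXPECTED_ENV = "gatk"  # assumption: the environment name used when creating it


def active_env_report(env=None):
    """Return the active conda env name and the `python` resolved from PATH.

    `env` defaults to os.environ; passing a plain dict makes this testable.
    """
    env = os.environ if env is None else env
    return {
        # `conda activate <name>` sets CONDA_DEFAULT_ENV to the env name.
        "conda_env": env.get("CONDA_DEFAULT_ENV"),
        # First `python` executable on PATH, or None if there is none.
        "python_on_path": shutil.which("python", path=env.get("PATH")),
    }


if __name__ == "__main__":
    report = active_env_report()
    print(report)
    if report["conda_env"] != EXPECTED_ENV:
        print(f"warning: expected conda env {EXPECTED_ENV!r}, "
              f"found {report['conda_env']!r}", file=sys.stderr)
```

If `python_on_path` points outside the environment's directory, GATK is probably launching a different interpreter than the one the dependencies were installed into.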

@chundruv

chundruv commented Sep 8, 2023

I spent a long time struggling to install the environment, because it hasn't been updated for the newer TensorFlow and Keras versions, whose syntax changes cause many of the errors you see here. I managed to get it all working by pinning the versions in the YAML file, but conda takes a very long time to solve the environment, so I highly recommend using mamba or micromamba!
I'm attaching the YAML file I used to get CNNScoreVariants to work (renamed to .txt, since it won't attach as a .yml).

gatkcondaenv_fixed.yml.txt
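Because the failures described above come from TensorFlow/Keras version drift, it can be useful to see which versions the active environment actually contains before comparing them against a pinned YAML. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); the distribution names are assumptions, and the script does not know which versions the attached YAML pins.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names for the CNN's deep-learning stack.
DISTRIBUTIONS = ["tensorflow", "keras"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this interpreter's environment.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(DISTRIBUTIONS).items():
        print(f"{name:12s} {ver or 'not installed'}")
```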

@nservant

Would it be possible to update the gatktool bioconda repo so that all the Python dependencies needed to run CNNScoreVariants are installed? It would be really helpful and would make a GATK conda environment much easier to manage.
