Segfault reported by GATK user #213

Open
lbergelson opened this issue Sep 30, 2024 · 0 comments

We received a report of a reproducible segfault from a GATK user running HaplotypeCaller. They are using GATK 4.6.0.0, which uses the most recent GKL release, 0.8.11.

I've copied their report below (from broadinstitute/gatk#8988).

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f06ed243291, pid=1058615, tid=1058616
#
# JRE version: OpenJDK Runtime Environment (17.0.2+8) (build 17.0.2+8-86)
# Java VM: OpenJDK 64-Bit Server VM (17.0.2+8-86, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libc.so.6+0xcf291]  __memset_avx2_erms+0x11
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to /bigdata/ramadugulab/luy/SNPcallingBreeding/core.1058615)
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/gatk/4.6.0.0/gatk-package-4.6.0.0-local.jar HaplotypeCaller -R /rhome/luy/bigdata/genomes/Cclementina_182_v1_2.fa -I AlignedCalToCcl_Scaffolds_MarkDupOut.bam -O AlignedCalToCcl_Scaffolds.vcf.gz -ERC GVCF

Host: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, 64 cores, 20G, Rocky Linux release 8.8 (Green Obsidian)
Time: Sat Sep 28 04:11:19 2024 PDT elapsed time: 58592.788414 seconds (0d 16h 16m 32s)

---------------  T H R E A D  ---------------

Current thread (0x00007f06e4025b70):  JavaThread "main" [_thread_in_native, id=1058616, stack(0x00007f06edc7a000,0x00007f06edd7b000)]

Stack: [0x00007f06edc7a000,0x00007f06edd7b000],  sp=0x00007f06edbe6458,  free space=18014398509481393k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0xcf291]  __memset_avx2_erms+0x11
C  [libgkl_pairhmm_omp5311772482084658743.so+0x1500f]  Java_com_intel_gkl_pairhmm_IntelPairHmm_computeLikelihoodsNative._omp_fn.0+0xcf

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 8942  com.intel.gkl.pairhmm.IntelPairHmm.computeLikelihoodsNative([Ljava/lang/Object;[Ljava/lang/Object;[D)V (0 bytes) @ 0x00007f06d563401c [0x00007f06d5633fa0+0x000000000000007c]
J 10003 c2 com.intel.gkl.pairhmm.IntelPairHmm.computeLikelihoods([Lorg/broadinstitute/gatk/nativebindings/pairhmm/ReadDataHolder;[Lorg/broadinstitute/gatk/nativebindings/pairhmm/HaplotypeDataHolder;[D)V (119 bytes) @ 0x00007f06d5bff3e0 [0x00007f06d5bff3a0+0x0000000000000040]
J 6781 c2 org.broadinstitute.hellbender.utils.pairhmm.VectorLoglessPairHMM.computeLog10Likelihoods(Lorg/broadinstitute/hellbender/utils/genotyper/LikelihoodMatrix;Ljava/util/List;Lorg/broadinstitute/hellbender/utils/pairhmm/PairHMMInputScoreImputator;)V (450 bytes) @ 0x00007f06d54f8cc8 [0x00007f06d54f8a00+0x00000000000002c8]
J 10022 c2 org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(Lorg/broadinstitute/hellbender/tools/walkers/haplotypecaller/AssemblyResultSet;Lorg/broadinstitute/hellbender/utils/genotyper/SampleList;Ljava/util/Map;Z)Lorg/broadinstitute/hellbender/utils/genotyper/AlleleLikelihoods; (25 bytes) @ 0x00007f06d5c0cb30 [0x00007f06d5c0b540+0x00000000000015f0]
J 9971 c2 org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(Lorg/broadinstitute/hellbender/engine/AssemblyRegion;Lorg/broadinstitute/hellbender/engine/FeatureContext;Lorg/broadinstitute/hellbender/engine/ReferenceContext;)Ljava/util/List; (2286 bytes) @ 0x00007f06d5bdef08 [0x00007f06d5bdcd60+0x00000000000021a8]
J 10571% c2 org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(Lorg/broadinstitute/hellbender/engine/MultiIntervalLocalReadShard;Lorg/broadinstitute/hellbender/engine/ReferenceDataSource;Lorg/broadinstitute/hellbender/engine/FeatureManager;)V (154 bytes) @ 0x00007f06d5c8e5c0 [0x00007f06d5c8dd20+0x00000000000008a0]
j  org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse()V+83
j  org.broadinstitute.hellbender.engine.GATKTool.doWork()Ljava/lang/Object;+19
j  org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool()Ljava/lang/Object;+34
j  org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs()Ljava/lang/Object;+225
j  org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain([Ljava/lang/String;)Ljava/lang/Object;+14
j  org.broadinstitute.hellbender.Main.runCommandLineProgram(Lorg/broadinstitute/hellbender/cmdline/CommandLineProgram;[Ljava/lang/String;)Ljava/lang/Object;+20
j  org.broadinstitute.hellbender.Main.mainEntry([Ljava/lang/String;)V+22
j  org.broadinstitute.hellbender.Main.main([Ljava/lang/String;)V+8
v  ~StubRoutines::call_stub

siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x00007f06edc39d00

Register to memory mapping:

RAX=0x0 is NULL
RBX=0x00007f06edc39d00: <offset 0x0000000000006d00> in /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/java/17.0.2/lib/libjava.so at 0x00007f06edc33000
RCX=0x0000000000028318 is an unknown value
RDX=0x00007f06edc39d00: <offset 0x0000000000006d00> in /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/java/17.0.2/lib/libjava.so at 0x00007f06edc33000
RSP=0x00007f06edbe6458 points into unknown readable memory: 0x00007f0673c89bc4 | c4 9b c8 73 06 7f 00 00
RBP=0x00007f06edd78f50 is pointing into the stack for thread: 0x00007f06e4025b70
RSI=0x0 is NULL
RDI=0x00007f06edc39d00: <offset 0x0000000000006d00> in /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/java/17.0.2/lib/libjava.so at 0x00007f06edc33000
R8 =0x0000000000004f9a is an unknown value
R9 =0x0000000000000001 is an unknown value
R10=0x00000000000000c3 is an unknown value
R11=0x00007f06e47c9840 points into unknown readable memory: 0x4141474141414143 | 43 41 41 41 41 47 41 41
R12=0x00007f06edc119e0 points into unknown readable memory: 0x0000000000000000 | 00 00 00 00 00 00 00 00
R13=0x00007f06edbe96c0 points into unknown readable memory: 0x00007f06e4f65c50 | 50 5c f6 e4 06 7f 00 00
R14=0x0000000000028318 is an unknown value
R15=0x0000000000005063 is an unknown value


Registers:
RAX=0x0000000000000000, RBX=0x00007f06edc39d00, RCX=0x0000000000028318, RDX=0x00007f06edc39d00
RSP=0x00007f06edbe6458, RBP=0x00007f06edd78f50, RSI=0x0000000000000000, RDI=0x00007f06edc39d00
R8 =0x0000000000004f9a, R9 =0x0000000000000001, R10=0x00000000000000c3, R11=0x00007f06e47c9840
R12=0x00007f06edc119e0, R13=0x00007f06edbe96c0, R14=0x0000000000028318, R15=0x0000000000005063
RIP=0x00007f06ed243291, EFLAGS=0x0000000000010206, CSGSFS=0x002b000000000033, ERR=0x0000000000000007
  TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f06edbe6458)
0x00007f06edbe6458:   00007f0673c89bc4 7b8f04462509c62f
0x00007f06edbe6468:   8010180048120140 0000c12912a02890
0x00007f06edbe6478:   0460229080441000 ffffffffffffffff
0x00007f06edbe6488:   4a03ed807b023001 3040120080800100
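
The "Stack:" line in the log reports an implausibly large free space value. As a rough sanity check (a sketch only, using values copied from the "Stack:" and "siginfo:" lines; the helper names are made up for illustration), the arithmetic below shows that the stack pointer and the faulting address both lie below the low end of the reported thread stack, and that the free space figure is what an unsigned 64-bit subtraction produces in that case. The register mapping in the log places the faulting address inside libjava.so. This would be consistent with the native code running past the end of the thread stack, but that reading is an assumption and is not confirmed by the report.

```python
# Values copied from the "Stack:" and "siginfo:" lines of the crash log.
STACK_LOW = 0x00007F06EDC7A000   # low end of the main thread's stack
STACK_HIGH = 0x00007F06EDD7B000  # high end of the main thread's stack
SP = 0x00007F06EDBE6458          # stack pointer at the time of the crash
FAULT_ADDR = 0x00007F06EDC39D00  # si_addr, the address that faulted


def below_stack(addr: int) -> int:
    """Bytes by which addr lies below the low end of the stack (negative if above)."""
    return STACK_LOW - addr


def wrapped_free_space_kib(sp: int) -> int:
    """(sp - STACK_LOW) interpreted as an unsigned 64-bit value, in KiB."""
    return ((sp - STACK_LOW) % 2**64) // 1024


print(f"stack size: {(STACK_HIGH - STACK_LOW) // 1024} KiB")
print(f"sp is {below_stack(SP)} bytes below the stack")
print(f"fault address is {below_stack(FAULT_ADDR)} bytes below the stack")
print(f"wrapped free space: {wrapped_free_space_kib(SP)}k")
```

The last value printed is 18014398509481393k, the same figure the log shows for "free space".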

Steps to reproduce

The command that was run:

gatk  HaplotypeCaller -R /rhome/luy/bigdata/genomes/Cclementina_182_v1_2.fa -I AlignedCalToCcl_Scaffolds_MarkDupOut.bam \
    -O AlignedCalToCcl_Scaffolds.vcf.gz \
    -ERC GVCF

The job was submitted to an HPC cluster using Slurm. Multiple machines were tested: an Intel machine with a Xeon E5-2683 v4 CPU and an AMD machine with an EPYC 7713 CPU.

The job has also been run multiple times, and every run crashed at the same instruction, __memset_avx2_erms+0x11.
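
To compare the crash location across several crash logs, a small helper along these lines could be used. It is a minimal sketch: the function name `problematic_frame` is hypothetical, and it assumes each log contains a "# Problematic frame:" header followed by the frame on the next line, as in the report above.

```python
import re
from typing import Optional


def problematic_frame(hs_err_text: str) -> Optional[str]:
    """Return the frame listed under '# Problematic frame:', or None if absent."""
    lines = hs_err_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip() == "# Problematic frame:":
            # The frame is on the following line, prefixed with '#'.
            frame = lines[i + 1].lstrip("#").strip()
            # Drop the bracketed "[library+offset]" part, which can differ
            # between hosts, and keep the frame type plus symbol+offset.
            return re.sub(r"\s*\[[^\]]*\]\s*", " ", frame).strip()
    return None


# Excerpt copied from the crash log in this report.
sample = """#
# Problematic frame:
# C  [libc.so.6+0xcf291]  __memset_avx2_erms+0x11
#
"""
print(problematic_frame(sample))
```

Running the helper over the text of each log and collecting the results into a set gives a quick check: a single element means every run stopped at the same symbol and offset.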

Other package versions that might be relevant:
java/17.0.2
glibc-common-2.28-225
