
[Bug] VarDict fails AGAIN with "Critical exception occurs on region" #1511

Open
mathiasbio opened this issue Dec 9, 2024 · 7 comments
Labels
Bug Something isn't working

mathiasbio (Collaborator) commented Dec 9, 2024

Description

Multiple cases have failed in VarDict with the same error we have seen before (#1286):

From settledtoad:
Run time: 2024-11-21 01:04
Error log: BALSAMIC.settledtoad.vardict_tumor_only.90.sh.7683749.err

Critical exception occurs on region: 17:29664737-29665257, program will be stopped.
Critical exception occurs on region: 17:29669927-29670253, program will be stopped.
Critical exception occurs on region: 17:29665622-29665923, program will be stopped.
Critical exception occurs on region: 17:29667423-29667763, program will be stopped.
Critical exception occurs on region: 17:29663251-29664032, program will be stopped.
Critical exception occurs on region: 17:29664286-29664700, program will be stopped.
java.util.concurrent.CompletionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code

From becomingdoe:
Run time: 2024-12-05 01:55
Error log: BALSAMIC.becomingdoe.vardict_tumor_only.65.sh.7791586.err

Critical exception occurs on region: 15:81591545-81592991, program will be stopped.
java.util.concurrent.CompletionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code

From gladmink:
Run time: 2024-12-05 11:25
Error log: BALSAMIC.gladmink.vardict_tumor_only.90.sh.7814826.err

Critical exception occurs on region: 5:148756198-148756688, program will be stopped.
Critical exception occurs on region: 5:148759484-148759900, program will be stopped.
Critical exception occurs on region: 5:148758667-148758888, program will be stopped.
Critical exception occurs on region: 5:148746822-148747163, program will be stopped.
Critical exception occurs on region: 5:148743498-148743873, program will be stopped.
Critical exception occurs on region: 5:148745404-148745816, program will be stopped.
Critical exception occurs on region: 5:148737531-148737835, program will be stopped.
Critical exception occurs on region: 5:148730380-148730945, program will be stopped.
Critical exception occurs on region: 5:148747455-148748317, program will be stopped.
Critical exception occurs on region: 5:148876315-148876523, program will be stopped.
Critical exception occurs on region: 5:148742099-148742670, program will be stopped.
Critical exception occurs on region: 5:148753831-148754263, program will be stopped.
java.util.concurrent.CompletionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
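When comparing cases, it helps to pull the failing regions out of each error log so they can be checked against per-region read counts. The sketch below is a minimal, hypothetical helper (not part of BALSAMIC or VarDict); it assumes only the stderr line format quoted above, and the names `failing_regions` and `span_per_chromosome` are made up for illustration.

```python
import re

# Assumed format, taken from the VarDict stderr lines quoted in this issue:
# "Critical exception occurs on region: 17:29664737-29665257, program will be stopped."
# The chromosome group accepts any run of characters other than whitespace or ':'.
REGION_RE = re.compile(r"Critical exception occurs on region: ([^\s:]+):(\d+)-(\d+)")


def failing_regions(log_text):
    """Return the distinct failing regions as sorted (chrom, start, end) tuples."""
    regions = {
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in REGION_RE.finditer(log_text)
    }
    return sorted(regions)


def span_per_chromosome(regions):
    """Return {chrom: (lowest start, highest end)} over the given regions."""
    spans = {}
    for chrom, start, end in regions:
        lo, hi = spans.get(chrom, (start, end))
        spans[chrom] = (min(lo, start), max(hi, end))
    return spans


# Usage with two lines from the settledtoad log above.
log = """\
Critical exception occurs on region: 17:29664737-29665257, program will be stopped.
Critical exception occurs on region: 17:29663251-29664032, program will be stopped.
java.util.concurrent.CompletionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
"""
regions = failing_regions(log)
print(regions)
print(span_per_chromosome(regions))
```

In each of the three logs above, all failing regions fall on a single chromosome, so the per-chromosome span gives a quick summary of where a run stopped.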

How to reproduce

No response

Expected behaviour

No response

Anything else?

No response

Pipeline version

16.0.0

mathiasbio added the Bug (Something isn't working) label Dec 9, 2024
github-project-automation bot moved this to Todo in BALSAMIC Dec 9, 2024
mathiasbio added this to the Release 17 milestone Dec 11, 2024
khurrammaqbool (Collaborator) commented Dec 12, 2024

Two additional cases failed due to an error in VarDict:

  1. learningescargot (job 7868555)
     • VarDict error in segment 5:79801380-79801660
     • Number of reads in the segment: 3822
  2. notablegrubworm (job 7865531)
     • VarDict error in segment 7:116415729-116419766
     • Number of reads in the segment: 189401

khurrammaqbool (Collaborator)

Three additional cases:

  1. civilbass
  2. directkrill
  3. validsnapper

khurrammaqbool (Collaborator)

The most probable cause of these errors is memory allocation: Java fails when it tries to read a segment of the BAM file that contains a high number of reads. The cases with these errors, along with the number of reads in the failing segments, are listed below:

Case              Segment in the error       Number of reads
settledtoad       (the case does not exist in production anymore)
gladmink
becomingdoe       15:81591545-81592991       74782
learningescargot  5:79801380-79801660        3822
notablegrubworm
civilbass
directkrill
validsnapper

mathiasbio (Collaborator, Author) commented Dec 13, 2024

These segments do have a lot of reads. I wanted to compare them with other cases from the same panel to see whether some samples had similar read counts in these regions without crashing. I picked some cases at random and checked, and the pattern is not clear enough to conclude that the failures are caused by abnormally high read counts.

In some cases, such as charmedstork, we don't see the error in 5:79801380-79801660 despite more reads in that region than the failing case; however, charmedstork does fail in chromosome 7, in a region where it has 0 reads.

Myeloid case      Critical exception region  Reads   Failed VarDict in similar or adjacent region
proakita          5:79801380-79801660        3246    no
learningescargot  5:79801380-79801660        3822    yes
boldopossum       5:79801380-79801660        4098    no
charmedstork      5:79801380-79801660        3985    no
likedguppy        5:79801380-79801660        2628    no
coherentplatypus  5:79801380-79801660        2471    no
pumpedanteater    5:79801380-79801660        2301    no
proakita          Y:14891361-14891641        739     no
learningescargot  Y:14891361-14891641        748     no
boldopossum       Y:14891361-14891641        0       no
charmedstork      Y:14891361-14891641        0       yes
likedguppy        Y:14891361-14891641        3232    no
coherentplatypus  Y:14891361-14891641        3       no
pumpedanteater    Y:14891361-14891641        0       no
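The argument against read depth as the sole cause can be checked mechanically from the table above. The sketch below is a hypothetical helper (not part of BALSAMIC); it copies the table's rows and, as an assumption, treats "yes" in the last column as a failure in that region, even though the table says "similar or adjacent region".

```python
# Rows copied from the table above: (case, region, reads, failed).
# Assumption: "failed" is True where the table says VarDict failed in a
# similar or adjacent region.
ROWS = [
    ("proakita", "5:79801380-79801660", 3246, False),
    ("learningescargot", "5:79801380-79801660", 3822, True),
    ("boldopossum", "5:79801380-79801660", 4098, False),
    ("charmedstork", "5:79801380-79801660", 3985, False),
    ("likedguppy", "5:79801380-79801660", 2628, False),
    ("coherentplatypus", "5:79801380-79801660", 2471, False),
    ("pumpedanteater", "5:79801380-79801660", 2301, False),
    ("proakita", "Y:14891361-14891641", 739, False),
    ("learningescargot", "Y:14891361-14891641", 748, False),
    ("boldopossum", "Y:14891361-14891641", 0, False),
    ("charmedstork", "Y:14891361-14891641", 0, True),
    ("likedguppy", "Y:14891361-14891641", 3232, False),
    ("coherentplatypus", "Y:14891361-14891641", 3, False),
    ("pumpedanteater", "Y:14891361-14891641", 0, False),
]


def counterexamples(rows):
    """For each region with at least one failing case, list the cases that did
    NOT fail despite having at least as many reads as the least-covered failing
    case. A non-empty list means read count alone does not explain the failure."""
    by_region = {}
    for case, region, reads, failed in rows:
        by_region.setdefault(region, []).append((case, reads, failed))
    result = {}
    for region, entries in by_region.items():
        failing_reads = [reads for _, reads, failed in entries if failed]
        if not failing_reads:
            continue
        threshold = min(failing_reads)
        result[region] = sorted(
            case for case, reads, failed in entries
            if not failed and reads >= threshold
        )
    return result


for region, cases in counterexamples(ROWS).items():
    print(region, cases)
```

For 5:79801380-79801660 this lists boldopossum and charmedstork, both with more reads than learningescargot; for the Y region, where the failing case had 0 reads, every other case qualifies.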

mathiasbio (Collaborator, Author)

I have rerun the failing VarDict jobs for 2 cases many times to try to replicate the issue. I did this with a bash script that runs vardict-java through the v16.0.0 Singularity container, using the arguments copied from the rule.

I ran notablegrubworm 38 times for the failing chromosome 7 region and learningescargot 37 times for the failing chromosome 5 region, and the job succeeded every time. Next, I'm going to rerun the cases from the start a few times to see whether the error appears when they run through the whole workflow.
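The actual bash script is not included here, so the following is only a hedged Python sketch of the same repeat-and-tally idea. It assumes a hypothetical wrapper script (`run_vardict_chr7.sh` below is an invented name) that calls vardict-java in the container with the rule's arguments, and it counts a run as failed if it exits non-zero or its stderr contains the critical-exception message.

```python
import subprocess

# Marker taken from the VarDict error lines quoted in this issue.
MARKER = "Critical exception occurs on region"


def rerun(cmd, n):
    """Run cmd n times and return how many runs failed.

    A run counts as failed if it exits with a non-zero status or if its
    stderr contains the VarDict critical-exception marker."""
    failures = 0
    for _ in range(n):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0 or MARKER in proc.stderr:
            failures += 1
    return failures
```

For example, `rerun(["bash", "run_vardict_chr7.sh"], 38)` would mirror the 38 notablegrubworm reruns, with 0 meaning every run succeeded. Checking stderr as well as the exit status guards against a wrapper that swallows the Java exit code.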

beatrizsavinhas

Just adding a bit of information on one additional case with this error:
acesnapper

Critical exception occurs on region: 17:30677005-30677345, program will be stopped.
Critical exception occurs on region: 17:30322483-30322881, program will be stopped.
Critical exception occurs on region: 17:30323717-30323996, program will be stopped.
Critical exception occurs on region: 17:30320784-30321127, program will be stopped.
Critical exception occurs on region: 17:30696152-30696432, program will be stopped.
Critical exception occurs on region: 17:30325577-30326123, program will be stopped.
Critical exception occurs on region: 17:30321483-30321840, program will be stopped.

mathiasbio (Collaborator, Author)

After rerunning learningescargot from the start in the development environment corresponding to the production version of BALSAMIC (v16.0.0), it failed again, but in a different chromosomal region than before:

Critical exception occurs on region: 15:41363946-41364356, program will be stopped.
java.util.concurrent.CompletionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code

This error seems to occur more frequently when VarDict runs as part of the whole pipeline than when the same job is rerun on its own.
