Canu release v1.5 failed with: canu iteration count too high, stopping pipeline #554

Closed
steinjosh opened this issue Jul 18, 2017 · 7 comments


@steinjosh

Hi,
Can you please help me set parameters to avoid the current problem? Thanks.
Here is the command, running on a CentOS cluster with a Slurm scheduler:

$ canu -p cargold -d cargold_v1.5_start2.out/ genomeSize=400m -pacbio-raw ../20170*/*/Analysis_Results/*.fastq
-- Canu release v1.5
-- Detected Java(TM) Runtime Environment '1.8.0_60' (from 'java').
-- Detected gnuplot version '4.6 patchlevel 7' (from 'gnuplot') and image format 'svg'.
-- Detected 40 CPUs and 126 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 1000 jobs.
-- 
-- Found  58 hosts with  40 cores and  125 GB memory under Slurm control.
-- Found   2 hosts with 120 cores and 1513 GB memory under Slurm control.
-- Found   3 hosts with 120 cores and 1513 GB memory under Slurm control.
--
-- Run under grid control using   31 GB and  10 CPUs for stage 'meryl'.
-- Run under grid control using   13 GB and  10 CPUs for stage 'mhap (cor)'.
-- Run under grid control using    8 GB and   8 CPUs for stage 'overlapper (obt)'.
-- Run under grid control using    8 GB and   8 CPUs for stage 'overlapper (utg)'.
-- Run under grid control using   12 GB and   4 CPUs for stage 'falcon_sense'.
-- Run under grid control using    3 GB and   1 CPU  for stage 'ovStore bucketizer'.
-- Run under grid control using   16 GB and   1 CPU  for stage 'ovStore sorting'.
-- Run under grid control using    6 GB and   5 CPUs for stage 'read error detection'.
-- Run under grid control using    2 GB and   1 CPU  for stage 'overlap error adjustment'.
-- Run under grid control using   25 GB and   8 CPUs for stage 'bogart'.
-- Run under grid control using    4 GB and   4 CPUs for stage 'GFA alignment and processing'.
-- Run under grid control using   25 GB and   8 CPUs for stage 'consensus'.
--
-- Generating assembly 'cargold' in '/project/gbru_fy17_002/pacbio_fastq/canu_assembly/cargold_v1.5_start2.out'
--
-- Parameters:
--
--  genomeSize        400000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0450 (  4.50%)
----------------------------------------
-- Starting command on Tue Jul 18 10:37:55 2017 with 1267857.651 GB free disk space

    cd /project/gbru_fy17_002/pacbio_fastq/canu_assembly/cargold_v1.5_start2.out
    sbatch \
      --mem-per-cpu=4g \
      --cpus-per-task=1   \
      -D `pwd` \
      -J 'canu_cargold' \
      -o canu-scripts/canu.15.out canu-scripts/canu.15.sh
Submitted batch job 121448

-- Finished on Tue Jul 18 10:37:55 2017 (lickety-split) with 1267857.651 GB free disk space

Here is the output of canu.out:

-- Canu release v1.5
-- Detected Java(TM) Runtime Environment '1.8.0_60' (from 'java').
-- Detected gnuplot version '4.6 patchlevel 7' (from 'gnuplot') and image format 'svg'.
-- Detected 40 CPUs and 126 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 1000 jobs.
-- 
-- Found  58 hosts with  40 cores and  125 GB memory under Slurm control.
-- Found   3 hosts with 120 cores and 1513 GB memory under Slurm control.
-- Found   2 hosts with 120 cores and 1513 GB memory under Slurm control.
--
-- Run under grid control using   31 GB and  10 CPUs for stage 'meryl'.
-- Run under grid control using   13 GB and  10 CPUs for stage 'mhap (cor)'.
-- Run under grid control using    8 GB and   8 CPUs for stage 'overlapper (obt)'.
-- Run under grid control using    8 GB and   8 CPUs for stage 'overlapper (utg)'.
-- Run under grid control using   12 GB and   4 CPUs for stage 'falcon_sense'.
-- Run under grid control using    3 GB and   1 CPU  for stage 'ovStore bucketizer'.
-- Run under grid control using   16 GB and   1 CPU  for stage 'ovStore sorting'.
-- Run under grid control using    6 GB and   5 CPUs for stage 'read error detection'.
-- Run under grid control using    2 GB and   1 CPU  for stage 'overlap error adjustment'.
-- Run under grid control using   25 GB and   8 CPUs for stage 'bogart'.
-- Run under grid control using    4 GB and   4 CPUs for stage 'GFA alignment and processing'.
-- Run under grid control using   25 GB and   8 CPUs for stage 'consensus'.
--
-- Generating assembly 'cargold' in '/project/gbru_fy17_002/pacbio_fastq/canu_assembly/cargold_v1.5_start2.out'
--
-- Parameters:
--
--  genomeSize        400000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0450 (  4.50%)
--
--
-- BEGIN CORRECTION
--
--
-- 14 mhap jobs failed:
--   job 1-overlapper/results/000009.ovb FAILED.
--   job 1-overlapper/results/000010.ovb FAILED.
--   job 1-overlapper/results/000011.ovb FAILED.
--   job 1-overlapper/results/000012.ovb FAILED.
--   job 1-overlapper/results/000136.ovb FAILED.
--   job 1-overlapper/results/000137.ovb FAILED.
--   job 1-overlapper/results/000138.ovb FAILED.
--   job 1-overlapper/results/000151.ovb FAILED.
--   job 1-overlapper/results/000152.ovb FAILED.
--   job 1-overlapper/results/000153.ovb FAILED.
--   job 1-overlapper/results/000215.ovb FAILED.
--   job 1-overlapper/results/000216.ovb FAILED.
--   job 1-overlapper/results/000229.ovb FAILED.
--   job 1-overlapper/results/000237.ovb FAILED.
--
================================================================================
Don't panic, but a mostly harmless error occurred and Canu stopped.

Canu release v1.5 failed with:
  canu iteration count too high, stopping pipeline (most likely a problem in the grid-based computes)

Simply restarting doesn't do the trick, as it encounters the same problem.
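The failure report above can be turned into a plain list of job numbers for later checks. This is a minimal sketch, assuming the report lines keep the exact `job 1-overlapper/results/NNNNNN.ovb FAILED.` layout shown above; the function name `failed_mhap_jobs` is made up for illustration and is not part of Canu.

```shell
# Print the numeric IDs of the failed mhap jobs listed in a Canu log.
# Assumes report lines of the form:
#   --   job 1-overlapper/results/000009.ovb FAILED.
# Leading zeros are stripped so the IDs can be reused as Slurm array indices.
failed_mhap_jobs() {
  sed -n 's|^-- *job 1-overlapper/results/0*\([0-9][0-9]*\)\.ovb FAILED\..*$|\1|p' "$1"
}
```

Calling it with the path of the log that contains the report prints one job number per line.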

Here is an example mhap output file:

$ less correction/1-overlapper/mhap.121469_237.out
Running job 237 based on SLURM_ARRAY_TASK_ID=237 and offset=0.
Fetch blocks/000094.dat
Fetch blocks/000095.dat
Fetch blocks/000096.dat
Fetch blocks/000097.dat

Running block 000093 in query 000237

INVALID OVERLAP  3627211 (len   5938)  3610169 (len  22314) hangs   4966   2277 -      1  13635 flip 1

Thanks for your help.

Josh

@skoren
Member

skoren commented Jul 18, 2017

This usually implies a failure in the previous step or some kind of file corruption, though Canu should detect this and stop. However, there are three other recent issues with similar problems (#543, #542), so it is possible that a bug was introduced recently.

Does the previous step (correction/1-overlapper/precompute.*.out) report any errors? Did any of those jobs get killed by your grid scheduler? Are any of the dat files (correction/1-overlapper/blocks/*.dat) 0-sized?
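Two of these checks can be scripted. The sketch below assumes the directory layout named above and that a finished precompute log contains a `Total time (s):` line; the function names are hypothetical helpers, not Canu commands. Whether a job was killed by the scheduler has to be checked separately.

```shell
# List zero-byte block files under a blocks directory.
empty_dat_files() {
  find "$1" -type f -name '*.dat' -size 0
}

# List precompute logs in a directory that never printed a complete
# "Total time (s):" line (grep -L prints files WITHOUT a match).
unfinished_precompute_logs() {
  grep -L 'Total time (s):' "$1"/precompute.*.out
}
```

For example, `empty_dat_files correction/1-overlapper/blocks` and `unfinished_precompute_logs correction/1-overlapper`. No output from either means those two checks passed.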

@steinjosh
Author

Thanks, Sergey, for your guidance. None of the blocks/*.dat files are empty. The precompute.files file lists 97 *.dat files, which equals the number of precompute.*.out files. All of the precompute.*.out files end with “Total time (s): ”, except one that looks mangled at the end (precompute.120861_27.out):

$ less precompute.120861_27.out
Running job 27 based on SLURM_ARRAY_TASK_ID=27 and offset=0.
Dumping reads from 1014001 to 1053000 (inclusive).

Starting mhap precompute.

Running with these settings:
--filter-threshold = 5.0E-6
--help = false
--max-shift = 0.2
--min-olap-length = 116
--min-store-length = 0
--no-rc = false
--no-self = false
--no-tf = false
--num-hashes = 256
--num-min-matches = 3
--num-threads = 10
--ordered-kmer-size = 14
--ordered-sketch-size = 1000
--repeat-idf-scale = 10.0
--repeat-weight = 0.9
--settings = 0
--store-full-id = false
--supress-noise = 0
--threshold = 0.8
--version = false
-f = ../../0-mercounts/cargold.ms16.frequentMers.ignore.gz
-h = false
-k = 16
-p = ./000027.input.fasta
-q = .
-s = 

Reading in filter file ../../0-mercounts/cargold.ms16.frequentMers.ignore.gz.
Time (s) to read filter file: 0.167291781
Read in k-mer filter for sizes: [16]
Processing FASTA files for binary compression...
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
Current # sequences loaded and processed from file: 15000...
Current # sequences loaded and processed from file: 20000...
Current # sequences loaded and processed from file: 25000...
Current # sequences loaded and processed from file: 30000...

otal time (s): 751.205191795 ./000027.input.fasta to ./000027.input.dat.

@skoren
Member

skoren commented Jul 18, 2017

Perhaps that one is the source of the issue. For the failed mhap jobs (like 9, 10, ... 237), check the corresponding query folder (like correction/1-overlapper/queries/00009) and see if they are using block 27. If so then it is the likely culprit. You would need to remove it, remove correction/1-overlapper/results/*.mhap (leave the ovb files), re-run precompute.sh 27 so it finishes without issue, and re-launch Canu.
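One way to see which blocks each job touched, without opening every query folder, is to read the job logs themselves. This sketch relies only on the `Fetch blocks/NNNNNN.dat` and `Running block NNNNNN in query NNNNNN` lines visible in the mhap log earlier in this thread; the function names are invented for illustration, and it is an assumption that those two line types cover every block a job uses.

```shell
# Print the block numbers one mhap job log mentions, taken from its
# "Fetch blocks/NNNNNN.dat" and "Running block NNNNNN in query ..." lines.
blocks_used() {
  sed -n -e 's|^Fetch blocks/\([0-9][0-9]*\)\.dat.*$|\1|p' \
         -e 's|^Running block \([0-9][0-9]*\) in query.*$|\1|p' "$1"
}

# For a set of job logs, count how many logs mention each block and
# print the most widely shared blocks first ("count block").
shared_blocks() {
  for log in "$@"; do
    blocks_used "$log" | sort -u
  done | sort | uniq -c | sort -k1,1nr -k2,2
}
```

Passing only the logs of the failed jobs to `shared_blocks` shows whether a single block, such as 000027, appears in all of them.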

@steinjosh
Author

Of the 14 failed mhap jobs, only one utilized block 27, and of the 26 mhap jobs that utilized block 27, only 1 failed. According to canu-scripts/canu.03.out "All 97 mhap precompute jobs finished successfully."

@skoren
Member

skoren commented Jul 18, 2017

In that case it is unlikely that the dat file is the issue. Still, try removing all the correction/1-overlapper/results/*.mhap files and re-running one of the failed mhap.sh jobs by hand to see what it outputs.

@steinjosh
Author

Okay, I'm new to Slurm, as well as canu, mhap, PacBio...

Let's say I want to re-run mhap on a failed job, say 9. There is a script already available in correction/1-overlapper/mhap.jobSubmit.sh that looks like this:

#!/bin/sh

sbatch \
  --mem-per-cpu=1331m --cpus-per-task=10 -o mhap.%A_%a.out \
  -D `pwd` -J "cormhap_cargold" \
  -a 237-237 \
  ./mhap.sh 0 \
> ./mhap.jobSubmit.out 2>&1

Would I edit the -a parameter to -a 9? Or, if I wanted to rerun all of the failed jobs, would I just list them, as in -a 9-12, 136-138, 151-153, etc.?
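On the syntax itself: sbatch's -a/--array option takes a comma-separated list of indices and ranges with no spaces between them (spaces would split the value into separate shell arguments). A small sketch for building such a value from a list of job numbers follows; `array_spec` is a hypothetical helper name, not a Slurm or Canu command.

```shell
# Collapse job numbers (one per line on stdin) into a compact value for
# sbatch -a/--array: consecutive runs become "first-last", joined by commas.
array_spec() {
  sort -n -u | awk '
    function range(a, b) { return (a == b) ? a "" : a "-" b }
    { n = $1 + 0 }                     # normalise, e.g. 000009 -> 9
    NR == 1       { start = prev = n; next }
    n == prev + 1 { prev = n; next }   # still inside the current run
                  { out = out sep range(start, prev); sep = ","; start = prev = n }
    END           { if (NR > 0) print out sep range(start, prev) }
  '
}
```

For example, `printf '%s\n' 9 10 11 12 136 | array_spec` prints `9-12,136`.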

I also note that in the correction/1-overlapper/results/ directory only the failed jobs have the *.mhap or *.mhap.ovb.WORKING suffix, while the successful jobs have the *.counts and *.ovb suffixes.

Here is something else: the mhap.*.out files in correction/1-overlapper contain these errors, which appear to affect only the failed jobs:

$ grep 'Exception' mhap.*.out | less
mhap.120960_10.out:Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-8" Stored 78000 sequences in the index.
mhap.120960_10.out:Exception in thread "pool-2-thread-5" Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-9" Exception in thread "pool-2-thread-2" Processed 78800 unique sequences (fwd and rev).
mhap.120960_10.out:Exception in thread "pool-2-thread-6" Time (s) to read and hash from file: 15.792406009
mhap.120960_10.out:Exception in thread "pool-2-thread-3" Exception in thread "pool-2-thread-10" edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_10.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_10.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_10.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_10.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_10.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_10.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_10.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_10.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_10.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_11.out:Exception in thread "pool-2-thread-9" Exception in thread "pool-2-thread-6" Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-10" Exception in thread "pool-2-thread-2" Exception in thread "pool-2-thread-3" Exception in thread "pool-2-thread-5" Stored 78000 sequences in the index.
mhap.120960_11.out:Exception in thread "pool-2-thread-8" Processed 78800 unique sequences (fwd and rev).
mhap.120960_11.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_11.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_11.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_11.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_11.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_11.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_11.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_11.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_11.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_11.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_12.out:Exception in thread "pool-2-thread-9" Exception in thread "pool-2-thread-2" Exception in thread "pool-2-thread-5" Exception in thread "pool-2-thread-10" Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-6" Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-8" Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-3" Stored 78000 sequences in the index.
mhap.120960_12.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_12.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_12.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_12.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_12.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_12.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_12.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_12.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_12.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_12.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_136.out:Exception in thread "pool-2-thread-9" Exception in thread "pool-2-thread-6" Exception in thread "pool-2-thread-10" Exception in thread "pool-2-thread-8" Exception in thread "pool-2-thread-5" Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-2" Stored 78000 sequences in the index.
mhap.120960_136.out:Exception in thread "pool-2-thread-3" Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-1" Processed 78515 unique sequences (fwd and rev).
mhap.120960_136.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_136.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_136.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_136.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_136.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_136.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_136.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_136.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_136.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_136.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_137.out:Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-6" Exception in thread "pool-2-thread-10" Exception in thread "pool-2-thread-5" Exception in thread "pool-2-thread-2" Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-8" Stored 78000 sequences in the index.
mhap.120960_137.out:Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-3" Exception in thread "pool-2-thread-9" Processed 78515 unique sequences (fwd and rev).
mhap.120960_137.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_137.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_137.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_137.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_137.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_137.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_137.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_137.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_137.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_137.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_138.out:Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-2" Exception in thread "pool-2-thread-8" Exception in thread "pool-2-thread-6" Exception in thread "pool-2-thread-5" Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-10" Stored 78000 sequences in the index.
mhap.120960_138.out:Exception in thread "pool-2-thread-3" Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-9" Processed 78515 unique sequences (fwd and rev).
mhap.120960_138.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_138.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_138.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_138.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_138.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_138.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_138.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_138.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_138.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_138.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
Binary file mhap.120960_151.out matches
mhap.120960_152.out:Exception in thread "pool-2-thread-5" Exception in thread "pool-2-thread-9" Exception in thread "pool-2-thread-8" Exception in thread "pool-2-thread-2" Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-6" Exception in thread "pool-2-thread-10" Stored 78000 sequences in the index.
mhap.120960_152.out:Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-3" Processed 78243 unique sequences (fwd and rev).
mhap.120960_152.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_152.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_152.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_152.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_152.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_152.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_152.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_152.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_152.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_152.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_153.out:Exception in thread "pool-2-thread-10" Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-6" Exception in thread "pool-2-thread-3" Exception in thread "pool-2-thread-2" Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-8" Exception in thread "pool-2-thread-9" Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-5" Stored 78000 sequences in the index.
mhap.120960_153.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_153.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_153.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_153.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_153.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_153.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_153.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_153.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_153.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_153.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_215.out:Exception in thread "pool-2-thread-5" Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-9" Exception in thread "pool-2-thread-8" Exception in thread "pool-2-thread-2" Exception in thread "pool-2-thread-6" Exception in thread "pool-2-thread-10" Exception in thread "pool-2-thread-3" Exception in thread "pool-2-thread-7" Stored 78000 sequences in the index.
mhap.120960_215.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_215.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_215.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_215.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_215.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_215.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_215.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_215.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_215.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_215.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_216.out:Exception in thread "pool-2-thread-5" Exception in thread "pool-2-thread-10" Exception in thread "pool-2-thread-8" Exception in thread "pool-2-thread-3" Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-2" Exception in thread "pool-2-thread-9" Exception in thread "pool-2-thread-6" Stored 78000 sequences in the index.
mhap.120960_216.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_216.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_216.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_216.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_216.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_216.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_216.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_216.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_216.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_216.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_229.out:Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-6" Exception in thread "pool-2-thread-10" Exception in thread "pool-2-thread-9" Exception in thread "pool-2-thread-3" Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-8" Exception in thread "pool-2-thread-2" Exception in thread "pool-2-thread-5" Stored 78000 sequences in the index.
mhap.120960_229.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_229.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_229.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_229.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_229.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_229.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_229.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_229.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_229.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_229.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_237.out:Exception in thread "pool-2-thread-4" Exception in thread "pool-2-thread-8" Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-9" Exception in thread "pool-2-thread-5" Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-2" Exception in thread "pool-2-thread-10" Stored 78000 sequences in the index.
mhap.120960_237.out:Exception in thread "pool-2-thread-3" Processed 78348 unique sequences (fwd and rev).
mhap.120960_237.out:Exception in thread "pool-2-thread-6" Time (s) to read and hash from file: 14.969531949
mhap.120960_237.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_237.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_237.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_237.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_237.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_237.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_237.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_237.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_237.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_237.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_9.out:Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-1" Exception in thread "pool-2-thread-6" Exception in thread "pool-2-thread-3" Exception in thread "pool-2-thread-8" Exception in thread "pool-2-thread-10" Exception in thread "pool-2-thread-5" Exception in thread "pool-2-thread-4" Stored 78000 sequences in the index.
mhap.120960_9.out:Exception in thread "pool-2-thread-9" Processed 78800 unique sequences (fwd and rev).
mhap.120960_9.out:Exception in thread "pool-2-thread-2" Time (s) to read and hash from file: 15.599218157000001
mhap.120960_9.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_9.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_9.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_9.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_9.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_9.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_9.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_9.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_9.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
mhap.120960_9.out:edu.umd.marbl.mhap.impl.MhapRuntimeException: Sequence ID already exists in the hash table.
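The listing above is long, and one log (mhap.120960_151.out) is reported only as a binary match. A per-file count is easier to compare against the list of failed jobs. This is a sketch, not part of Canu: `exception_counts` is a made-up name, and it relies on grep's `-a` option to treat binary-looking files as text.

```shell
# Print "file:count" for every log that contains at least one
# MhapRuntimeException line.
# -a makes grep read binary-looking logs as text instead of printing
#    "Binary file ... matches"; -c prints a count per file.
# /dev/null is appended so grep always prefixes counts with the file name,
# even when only one log is given; awk then drops the zero-count entries.
exception_counts() {
  grep -a -c 'MhapRuntimeException' "$@" /dev/null | awk -F: '$NF > 0'
}
```

Running `exception_counts mhap.*.out` in correction/1-overlapper lists only the logs that hit the exception.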

@skoren
Member

skoren commented Jul 18, 2017

Those exceptions do point to a failed dat file which is somehow not captured in the logs. Is there a common dat file shared between all the failed jobs? The log around the exception should say what file it is reading so they may all list the same one.

I see your submit command is not requesting a runtime. What is the default runtime on your system? Is it possible the jobs hit the runtime limit? You should be able to query Slurm for the job history to find out.
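Slurm's accounting tool, sacct, can report the final state of each array task. The sketch below filters its output; it assumes accounting is enabled on the cluster and that the output was produced with the exact field order given in the comment. `unclean_jobs` is a hypothetical helper name.

```shell
# Read the output of
#   sacct -j <jobid> --parsable2 --noheader --format=JobID,State,Elapsed,Timelimit
# on stdin and print the array tasks whose state is not COMPLETED.
# Job steps (IDs containing a dot, such as "<jobid>_9.batch") are skipped
# so each task is listed once.
unclean_jobs() {
  awk -F'|' '$1 !~ /\./ && $2 != "COMPLETED" { print $1, $2, $3, $4 }'
}
```

For the mhap array in this thread that would be `sacct -j 120960 --parsable2 --noheader --format=JobID,State,Elapsed,Timelimit | unclean_jobs`. A State of TIMEOUT with Elapsed at or just past Timelimit would indicate the runtime limit was hit.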

As for rerunning, I would suggest running it by hand rather than on the grid, using the mhap.sh script instead of mhap.jobSubmit.sh. For example, you can run it in an interactive session on the grid.
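A rough sketch of such a manual run follows. It rests on an assumption that should be checked against the top of mhap.sh first: that when SLURM_ARRAY_TASK_ID is not set, the script takes the job number from its first argument (in the grid submission that argument is the offset, 0). The function name and the `mhap.manual_<job>.out` log name are invented for illustration.

```shell
# Re-run one mhap job by hand and capture its output in a log file.
#   $1 = path to correction/1-overlapper, $2 = job number (e.g. 9)
# The body runs in a subshell, so the cd and unset do not leak to the caller.
rerun_mhap_job() (
  dir=$1
  job=$2
  cd "$dir" || exit 1
  # Clear partial outputs of earlier attempts, as suggested above;
  # the finished *.ovb files are left in place.
  rm -f results/*.mhap
  # ASSUMPTION: with no array task ID set, mhap.sh reads the job number
  # from its first argument.
  unset SLURM_ARRAY_TASK_ID
  sh ./mhap.sh "$job" > "mhap.manual_$job.out" 2>&1
)
```

From the assembly directory, `rerun_mhap_job correction/1-overlapper 9` would then leave the full output of job 9 in correction/1-overlapper/mhap.manual_9.out for inspection.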
