Canu release v1.5 failed with: canu iteration count too high, stopping pipeline #554
Comments
This usually implies a failure in the previous step or some kind of file corruption, though Canu should be detecting this and stopping. However, there are other recent issues with similar problems (#543, #542), so it is possible a bug was recently introduced. Does the previous step (the mhap precompute) look like it completed cleanly, i.e. are the blocks/*.dat files non-empty and do the precompute.*.out logs end normally?
Thanks Sergey for your guidance. All of the blocks/*.dat files have non-zero sizes. The precompute.files file lists 97 *.dat files, which equals the number of precompute.*.out files. All of the precompute.*.out files end with “Total time (s): ”, except one, precompute.120861_27.out, which looks mangled at the end.
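For reference, a quick way to run these checks from the overlap directory (the directory name is an assumption, e.g. correction/1-overlapper; the file names match those mentioned above):

```bash
# Run from the mhap overlap directory (location is an assumption, e.g. correction/1-overlapper).

# Any zero-size partition files?
find blocks -name '*.dat' -size 0

# Partition count listed in precompute.files vs. number of precompute logs.
wc -l < precompute.files
ls precompute.*.out | wc -l

# Precompute logs that did not finish with the usual "Total time" line.
for f in precompute.*.out; do
    tail -n 5 "$f" | grep -q "Total time" || echo "suspect: $f"
done
```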
Perhaps that one is the source of the issue. For the failed mhap jobs (like 9, 10, ... 237), check the corresponding query folder (for job 9, a directory like queries/000009) to see which blocks/*.dat files each one reads — is block 27 common to all of them?
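A sketch of that check, assuming the per-job query directories are named by zero-padded job id and contain links to the blocks/*.dat files each job reads (the job list and block numbering are placeholders):

```bash
# Hypothetical list of failed mhap job ids; replace with the real ones.
failed="9 10 237"

# Show which block files each failed job is set up to read.
for j in $failed; do
    d=$(printf "queries/%06d" "$j")
    echo "== job $j =="
    ls "$d"
done

# Count how many of the failed jobs reference block 27
# (assumes the block file is named 000027.dat; adjust to your layout).
for j in $failed; do
    d=$(printf "queries/%06d" "$j")
    ls "$d" | grep -q "000027" && echo "job $j uses block 27"
done
```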
Of the 14 failed mhap jobs, only one used block 27, and of the 26 mhap jobs that used block 27, only one failed. According to canu-scripts/canu.03.out, "All 97 mhap precompute jobs finished successfully."
In that case it is unlikely that the dat file is the issue. Still, try removing the outputs of the failed mhap jobs so they get regenerated, and re-run.
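A minimal sketch of that clean-up, assuming the per-job mhap outputs live under results/ and are named by zero-padded job id, and that the logs follow the naming seen above (precompute.120861_27.out suggests a name.slurmjob_task.out pattern). The exact layout may differ between Canu versions, so verify before deleting anything:

```bash
# Hypothetical list of failed mhap job ids; replace with the real ones.
failed="9 10 237"

for j in $failed; do
    id=$(printf "%06d" "$j")
    rm -f results/${id}.*      # assumption: per-job outputs are results/<jobid>.*
    rm -f mhap.*_"$j".out      # assumption: per-job logs are mhap.<slurmjob>_<task>.out
done

# Then restart Canu with the original command; it should resubmit only the missing jobs.
```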
Okay, I'm new to Slurm, as well as to canu, mhap, and PacBio... Let's say I want to re-run mhap for a failed job, say 9. There is already a submit script (mhap.jobSubmit.sh) available in the overlap directory.
Would I edit that script to target only that job? Here is something else: looking at the mhap.*.out logs in the overlap directory, the failed jobs end with exceptions.
Those exceptions do point to a failed dat file which is somehow not captured in the logs. Is there a common dat file shared between all the failed jobs? The log around each exception should say which file it was reading, so they may all list the same one. I also see your submit command is not requesting a runtime; what's the default runtime limit on your system? Is it possible the jobs hit that limit? You should be able to query Slurm for the job history to find out. As for rerunning, I would suggest running the job by hand, not on the grid, using the mhap.sh script rather than mhap.jobSubmit.sh. You could run it in an interactive session on the grid, for example.
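Putting those three suggestions together, a rough sketch — the job ids, paths, grep pattern, and resource requests are placeholders, and sacct/srun options vary by site:

```bash
# 1) Which .dat file does each failed job's log mention around the exception?
#    (Pattern is an assumption; adjust to whatever the log actually prints.)
grep -H "\.dat" mhap.*.out | sort | uniq -c

# 2) Did the failed jobs hit the runtime limit? Check Slurm's accounting.
#    Replace 120862 with the real array job id.
sacct -j 120862 --format=JobID,State,Elapsed,Timelimit,ExitCode

# 3) Re-run one failed job by hand in an interactive session,
#    instead of resubmitting through mhap.jobSubmit.sh.
srun --pty --time=24:00:00 --mem=16g bash       # resource requests are illustrative
cd /path/to/assembly/correction/1-overlapper    # hypothetical path to the overlap directory
./mhap.sh 9                                     # 9 = the failed job id
```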
Hi,
Can you please help me with setting parameters to avoid the current problem? Thanks.
Here is the command, running on a CentOS cluster with a Slurm scheduler:
Here is the output of canu.out:
Simply restarting doesn't do the trick, as it encounters the same problem.
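If the jobs are indeed hitting a Slurm time limit (as asked above), one way to request more runtime is Canu's gridOptions parameter. The invocation below is illustrative only — the paths, genome size, partition, and limits are made up, not the actual command used here:

```bash
# Illustrative only: a Canu 1.5 run on Slurm with an explicit time limit
# passed to every submitted job via gridOptions.
canu -p asm -d asm-run \
    genomeSize=500m \
    gridOptions="--time=48:00:00 --partition=normal" \
    -pacbio-raw reads/*.fastq.gz
```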
Here is an example mhap output file:
Thanks for your help.
Josh