Program killed during early extension; RAM issue unlikely #266
Comments
Thank you for your detailed help. I have now consulted the cluster's admin, and we figured out that it was an out-of-memory error. The issue was caused by requesting multiple nodes on the HPC: in that case, PBS ignores the requested memory and allocates whatever memory is left on the nodes. As a result, the processes ran out of memory and were aborted. Thanks also for the additional assistance with the databases. I have now left the multi-organelle mode behind. All the assemblies I performed worked fine, and I am impressed by how well the animal_mt assemblies worked out with minor adjustments.
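For readers who hit the same bare "Killed" message, here is a minimal sketch of how to check whether a job was terminated by a signal and how much memory it peaked at. It is not part of GetOrganelle. `run_and_report` is a hypothetical helper name, and the sketch assumes a Linux node, where `ru_maxrss` is reported in kilobytes. Note that the value is the peak of the single largest child process that was waited for, not the sum over all processes, so it is only a lower bound on what the whole job needs.

```python
import resource
import signal
import subprocess


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (return_code, signal_name, peak_rss_kb).

    Hypothetical helper for illustration only.
    """
    proc = subprocess.run(cmd)
    # On POSIX, a negative return code -N means the child was terminated by signal N.
    # The kernel's out-of-memory killer uses SIGKILL, which the shell prints as "Killed".
    sig_name = signal.Signals(-proc.returncode).name if proc.returncode < 0 else None
    # Peak resident set size of the largest waited-for child (kilobytes on Linux).
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, sig_name, peak_kb
```

Calling `run_and_report` with the real command line as a list inside the job script, and comparing the reported peak with the memory the scheduler actually granted, can help separate a scheduler-side allocation problem from a genuine memory shortage. A `SIGKILL` result alone does not prove an out-of-memory kill; the node's kernel log or the scheduler's job record is still needed to confirm it.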
First check
Describe the bug
GetOrganelle aborts early during extension ("Killed") without any further information. RAM is unlikely to be the limiting factor, as --out-per-round is specified and 128 GB of RAM is provided.
Command executed:
Additional context
Compared to earlier runs (#264), I decided to increase the word limit, as it became apparent from the log files that the word limit was reached after only ~7 rounds of extension (requested: 40). Processing is performed on 40 cores with 128 GB of RAM. Interestingly, early termination of extension is most often observed when the combined database "embplant_pt,embplant_mt" is used.
The following warning message is interesting, as you advised against using the separate embplant_pt and embplant_mt databases (#263):
Can the problem in multi-organelle mode be addressed by disabling the auto-estimation of word sizes and setting the word size to an arbitrarily low value?
Thank you for your help.
UPDATE: Assemblies using only the animal_mt database now also fail early during extension (rounds 2-3). This is much earlier than with the old parameters (#264). Could the current parameter selection be suboptimal, and if so, why?
Species1_log.txt
Species2_log.txt
Using the old parameters, assembly was successful for this species, but now aborts early:
Species3_log.txt
UPDATE2: The crashes of GetOrganelle produced core dump files. I inspected some of them using the file command. Here is the output:
UPDATE3: To rule out a RAM issue, I have now performed runs on the cluster's SMP queue. The command executed was:
Each job was provided with 600 GB of RAM. They were still aborted.
smp_run.log.txt
I am at my wit's end...