Unitigging is a bit aggressive with its memory request, especially for HiFi data, where it ends up loading only very high-quality overlaps. As a workaround, you can add a canu.defaults file to the canu bin folder containing batMemory=50 (or whatever the memory limit is on your faster-scheduled nodes) so it will always request less memory.
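As a concrete sketch of that workaround (the directory below is a placeholder for illustration, and the 50 GB value should be adjusted to your nodes' actual limit):

```shell
# Create a canu.defaults file in the canu bin directory so every run
# caps the bat/unitigger memory request. Point CANU_BIN at the folder
# that actually contains the canu executable on your system.
CANU_BIN="./canu/bin"                 # hypothetical install location
mkdir -p "$CANU_BIN"
printf 'batMemory=50\n' > "$CANU_BIN/canu.defaults"
cat "$CANU_BIN/canu.defaults"
```

canu reads canu.defaults from its bin directory on every invocation, so this applies the cap globally without editing each command line.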
Hello,
I've been using (Hi)canu to assemble HiFi reads, with some good success. I have a question about the bat unitigging stage.
command
The overall command I'm running is
canu -p asm -d out_dir genomeSize=3g -pacbio-hifi <input>
versions
I've tested this on different datasets as well as on two versions (each the tip of the repository at the time).
issue
Everything works smoothly up until the 4-unitigger/unitigger.sh script, which requests a bit over 500 GB of memory. There are nodes on our grid with that much, so it has run correctly, but it queues for a long time. After inspecting the log files, all three assemblies used only between 5 GB and 16 GB of memory at peak during this stage.
All other stages seem to request memory and threads reasonably, so I was surprised by such a divergence here. I found the relevant line in Configure.pm, where this memory requirement is derived from the 3g genome size estimate provided.
I wasn't sure if this is an artefact of canu historically assembling shorter, noisier reads than HiFi, so that it overestimates the resources needed for this stage. I appreciate that it is easier to err on the side of caution, and presumably this can be adjusted manually with the batMemory option, but I thought it was worth sharing, as canu has otherwise been better than me at predicting resources.
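For anyone hitting the same queueing delay, the manual override via batMemory might look like this (the 64 GB value is illustrative; canu memory options take a size in gigabytes, so pick whatever fits your nodes):

```
canu -p asm -d out_dir genomeSize=3g batMemory=64 -pacbio-hifi <input>
```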
I've included the LSF grid resource summary from the unitigger job below.
CPU time : 3567.08 sec.
Max Memory : 15621 MB
Average Memory : 12243.67 MB
Total Requested Memory : 524288.00 MB
Delta Memory : 508667.00 MB
Run time : 697 sec.
Turnaround time : 11703 sec.
Thanks,
Alex