
Unitigging memory requirements potentially overestimated for hifi reads #1788

Closed
ASLeonard opened this issue Sep 4, 2020 · 1 comment

@ASLeonard

Hello,
I've been using (Hi)canu to assemble HiFi reads with good success. I have a question about the bat unitigging stage.

command

The overall command I'm running is canu -p asm -d out_dir genomeSize=3g -pacbio-hifi <input>

versions

I've tested this on different datasets and with two versions (each the tip at the time):

canu snapshot v2.0-development +612 changes (r10105 90065fd)
canu snapshot v2.2-development +15 changes (r10124 2a31172)

issue

Everything works smoothly up until the 4-unitigger/unitigger.sh script, which requests a bit over 500GB of memory. Our grid has nodes with that much memory, so the job does run correctly, but it waits in the queue for a long time. According to the log files, all three assemblies peaked at only between 5GB and 16GB of memory during this stage.

All other stages request reasonable amounts of memory and threads, so I was surprised by such a large divergence here. I found the relevant line in Configure.pm, which derives this memory requirement from the 3g genome size estimate I provided.

I wasn't sure whether this is an artefact of canu's defaults targeting shorter/noisier reads than HiFi, leading it to overestimate the resources needed for this stage. I appreciate that it is easier to err on the side of caution, and presumably this can be adjusted manually with the batMemory option, but I thought it was worth sharing, as canu has otherwise been better than me at predicting resources.

Below is the LSF grid resource summary from the unitigger job.


    CPU time :                                   3567.08 sec.
    Max Memory :                                 15621 MB
    Average Memory :                             12243.67 MB
    Total Requested Memory :                     524288.00 MB
    Delta Memory :                               508667.00 MB
    Run time :                                   697 sec.
    Turnaround time :                            11703 sec.

Thanks,
Alex

@skoren
Member

skoren commented Sep 4, 2020

Unitigging is a bit aggressive with its memory request, especially for HiFi data, where it ends up loading only very high-quality overlaps. As a workaround, you can add a canu.defaults file to the canu bin folder containing batMemory=50 (or whatever the memory limit of your faster-scheduled nodes is), so that it always requests less memory.
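A minimal sketch of that workaround, assuming canu is on your PATH and that the folder containing the canu executable is the bin folder meant above. If canu is installed through a symlink or wrapper, set CANU_BIN to the real bin folder yourself. The script appends the batMemory=50 line from the comment and skips the write if that exact line is already present. The names BAT_MEMORY_GB and CANU_BIN are hypothetical, introduced only for this sketch. If canu.defaults already sets batMemory to a different value, edit that line by hand instead of adding a second one.

```shell
#!/bin/sh
# Memory cap (GB) for the unitigging stage; 50 is the value suggested above.
# Replace it with the memory limit of your faster-scheduled nodes.
BAT_MEMORY_GB=50

# Hypothetical CANU_BIN: the folder holding the canu executable.
# Assumption: canu is on PATH unless CANU_BIN is exported beforehand.
if [ -z "${CANU_BIN:-}" ]; then
    if canu_path=$(command -v canu); then
        CANU_BIN=$(dirname "$canu_path")
    else
        # Fallback so the sketch still runs: write to the current directory
        # and leave it to you to move the file next to the canu executable.
        echo "canu not found on PATH; writing ./canu.defaults instead" >&2
        CANU_BIN=.
    fi
fi
defaults_file="${CANU_BIN}/canu.defaults"

# Append the override only if the identical line is not already present,
# so re-running the script does not duplicate it.
if ! grep -qx "batMemory=${BAT_MEMORY_GB}" "$defaults_file" 2>/dev/null; then
    echo "batMemory=${BAT_MEMORY_GB}" >> "$defaults_file"
fi
```

For a single run, the same option can instead be given on the command line alongside the others, e.g. canu -p asm -d out_dir genomeSize=3g batMemory=50 -pacbio-hifi <input>. The canu.defaults route suits the case described here, where every assembly should request less memory.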
