Ideas for Assembling an Extremely Large Dataset #1373

howla1ke · 2024-09-13T14:56:46Z

Hello, I have NovaSeq 150 bp PE data that was sequenced on 2 separate runs to obtain the quantity of data we needed. I want to co-assemble both runs, but my dilemma is that I can only allocate 996 GB of RAM. My job was killed because it ran out of memory, and the SPAdes log noted that I need approximately 1118 GB of RAM to assemble. Would it be advisable to perform the error-correction-only step separately on each run and then co-assemble the output of both in assembler-only mode? Is that possible? Do you have any ideas beyond normalizing the data? Thank you for your time.

yqy6611 · 2024-09-24T21:04:14Z

One approach is to use longer k-mers. I found that the default k-mer set for metagenomics is not enough. Longer k-mers will reduce RAM consumption during tandem-repeat resolution. If more SSD space is available, I personally use 21,33,55,77 or 21,33,55,77,99,127.
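For anyone who wants to try the k-mer suggestion, here is a minimal sketch of how such a run could be launched. It only builds the `spades.py` command line with the standard `--meta`, `-k`, `-m`, `-t`, `-1`, `-2` and `-o` options. The helper name `spades_command`, the read file names, the output directory and the thread count are all placeholders I made up, not values from this thread. Note that `-m` sets the limit at which SPAdes stops; it does not reduce how much memory the assembly needs.

```python
import shlex


def spades_command(reads_1, reads_2, out_dir, kmers, memory_gb,
                   threads=16, meta=True):
    """Build a spades.py argument list with an explicit k-mer set.

    Hypothetical helper for illustration; it does not run SPAdes.
    """
    # SPAdes expects odd k values below 128, listed in ascending order.
    if any(k % 2 == 0 or k >= 128 for k in kmers):
        raise ValueError("k-mer sizes must be odd and less than 128")
    if list(kmers) != sorted(set(kmers)):
        raise ValueError("k-mer sizes must be unique and ascending")

    cmd = ["spades.py"]
    if meta:
        cmd.append("--meta")  # metagenomic mode (metaSPAdes)
    cmd += [
        "-1", reads_1,                          # forward reads
        "-2", reads_2,                          # reverse reads
        "-k", ",".join(str(k) for k in kmers),  # explicit k-mer list
        "-m", str(memory_gb),                   # memory limit in GB
        "-t", str(threads),                     # worker threads (assumed)
        "-o", out_dir,
    ]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; 996 is the RAM ceiling from the question.
    cmd = spades_command("reads_R1.fastq.gz", "reads_R2.fastq.gz",
                         "spades_out", [21, 33, 55, 77], memory_gb=996)
    print(shlex.join(cmd))
```

Swapping the list for `[21, 33, 55, 77, 99, 127]` gives the longer set mentioned above. Whether either set brings peak memory under the available 996 GB would have to be checked against the actual data.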
