How to reduce memory consumption during population calling? #448
Comments
A new release is coming very soon (days away) that reduces memory usage and allows splitting. @hermannromanek is on it :)
Has the feature to split up SNF files by chromosome already been released? If so, where can we find the new binaries?
Hi, sorry for the delay. We encountered some issues that had to be fixed first and are in the process of re-testing. I just pushed the current release candidate, so feel free to give it a try. Bear in mind that it is not yet fully tested: there is one open bug we know of that causes sniffles to report the same SVs twice. Please share any other issues you encounter with us. To enable the improved population calling, please also make sure the library psutil is installed. Thanks,
I noticed that there is a new release: https://github.com/fritzsedlazeck/Sniffles/releases/tag/v2.3.2. Does this happen to solve the issue of large RAM usage for many samples? (We estimated that Sniffles v2.2 will use ~500-600 GB of RAM to do multisample calling on 5000 human ONT samples, with no way to parallelize the effort across multiple machines to reduce the RAM consumption.) If so, how does Sniffles v2.3+ handle many samples? Does it automatically throttle memory usage when it detects that usage is becoming too high? We can't seem to find a way to tell Sniffles v2.3+ to process the SNF files by chromosome (thereby increasing parallelism and reducing RAM usage on a single machine).
Hey @tnguyengel |
Hi @tnguyengel, yes, sniffles 2.3 should not use as much memory for merging as 2.2 did. It achieves this by monitoring RAM usage and freeing up memory once the memory footprint exceeds 2 GB per thread/worker process (a limit that is reached quite soon when processing 5000 samples). Also, while in 2.2 each thread worked on its own chromosome, in 2.3 the threads work on the same chromosome in parallel, so you get better parallelization when processing only one chromosome. To process a single chromosome, you can use the new parameter --contig CONTIG (or -c CONTIG), with CONTIG being the name of the contig you want to process. What's the command you've been trying to run sniffles with? Thanks for your feedback,
For both Sniffles v2.3.2 and Sniffles v2.2, we were running
Facepalm! I missed that. My apologies. We'll try scaling tests again with the --contig option.
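For anyone else scripting this, a per-contig merge along the lines discussed above could be sketched roughly as follows. This is only an illustration, not an official workflow: `--contig` is the option described in this thread, `--input`, `--vcf` and `--threads` are the usual Sniffles2 options, and the helper name `build_contig_commands`, the file names and the output naming scheme are made up for the example. The script only builds the command lines; running them (e.g. one per machine or job) is left to the scheduler of your choice.

```python
import shlex
from typing import List


def build_contig_commands(
    snf_inputs: List[str],
    contigs: List[str],
    out_prefix: str,
    threads: int = 4,
) -> List[List[str]]:
    """Build one sniffles merge command per contig (hypothetical helper).

    Each command restricts the population call to a single contig via
    --contig, so the jobs can be distributed across machines.
    """
    commands = []
    for contig in contigs:
        commands.append(
            [
                "sniffles",
                "--input", *snf_inputs,                   # per-sample .snf files
                "--vcf", f"{out_prefix}.{contig}.vcf",    # assumed naming scheme
                "--threads", str(threads),
                "--contig", contig,                       # option described in this thread
            ]
        )
    return commands


if __name__ == "__main__":
    # Placeholder sample and contig names, for illustration only.
    for cmd in build_contig_commands(
        ["sample1.snf", "sample2.snf"], ["chr1", "chr2"], "cohort"
    ):
        print(shlex.join(cmd))
```

Each job writes its own per-contig VCF, so the results would still need to be combined afterwards with a tool of your choice.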
Dear tnguyengel, did you manage to run the 5000 samples?
We don't have the full 5000 samples to run yet, but that will be the final set that we eventually run with. We will rerun scaling tests with v2.3.3 and report the results here.
Cool. We keep testing and optimizing. Keep us posted and we will push forward.
FYI, initial scaling tests with up to 35 samples indicate that v2.3.3 would theoretically use ~100 GB of RAM to aggregate a contig across a 5000-sample cohort. That is much more reasonable in terms of resource usage. I'll report more detailed results as we go along.
While there are more improvements to come, v2.5 should significantly improve multisample calling on larger data sets. Merging 35 samples should stay well below 10 GB of RAM.
Hey guys, the new version just went live, and it is much better in terms of memory consumption. Please test it out.
We would like to reduce memory consumption during population calling. Is it possible to split SNF files by chromosome or genomic region?
Alternatively, should we supply smaller BAMs to Sniffles2 by splitting the BAMs so that each one only contains the reads that align to a given chromosome/genomic region?
Related to #282.