RELION-3.1 Pre-read all particles into RAM #514
Are you sure you used the same
Yes.
This is puzzling... We didn't change code related to
Yes, it happened with other datasets and on another computer as well. That computer has even more RAM (768 GB) and RELION 3.1 still attempted to fill it with the master MPI proc (each slave MPI proc uses roughly < 10 GB RAM). It uses Open MPI 2.1.1 and 2x Tesla M60, so I would say the issue is somehow independent of the systems I am using (?).
Hi. I am having a similar issue. I was monitoring the RAM usage now (RELION 3.1) and before (< 3.1 versions). In 3.1b the particles are read into RAM and then, at the first step (maximization), the RAM is suddenly filled up to its maximum and there is a long wait. In subsequent steps, after RELION prints "Maximization is done in XX seconds", the RAM is again filled up to the maximum (actually more than NCPU x 'du -hs Extract/jobXXX') and there is a 5-10 min wait before RELION moves on to the next iteration. The same dataset read from disk: no waiting. In 3.0-stable there was no issue like that.
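(As an aside, a minimal sketch of one way to watch per-rank memory while such a job runs; the grep pattern and the 5-second interval are assumptions about the setup, not something stated in this thread:)

# Show RELION processes sorted by resident memory, refreshed every 5 s
watch -n 5 'ps -eo pid,rss,etime,comm --sort=-rss | grep relion'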
What are the box size, pixel size and resolution? Can you investigate what RELION is doing during the "5-10 min waiting"? Is this from the scratch disk, or from the original location?
Can you do
as in 'gdb bt -p XX'?
BTW, I rolled back to RELION 3.0-stable on the same GPU station - it works flawlessly.
After
[Thread debugging using libthread_db enabled]
Thanks. Can you try the same with other MPI processes?
Sure.
root@jekyll:/home# for i in
For help, type "help".
Quit anyway? (y or n) y
For help, type "help".
Quit anyway? (y or n) y
For help, type "help".
Quit anyway? (y or n) y
For help, type "help".
Quit anyway? (y or n) y
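(For reference, a minimal sketch of how such a per-process backtrace loop can be written; the use of pgrep -f and gdb's batch mode are assumptions, not a reconstruction of the exact command typed above:)

# Print a backtrace of every running relion_refine process, then detach
for i in $(pgrep -f relion_refine); do
    gdb -batch -ex bt -p "$i"
done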
Thanks. This is very useful. Another question: how many particles do you have?
OK, probably I understood what is happening. How many optics groups do you have?
--random_seed 0
The problem seems to be the sorting of particles by optics groups. This sorting was not present in RELION 3.0.
I see. Thanks!
I made an improvement to the code; can you test it without setting
The latest version on the repository should fix this issue. If not, please reopen this issue.
In the latest RELION-3.1 (commit 9d7525), when reading particles into RAM, one still has to specify "--random_seed 0", otherwise it eats up all RAM and stalls. If reading from disk, the behaviour is normal.
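(For context, a minimal sketch of the kind of command line being discussed; the input/output names and classification parameters are placeholders, and only --preread_images, which corresponds to "Pre-read all particles into RAM", and the --random_seed 0 workaround are the options at issue here:)

# Hypothetical 2D classification run; everything except --preread_images and
# --random_seed 0 is an arbitrary placeholder
mpirun -n 5 relion_refine_mpi \
    --i Extract/job014/particles.star \
    --o Class2D/job015/run \
    --K 50 --iter 25 --particle_diameter 200 --ctf --gpu \
    --preread_images \
    --random_seed 0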
Although commit 76fa3d2 reduced the memory usage, RELION 3.1 still needs more space and more operations than 3.0 because of the optics groups. I don't think we can reduce this further. If you have plenty of RAM, you can make a RAM disk and use it as a scratch space.
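(A minimal sketch of that RAM-disk workaround; the mount point and the 200G size are arbitrary examples, not values from this thread:)

# Create a tmpfs-backed RAM disk and point the job's scratch directory at it
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=200G tmpfs /mnt/ramdisk
# then set "Copy particles to scratch directory" (--scratch_dir) to /mnt/ramdisk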
Note to self:
@nym2834610, @ashkumatov What is the number of particles? Which compiler did you use?
~200K particles at 1 Å/pix, box size 400. We use the bash shell.
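(As a rough sanity check on why pre-reading such a set fills RAM; this assumes 4 bytes per pixel, which is an assumption about how the images are held in memory, and if the build keeps them as 8-byte doubles the figure doubles:)

# Back-of-envelope size of 200,000 single-precision 400x400 images, in decimal GB
echo "$(( 200000 * 400 * 400 * 4 / 1000**3 )) GB"   # -> 128 GB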
I meant the compiler, not the shell.
Actually I tried with 6k particles, which is about 10 GB, running on 4 GPUs and using 5 CPUs in total. If I don't use the flag "--random_seed 0", RAM consumption goes up to 240 GB, and at the peak-consumption steps there is a really long wait. If I use the flag, the behaviour is normal. I will check the compiler version on Monday.
Is the memory consumption more or less proportional to the number of particles? How much does it use with 3K particles, for example?
It loads a proportional amount into RAM and then, at certain steps, it goes up to the maximum RAM available.
Does it use all RAM and take very long even with, say, 100 particles?
Thanks for your comment! It actually helped me find the problem: I typically compile two versions of RELION, one with CUDA 8.0 (to be able to run GCTF) and one with CUDA 9.2, which require different versions of the C compiler. Basically, I forgot to switch back to the newer C compiler when compiling with CUDA 9.2.
@ashkumatov Can you comment on which compiler works and which does not?
We use the CUDA 7.5 compiler. I'll try other compiler versions on Monday and let you know if the problem is gone without setting --random_seed to 0.
What is the version of GCC invoked by your CUDA compiler (nvcc)?
@biochem-fan actually, I did more tests and the problem is still there. I load 90 GB of particles into RAM for 2D classification and at some steps the RAM usage goes up to 180 GB, so it essentially doubles.
I think doubling is reasonable; we need space to move particles around. But earlier you said 10 GB of particles consumes "up to 240 GB", which is quite unexpected and something I cannot reproduce locally. Does the memory consumption differ between GCC versions (I don't care about CUDA versions)? Does it still happen with very, very few particles, say 100? When particles (
@ashkumatov @nym2834610 @kaoweichun
We also had large peaks of RAM usage that stalled the jobs (RELION 3.1 downloaded on March 5). The new version solved these issues. Thanks!
Hello,
I used RELION 3.1-beta (commit a6aaa5) to repeat a previously completed 3D auto-refine from RELION 3.0.7 without changing any settings. I ran it on a single machine (specs below) and I always enabled "Pre-read all particles into RAM". The typical behaviour on 3.0.7 is that the master MPI proc uses 250 GB RAM and each slave MPI proc uses some 20 GB RAM. However, in RELION 3.1, upon starting the refinement the master MPI proc used up all available RAM until mpirun crashed, before even estimating the initial noise spectra (update: each slave MPI proc used only around 4 GB RAM at this stage). Having observed this RAM usage behaviour of RELION 3.1, I simply disabled "Pre-read all particles into RAM", which avoided the problem and let the refinement proceed, but I am afraid there is an issue with memory usage.
Thanks,
WCK
Brief computer specs: 48 cores (hyperthreading enabled) / 384 GB RAM / 2 TB SSD / Open MPI 3.0.2 / 4x GTX 1080 Ti / SGE / CentOS 7.6 / CUDA 10.1