Parallelization issue during iMTD-GC in crest 3.0 #284
Moreover, I also ran into a seg fault during optimization.
Hey,
Two comments from my side:
I don't think it will slow down, and I haven't really seen evidence for this so far. Both parallelization strategies still work via the OpenMP library; the threads setting in the .toml adjusts the very same thing as the OMP_NUM_THREADS variable. In CREST 2.12, threads are evenly distributed to the MDs/MTDs only when the number of threads exceeds the number of jobs by at least a factor of two. In contrast, CREST 3.0 has all the parallelization done internally by the OpenMP library. Since tblite, gfn0 and gfnff all use OpenMP themselves, my understanding is that the "outer" OMP loop running the jobs uses only the required number of threads, while all threads exceeding this count go into the parallelization of the potential. These potential calls are quite short compared to the outer OMP loop, so they should not produce much traceable thread load. If anything, this is the cleaner way of doing it compared to 2.12. Now, this doesn't explain the error from the second comment. It seems to stem from the …
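As a side note on the two knobs mentioned here, the equivalence looks roughly like this (a hedged sketch: the threads key name is taken from this thread, and the surrounding file layout is an assumption, not checked against the CREST documentation):

```toml
# CREST 2.12 and earlier: thread count taken from the environment,
#   export OMP_NUM_THREADS=52

# CREST 3.0: the same OpenMP thread pool sized from the toml input
threads = 52
```

Either way, the same single OpenMP runtime setting is being adjusted; only where it is specified differs.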
I just repeated one of the meta-dynamics simulations performed during the iMTD-GC run independently, using the following setup (system size is 101 atoms). Moreover, using the --legacy option each simulation in the first round of the iMTD-GC workflow completed in around 16 minutes, so there appears to be some benefit to the old parallelization scheme. While doing that I also noted an inconsistent Vbias prefactor during the first iteration between the CREST 3.0 iMTD-GC run and the one with the --legacy option for the same system. Visualization of the trajectories confirms the different behavior, as the MD with --legacy appears to move much more than the one without the flag. Old version: Many thanks in advance! I will try to recompile CREST on my machine in the meantime to see if that resolves the optimization issue.
Fair enough, it seems OpenMP doesn't handle the potential parallelization as automatically as I thought. I'll have to look into how, and if, the parallelization can be coaxed into behavior similar to before. The kbias should not be different, but the printout declaration in CREST <3.0 and with the …
I looked into the parallelization issue. Unfortunately without much success, but with some insight. What I was originally hoping for was automatic nested parallelism in the OpenMP regions. Since this was obviously not the case, I tried defining it explicitly, and in the process revised some of the automatic OpenMP handling of the code in general (#288). The problem with nested OpenMP parallel regions is that they can create severe overhead for managing the threads, which might slow down the overall process. I think we are seeing this here as well. Unfortunately, this is nowhere near the speedup a single standalone MD would receive from 3 threads (that would result in ~30 sec. wall time), but it is still a little better than the old code version and uses the available threads more effectively. In summary, this is unfortunately the best nested OpenMP can achieve for the MDs. I think that to actually restore behavior similar to the xtb subprocess calls, we'd need to resort to a combination of OpenMP and MPI code. At some point I might want to do this, but not right now.
Adding some timing data for the updated parallelization with the same system: So the speedup appears to be more meaningful for larger systems.
Not bad, that's a relief.
Let me see, I haven't been able to reproduce this, but I have a hunch where it might come from (which is maybe not tblite after all).
I had to finish some other things and might be able to get to it tomorrow. The double dictionary is in a serial part of the code; there is no OMP parallelization there. There is now also a dependency on toml-f.
@pitsteinbach I think it's not related to … @kevinkong98 Could you please try a run with the changes committed in #289? This should circumvent the implicit allocation which threw the error.
@pprcht It worked! The entire conformational search finished without any errors!
great, thanks for testing!
During the iMTD-GC workflow, if the number of threads specified in CREST 3.0 (using the pre-compiled version) is bigger than the number of MTDs running concurrently (for example, I have 52 threads on my workstation), many threads are left unused (the thread count is specified in the .toml input file) and each MTD simulation uses only one thread. In contrast, CREST 2.12 uses all threads specified by the OMP_NUM_THREADS environment variable. This probably causes a significant slowdown when using CREST with large systems. Is there any way to recover the old parallelization behavior?
Many thanks!