Problem with parallelization when running CP2K on quantum-mobile:20.11.2a
#180
Comments
I can reproduce this on a Mac host (running in VirtualBox), with QM 20.11.2a. Do I understand correctly that the ssmp version (I guess downloaded from the GitHub releases page, looking at the ansible role) is supposed to be used with OpenMP/multithreading, and not with MPI? In this case, what is the suggested/simplest way to get a compiled binary of CP2K working with MPI on Quantum Mobile (Ubuntu)? |
Yes, |
You should also make sure that |
Yeah, basically this binary download has never worked on any Quantum Mobile, which is a little annoying to find out now (I had no part in writing it). The other route is to use https://github.com/conda-forge/cp2k-feedstock, which we eventually want to look into using for all simulation codes. But I think this may be too difficult to implement at this time (and v8.1.0 is not yet released there, as there are still some outstanding issues for it). |
I am not 100% sure that when running with SLURM you have to run with [...]. I will try to look around a bit in the SLURM documentation to see if I can find anything. |
@sphuber not really, sorry. My guess is that you have to configure slurm and that |
The installation of CP2K is indeed a major pain point. We now have 30+ dependencies and keep adding more. The binary we provide with the releases is hand-rolled, statically linked, and stripped-down, e.g. without MPI. While CP2K is included in Debian and Fedora, those distributions have long release cycles. Hence, I believe the way to go is indeed package managers like Conda or Spack. Unfortunately, maintaining those packages is a lot of work. |
Probably stating the obvious, but the quickfix for now would be to limit the number of ranks to 1. |
That is something that we are considering adding in the input generators of the common workflow project, for which this problem is most critical now. But we cannot enforce this on the plugin level and so this means that CP2K is broken on QM for any other calculation where the user selects more than 1 rank. So we will have to find a solution at some point if we want CP2K to run reliably on QM |
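For illustration, the single-rank quick fix discussed above could look roughly like the sketch below. It is not taken from the plugin or the common workflow code. The helper name `single_rank_options` is hypothetical, and the option keys (`resources`, `num_machines`, `num_mpiprocs_per_machine`, `num_cores_per_mpiproc`, `environment_variables`) are assumed to follow AiiDA's usual `CalcJob` metadata options. The thread count of 12 is just an example value for a 12-core machine.

```python
def single_rank_options(num_threads: int) -> dict:
    """Build scheduler options for one MPI rank with `num_threads` OpenMP threads.

    Hypothetical helper: the keys mirror AiiDA-style metadata options,
    which is an assumption and not confirmed by this thread.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return {
        "resources": {
            "num_machines": 1,
            # A single rank, so the OpenMP-only (ssmp) binary is started once.
            "num_mpiprocs_per_machine": 1,
            # Reserve one core per OpenMP thread for that single rank.
            "num_cores_per_mpiproc": num_threads,
        },
        # Parallelism then comes from threads instead of MPI ranks.
        "environment_variables": {"OMP_NUM_THREADS": str(num_threads)},
    }


options = single_rank_options(12)
print(options)
```

In an AiiDA workflow, a dictionary like this would presumably be assigned to the calculation's `metadata.options`. As noted above, nothing at the plugin level stops a user from selecting more than one rank, so this only helps where the inputs are generated for them.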
Just mentioning for anyone needing to run cp2k on the quantum mobile: The following for me uses 1 process and 12 threads on the quantum mobile 21.05.1 docker container under ubuntu:
The calculation seems to be running fine (except for being rather slow of course ;-) ). |
By the way, what is the status of this? I believe the issue could be solved by installing CP2K from conda-forge. |
Taken from the discussion thread of PR #160 👍
When running a CP2K relax workflow on the quantum-mobile:20.11.2a docker container on an Ubuntu host OS, there seems to be a problem with the parallelization. Many more processes are launched than intended, and multiple processes start to write independently to the output file. @yakutovicha, who ran on a MacOS host, could not reproduce this.
Here is a screenshot from htop once I launch a single CP2K relax workchain with the fast protocol for silicon:
It spawns 24 processes for my 12-core CPU and uses them to the max. Could there be a problem with the parallelization that causes it to run double? In the output file, I see the following:
It does seem to double each step, or is that normal? Maybe this is all due to the submission script: