MPI crashed in Q6 (Intel) - Rackham cluster #2
Thanks for moving this to the open issue tracker.
My procedure:

```
cd Q6/src/
module purge
module load intel/17.4
module load openmpi/2.1.1
make all COMP=ifort
make mpi COMP=ifort
```

Running script (just the important part):

```
module purge
mpirun -np 4 /home/klaudia/Q/Q6/bin/Qdyn6p relax.inp > relax.log
```
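For context, on Rackham such a "running script" would be a Slurm batch script; a minimal sketch of what the full script might look like (the project allocation and time limit below are hypothetical):

```
#!/bin/bash -l
#SBATCH -A snic2017-x-xx   # hypothetical project allocation
#SBATCH -n 4               # four MPI tasks, matching -np 4
#SBATCH -t 01:00:00        # assumed time limit

module purge
mpirun -np 4 /home/klaudia/Q/Q6/bin/Qdyn6p relax.inp > relax.log
```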
Thanks, please also upload the input files needed for relax.inp.
Sorry, but I meant that you should attach an archive with all the files (input, topology, and FEP file if needed).
Perfect, thank you!
Please try to build Q6 with the modules |
Hi Klaudia and Paul, you should also comment out the -Nmpi flag in the makefile if compiling with Intel only; that option doesn't exist for Intel MPI. You can try compiling just with Intel like so:
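The commands themselves did not survive the page scrape; a plausible sketch, assuming Rackham's Intel toolchain modules (names and versions are guesses based on the intel/17.4 used earlier):

```
module purge
module load intel/17.4 intelmpi/17.4   # assumed module names/versions
cd Q6/src
make mpi COMP=ifort
```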
Using the attached makefile. Before using the makefile, make sure to do:
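The exact command is also missing from the scrape; most likely it was a clean-up of the previous build, e.g.:

```
cd Q6/src
make clean   # assumption: clear objects left over from the OpenMPI build
```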
For some reason GitHub doesn't allow uploads of extensionless files; that's why I uploaded it with the .txt extension. The compilation takes a very long time, no idea why. I will try to run your files at Rackham and see what's going on too. Cheers, M.
I ran a test already with two cores and srun and it worked fine.
If there are no more issues now, I would close this one again. Cheers, Paul
Also, Mauricio, can you make a quick pull request for the makefile (or push it yourself)?
I can send a pull request with the makefile, but first accepting your …
Oops, I don't know how I managed to close this.
Fine with me. I did not have more time to look into this, but it reliably crashed in DDT during the MPI_Init part. No idea what the heck is going on; might be an issue with the MPI set-up on Rackham.
Mauricio, did you have some more luck in testing this?
Hi,
For some reason, on the Rackham cluster they have aliased mpirun to an echo:

```
alias mpirun='echo Please use srun'
```

They say this is needed when using Intel-compiled programs, i.e., you have to use srun instead of mpirun. So, compiling with:
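The compile commands were lost from this comment; a sketch consistent with the rest of the thread (module names assumed):

```
module purge
module load intel intelmpi   # assumed modules for an Intel-only build
make mpi COMP=ifort
```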
Produces a binary which works when invoked with:
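The run line is missing too; presumably the srun equivalent of the earlier mpirun call, something like:

```
srun -n 4 /home/klaudia/Q/Q6/bin/Qdyn6p relax.inp > relax.log
```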
Paul, I guess you can close this if @klaudia-dais also sees her jobs running when Q is compiled and run in the suggested way.
I haven't heard anything back yet, but I think we may want to keep this open until we add something about it to the README?
Hi,
I compiled Q on Rackham (Intel), and with MPI it shows the error "unknown option -Nmpi". Even with that it finishes. When I submit a job, it finishes immediately with this kind of error:

```
[r101:2335] *** An error occurred in MPI_Allreduce
[r101:2335] *** reported by process [1808072705,0]
[r101:2335] *** on communicator MPI_COMM_WORLD
[r101:2335] *** MPI_ERR_OP: invalid reduce operation
[r101:2335] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[r101:2335] *** and potentially your MPI job)
```
I tried different versions of Intel, with both Intel MPI and OpenMPI. Every time it crashes with a similar error. When I run the same job on a different cluster, or locally with the serial Qdyn6, it works without problems.
Any idea how to solve it?
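A likely explanation, consistent with the fix suggested above: MPI handles such as the predefined reduce operations (MPI_SUM and friends) are implementation-specific, so a binary compiled against one MPI library but launched under another's runtime (e.g., built against OpenMPI but started with an Intel MPI runtime, or vice versa) can fail exactly like this, with MPI_ERR_OP in MPI_Allreduce. One quick way to check which MPI the binary is actually linked against:

```
ldd /home/klaudia/Q/Q6/bin/Qdyn6p | grep -i mpi   # lists the MPI shared libraries the binary uses
```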