NPROC > 1 not working #194
Hi @raulleoncz, sorry you're having issues with the example problem, and thanks for providing the error messages. Starting with the second issue: I think this is coming from an update to the SPECFEM2D parameter file that has broken one of the functionalities used in the example (related: #196). I'll have to make an update to the code to fix this, sorry! Regarding your first issue, it seems like there is some trouble reading your SPECFEM model. If I am reading the error message correctly, all 20 parts of the model file may be empty? Are you able to check the outputs of meshfem/specfem to make sure they ran properly?
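For example, a quick check like the following would flag empty model binaries. This is only a sketch (not part of SeisFlows): it assumes the partitioned files follow the usual proc*_*.bin naming and sit under OUTPUT_FILES/DATABASES_MPI, so adjust the path to wherever your run actually writes them.

```python
from pathlib import Path

# Assumed location of the partitioned model binaries -- adjust to your setup
model_dir = Path("OUTPUT_FILES/DATABASES_MPI")

for fid in sorted(model_dir.glob("proc*_*.bin")):
    size = fid.stat().st_size
    status = "EMPTY" if size == 0 else "ok"
    print(f"{fid.name}: {size} bytes ({status})")
```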
Hi @raulleoncz, I think I fixed the second issue you were seeing in #197 and the subsequent devel commits (if you are using the devel branch). Can you please update and let me know if that solves that issue?
Hi @bch0w, I already ran the example using the devel branch. I didn't get the previous error but I got this: |
Hi @raulleoncz, whoops, sorry, there was a missing import statement there; I've added that in the latest commit (9c2c082). The relevant lines are seisflows/seisflows/system/workstation.py, lines 256 to 259 in 9c2c082.
That suggests that something may be going wrong with your forward simulation; either you need to increase
Hi @bch0w. On the other hand, for the first error I showed above, I have checked the .bin files when running with nproc > 1. Fortunately, SPECFEM includes a Python script to visualize the 'proc000....bin' files, and those files look correct.

--- Update ---

I have been checking the example's files, and the first thing I noticed is that xmeshfem2D was run with MPI (the first thing I was doing differently). Also, looking at mesher_log.txt, we can see that the total number of elements was divided equally, meaning that each processor has (in the example case) 400 elements. Comparing my simulation with the example, I see that this condition is not being met; for example, my simulation has 58871, 56329, 57523 and 57677 elements per processor. Is it possible that this affects the simulation? Thanks for the help.
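For reference, this is roughly how I looked at the size and contents of each partition. It is only a sketch: it assumes single-precision values written as one Fortran record per file (4-byte record markers at each end), and the directory and parameter name (vs) are just what my setup uses, so they may need changing.

```python
import numpy as np
from pathlib import Path

# Assumed location and parameter name -- adjust to your own output directory
model_dir = Path("OUTPUT_FILES/DATABASES_MPI")

for fid in sorted(model_dir.glob("proc*_vs.bin")):
    raw = np.fromfile(fid, dtype=np.float32)
    vals = raw[1:-1]  # drop the leading/trailing Fortran record markers
    print(f"{fid.name}: {vals.size} GLL values, "
          f"min={vals.min():.2f}, max={vals.max():.2f}")
```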
Based on the last idea, I ran the forward simulation using a .xyz file and looked for an equal distribution of the spectral elements, literally running xmeshfem2D again and again. After getting the same number of elements in both the init and true models, I submitted the job, and the first time I got this error:

The external numerical solver has returned a nonzero exit code (failure). exc: mpirun -n 4 bin/xspecfem2D
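For completeness, the check I used to make sure the init and true models ended up with matching partitions looked roughly like this (the directory names are just placeholders for wherever the two sets of proc*.bin files live in your working directory):

```python
from pathlib import Path

# Placeholder directories for the two partitioned models -- rename as needed
init_dir = Path("specfem2d_workdir/MODEL_INIT")
true_dir = Path("specfem2d_workdir/MODEL_TRUE")

for init_file in sorted(init_dir.glob("proc*_vs.bin")):
    true_file = true_dir / init_file.name
    match = init_file.stat().st_size == true_file.stat().st_size
    print(f"{init_file.name}: init={init_file.stat().st_size} B, "
          f"true={true_file.stat().st_size} B, match={match}")
```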
Hi @raulleoncz, sorry for the slow response here; I'm still trying to figure out the exact issue you're facing.
When you run SPECFEM with nproc > 1, it is natural for the mesh and simulation to be split over many processors, so this part seems fine and expected.
I suspect something is going wrong with meshfem or specfem. Do you mind sharing the following log files? You can probably attach them to your message directly or in a zip file; that would help diagnose the problem.
Hello @bch0w, I'm sorry for my slow response. I was trying to run the simulations again but I wasn't able to get the same mesh partition. The error that I got is the same as in the first image ("The array has an inhomogeneous shape"), and because of that I'm not able to add the log files. Just to give more information, I tried with the latest version of SPECFEM2D (devel branch, 8.1.0) and the version used in example 1. Both of them worked as they should, but when I wanted to run another SPECFEM example, let's say the "tomographic_ocean_model" example, I faced the same error. I don't know if the gcc, mpif90 and gfortran versions have something to do with it. Just in case, I'm using openmpi-gcc12 and fftw 3.3.10_0+gfortran. Are you able to run the simulations with mpirun? Maybe I'm using a wrong version or configuration.
Hi @raulleoncz, if I'm understanding correctly, this sounds more like a SPECFEM2D issue than a SeisFlows issue. Similarly, the SeisFlows examples are really only configured to run a very specific SPECFEM2D problem, so there is no guarantee that switching to a different example will work. I'd encourage you to open an issue with SPECFEM (https://github.com/SPECFEM/specfem2d/issues), and hopefully you can get some more targeted feedback.
Hello @bch0w, I am really sorry for my late response. In one of my examples, I was using 4 processors with 58871, 56329, 57523 and 57677 elements per processor, and it did not work. I already tried to run one example with 2 processors and it worked, because the element counts were the same on both processors. I will add the fwd_mesher.log and fwd_solver.log below. Also, I add pictures of the domain of each processor. I hope this information can be useful...
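As a quick sanity check on how uneven that partitioning actually is, the spread of those element counts can be computed directly (a rough back-of-the-envelope calculation only):

```python
# Element counts per processor reported by the mesher for the 4-process run
nspec = [58871, 56329, 57523, 57677]

mean = sum(nspec) / len(nspec)
spread = (max(nspec) - min(nspec)) / mean * 100
print(f"total elements: {sum(nspec)}")           # 230400
print(f"mean per process: {mean:.0f}")           # 57600
print(f"max-min spread: {spread:.1f}% of mean")  # ~4.4%
```

So the partitions differ by only a few percent, even though the counts are not exactly equal.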
Hello Mr. @bch0w,
I'm trying to use the MPI option for nproc>1. I already compiled specfem2d using FC=ifort, CC=icc and MPIFC=mpiifort but I'm getting this error:
The parameters that I'm using for the simulation are:
Can you help me to understand why I'm getting this error?
P.S. I also tried to run example 1 with nproc 4 and I got this error:
I hope you can help me.
Thanks.