🐛 [BUG] - <title> Inconsistent gradients computed using CPU and GPU #1755
Comments
interesting - is this an issue with SPECFEM or seisflows? maybe you could provide a small SPECFEM example setup where you see different kernel values between CPU and GPU simulations. this would help to reproduce your issue.
Hi Daniel, I hope this message finds you well. I'm glad to hear from you and apologize for my delayed response. I have uploaded the specfem3D package I've been using to GitHub: https://github.com/jlulh/Specfem3d_test/. This version is based on the devel branch (a5bb135), and I made a few minor modifications to the following files: compute_arrays_source.f90, write_output_SU.f90, compute_kernels.f90, and compute_kernels_hess_el_cudakernel.cu. Additionally, I have included an example (model0050_test) that I used for testing. The MESH, as well as the true and initial model files, were all generated using xmeshfem3D. I tested the kernel of a shot dataset located in the model0050_test/scratch/solver/000000/ folder. You can modify the model0050_test/scratch/solver/000000/DATA/Par_file to set GPU_MODE=true or false, and then run the simulation with the command mpirun -np 1 ./bin/xspecfem3D. You will notice that the output files in OUTPUT_FILES/DATABASES_MPI have inconsistent *_kernel.bin results. Please let me know if you have any questions or need further clarification. Best regards
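To put a number on "inconsistent", the *_kernel.bin files from the CPU run and the GPU run can be compared directly. The following is only a minimal sketch, not part of SPECFEM3D or seisflows. It assumes each kernel file holds a single Fortran unformatted record (4-byte length marker, single-precision values, 4-byte length marker) in native byte order; that assumption should be checked against the build settings. The function names and the paths in the usage note are hypothetical.

```python
import math
import struct
from array import array


def read_fortran_record(path):
    """Read the first record of a Fortran unformatted sequential file.

    Assumption: 4-byte record markers and float32 values in native byte order.
    """
    with open(path, "rb") as f:
        header = f.read(4)
        if len(header) != 4:
            raise ValueError(f"{path}: too short to contain a record marker")
        (nbytes,) = struct.unpack("=i", header)
        payload = f.read(nbytes)
        trailer = f.read(4)
    # The leading and trailing markers must both equal the payload size.
    if len(payload) != nbytes or len(trailer) != 4:
        raise ValueError(f"{path}: truncated record")
    if struct.unpack("=i", trailer)[0] != nbytes:
        raise ValueError(f"{path}: record markers do not match")
    values = array("f")
    values.frombytes(payload)
    return values


def relative_l2_difference(reference, other):
    """Return ||reference - other||_2 / ||reference||_2.

    Falls back to the absolute norm of the difference if the reference is all zeros.
    """
    if len(reference) != len(other):
        raise ValueError("kernels have different lengths")
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(reference, other)))
    ref = math.sqrt(sum(x * x for x in reference))
    return diff / ref if ref > 0.0 else diff
```

For example, after copying OUTPUT_FILES/DATABASES_MPI from each run into two hypothetical folders cpu_run/ and gpu_run/, one could call relative_l2_difference(read_fortran_record("cpu_run/<name>_kernel.bin"), read_fortran_record("gpu_run/<name>_kernel.bin")) for each kernel file. A value near single-precision round-off would suggest the two runs agree; a value of order one would point to a real discrepancy.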
thanks for pointing out this inconsistency! there were indeed some differences between the CPU and GPU versions in how the sources were applied in your coupled-domain setup. PR #1759 should address and fix these. I noted that you modified the SU adjoint source reading. in the PR, I incorporated a similar fix to be able to run the kernels with only the elastic adjoint source files (0_dx_SU.adj, ..) for this coupled acoustic/elastic domain setup. also, you seem to have modified the Hessian kernel in file
Thank you for the fix! I have tested the updated version, and the issue is resolved. I appreciate your help and efforts.
Description
I built the same model and ran the inversion on CPU and on GPU, and the computed kernels are very different. I tested the latest specfem3d version 4.1.1 and the older versions 4.1.0 and 4.0.0, and all of them show the problem.
Affected SPECFEM3D version
4.1.1 (a5bb135), 4.1.0 (89d1601) and 4.0.0 (c97d521)
Your software and hardware environment
Ubuntu 22.04.4 LTS; gcc version 11.4.0; MPICH version 4.0; CPU: AMD EPYC 9684X; GPU: RTX 4090
Reproduction steps
I used seisflows and specfem3d for the inversion test. The adjoint sources computed on the CPU and on the GPU are the same, but the kernels output by xspecfem3D are very different.
Screenshots
No response
Logs
No response
OS
No response