🐛 [BUG] - <title> Inconsistent gradients computed using CPU and GPU #1755

Closed
jlulh opened this issue Oct 28, 2024 · 4 comments

jlulh commented Oct 28, 2024

Description

I built the same model and ran the inversion on CPU and on GPU; the computed kernels differ substantially. I tested the latest SPECFEM3D version 4.1.1 as well as the older versions 4.1.0 and 4.0.0, and all three show the problem.

Affected SPECFEM3D version

4.1.1(a5bb135), 4.1.0(89d1601) and 4.0.0(c97d521)

Your software and hardware environment

Ubuntu 22.04.4 LTS; gcc version 11.4.0; MPICH version 4.0; CPU: AMD EPYC 9684X; GPU: RTX 4090

Reproduction steps

I used SeisFlows with SPECFEM3D for the inversion test. The adjoint sources computed on CPU and GPU are the same, but the kernels output by xspecfem3D differ substantially.

Screenshots

No response

Logs

No response

OS

No response

@jlulh jlulh added the bug label Oct 28, 2024
@danielpeter
Contributor

interesting - is this an issue with SPECFEM or seisflows?

maybe you could provide a small SPECFEM example setup where you see different kernel values between CPU and GPU simulations. this would help to reproduce your issue.

@jlulh
Author

jlulh commented Oct 31, 2024

Hi Daniel,

I hope this message finds you well. I’m glad to hear from you and apologize for my delayed response.

I have uploaded the SPECFEM3D package I’ve been using to GitHub: https://github.com/jlulh/Specfem3d_test/. This version is based on the devel branch (a5bb135), with a few minor modifications to the following files: compute_arrays_source.f90, write_output_SU.f90, compute_kernels.f90, and compute_kernels_hess_el_cudakernel.cu.

Additionally, I have included an example (model0050_test) that I used for testing. The mesh and the true and initial model files were all generated with xmeshfem3D. I tested the kernel for a single shot, located in the model0050_test/scratch/solver/000000/ folder. To reproduce, set GPU_MODE to .true. or .false. in model0050_test/scratch/solver/000000/DATA/Par_file and run the simulation with the command mpirun -np 1 ./bin/xspecfem3D. The *_kernel.bin files written to OUTPUT_FILES/DATABASES_MPI differ between the two runs.
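One way to quantify the mismatch between the two runs is to read the same kernel file from each and compute a relative difference. The following is only a minimal sketch under stated assumptions, not a tool from SPECFEM3D or SeisFlows: it assumes each *_kernel.bin file holds a single Fortran unformatted sequential record of little-endian single-precision values, framed by 4-byte length markers. The function names are hypothetical.

```python
import math
import struct


def read_fortran_record(path):
    """Return the first record of a Fortran unformatted file as a tuple of floats.

    Assumes (not confirmed by the thread) 4-byte little-endian record markers
    and single-precision (4-byte) real values.
    """
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) < 8:
        raise ValueError(f"{path}: too short to contain a record")
    (nbytes,) = struct.unpack("<i", raw[:4])
    if nbytes < 0 or nbytes % 4 != 0 or len(raw) < nbytes + 8:
        raise ValueError(f"{path}: unexpected record length {nbytes}")
    (trailer,) = struct.unpack("<i", raw[4 + nbytes:8 + nbytes])
    if trailer != nbytes:
        raise ValueError(f"{path}: leading and trailing markers disagree")
    return struct.unpack(f"<{nbytes // 4}f", raw[4:4 + nbytes])


def relative_l2_difference(reference, other):
    """Return ||other - reference||_2 / ||reference||_2.

    Falls back to the absolute norm of the difference if the reference is zero.
    """
    if len(reference) != len(other):
        raise ValueError("kernel arrays differ in length")
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(reference, other)))
    ref = math.sqrt(sum(x * x for x in reference))
    return diff / ref if ref > 0.0 else diff
```

To use it, copy the kernel files from the CPU run and the GPU run into separate directories, read the matching file from each with read_fortran_record, and pass the CPU values as the reference. Differences near single-precision round-off would be expected from a different order of floating-point operations alone; much larger values point to a real discrepancy like the one reported here.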

Please let me know if you have any questions or need further clarification.

Best regards

@danielpeter
Contributor

thanks for pointing out this inconsistency! there were indeed some differences between the CPU and GPU versions in how the sources are applied in your coupled-domain setup. PR #1759 should address and fix these.

I noted that you modified the SU adjoint source reading. in the PR, I incorporated a similar fix to be able to run the kernels with only the elastic adjoint source files (0_dx_SU.adj, ..) for this coupled acoustic/elastic domain setup.

also, you seem to have modified the Hessian kernel in file compute_kernels.f90. note that the current SPECFEM3D version implements an approximate source-receiver Hessian kernel (multiplying accel() * b_accel()), as compared to your source-source Hessian modification (b_accel() * b_accel()). you would have to re-do that modification when pulling and trying out the new devel version (and the same with your SU header modification).
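The difference between the two Hessian variants mentioned above can be illustrated with a toy example. This is only a sketch, not SPECFEM3D code: it assumes the kernel at one grid point is accumulated as a time sum of dt times the dot product of two acceleration vectors, and that b_accel is the reconstructed forward wavefield while accel is the adjoint wavefield. The function names and the dt scaling are assumptions made for illustration.

```python
def accumulate_product(field_a, field_b, dt):
    """Sum dt * (a . b) over all time steps for a single grid point.

    field_a, field_b: sequences of 3-component vectors, one per time step.
    """
    total = 0.0
    for a, b in zip(field_a, field_b):
        total += dt * sum(ai * bi for ai, bi in zip(a, b))
    return total


def source_receiver_hessian(accel, b_accel, dt):
    """Approximate Hessian from accel * b_accel (adjoint times forward field)."""
    return accumulate_product(accel, b_accel, dt)


def source_source_hessian(b_accel, dt):
    """Approximate Hessian from b_accel * b_accel (forward field with itself)."""
    return accumulate_product(b_accel, b_accel, dt)


# Two time steps at one grid point (made-up values).
accel = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
b_accel = [(2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
sr = source_receiver_hessian(accel, b_accel, dt=0.5)  # 0.5*2 + 0.5*2 = 2.0
ss = source_source_hessian(b_accel, dt=0.5)           # 0.5*4 + 0.5*1 = 2.5
```

The source-source form is a sum of squares and so can never be negative, while the source-receiver form can change sign from point to point. A local modification like the one described above therefore changes the preconditioner itself, not just its scaling.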

@jlulh
Author

jlulh commented Nov 20, 2024

Thank you for the fix! I have tested the updated version, and the issue is resolved. I appreciate your help and efforts.

@jlulh jlulh closed this as completed Nov 20, 2024