Enable GPU execution of mpas_reconstruct_2d via OpenACC #1289
To characterize the differences, I used the compare_netcdf.py script and compared the log.atmosphere.0000.out files, using the 6-timestep regional test case. Running the loop near mpas_vector_reconstruct.F L273-283 from the unmodified code on the CPU produced no answer differences relative to a GPU run of the commit this PR branch started from. When running this code on the GPU I observed differences in the
More on the answer differences: this does seem to be due to how the default CPU and GPU math implementations differ. I get no answer differences if I add the flags described in #1287 to my baseline_acc and PR branch builds (namely
@mgduda This should be ready for re-review now!
@gdicker1 Other than one more whitespace change request, I think this PR is good to go. After adjusting the whitespace, please feel free to clean up the commit history, and I'll approve the PR. Thanks!
Add nVertLevels, dereference integer pointers to loop bounds so they transfer to the GPU correctly, and make loops in the vertical dimension explicit for OpenACC parallel loop directives. Also ensure that, after initialization finishes, the invariant fields used in this routine are on the device.
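The pattern described in the commit message above might look roughly like the following standalone sketch. It is not the code from this PR: the program, the `_ptr` and `_store` names, and the array `u` are hypothetical stand-ins, and the real routine obtains its dimensions from MPAS data structures rather than local targets. The idea, as assumed here, is that loop bounds are copied from integer pointers into plain scalars before the compute region, and that an array-syntax assignment over the vertical dimension is written as an explicit `do` loop so a loop directive can apply to it.

```fortran
program explicit_loops_sketch
   implicit none

   ! Hypothetical stand-ins for dimensions that would normally come from MPAS.
   integer, target  :: nCells_store = 4, nVertLevels_store = 3
   integer, pointer :: nCells_ptr, nVertLevels_ptr

   ! Plain scalars used as loop bounds inside the compute region.
   integer :: nCells, nVertLevels
   integer :: iCell, k
   real, allocatable :: u(:,:)

   nCells_ptr      => nCells_store
   nVertLevels_ptr => nVertLevels_store

   ! Dereference the pointers once, outside the compute region.
   nCells      = nCells_ptr
   nVertLevels = nVertLevels_ptr

   allocate(u(nVertLevels, nCells))

   ! Array syntax such as "u(:,iCell) = 0.0" is replaced by an explicit
   ! loop over k so both loops are visible to the directive.
   !$acc parallel loop collapse(2) copyout(u)
   do iCell = 1, nCells
      do k = 1, nVertLevels
         u(k, iCell) = 0.0
      end do
   end do

   print *, 'sum(u) =', sum(u)
end program explicit_loops_sketch
```

Built without OpenACC support, the directive is treated as a comment and the program runs on the CPU, which makes it easy to compare against an accelerated build.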
Ensures the data needed for the mpas_reconstruct_2d routine has been fetched onto the device (GPU) at the beginning and end of the routine. The time for these transfers is captured in a new timer 'mpas_reconstruct_2d [ACC_data_xfer]'. This is enforced by the default(present) clauses. NOTE: coeffs_reconstruct, nEdgesOnCell, edgesOnCell, latCell, and lonCell are also fetched in mpas_reconstruct_2d. This is because this routine is called before these variables would be uploaded to the device during mpas_atm_dynamics_init as part of atmosphere_core initialization. The copyins will no longer execute once the model starts timestepping and the OpenACC runtime sees that the variables are present on the device.
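As a rough illustration of the data-movement pattern this commit message describes, the sketch below uses unstructured `enter data` and `exit data` directives around a `default(present)` compute region. It is an assumption-laden toy, not the PR's code: the arrays `coeffs`, `u`, and `uRecon` are hypothetical, the choice of unstructured data directives is a guess, and the timer calls are shown only as comments because the actual timer interface is not given in this conversation.

```fortran
program data_region_sketch
   implicit none
   integer, parameter :: n = 8
   real :: coeffs(n), u(n), uRecon(n)
   integer :: i

   coeffs = 2.0
   u      = 1.0

   ! (A data-transfer timer would start here.)
   ! copyin only transfers data that is not already present on the device;
   ! for data that is present, the runtime just updates its reference count.
   !$acc enter data copyin(coeffs, u) create(uRecon)
   ! (The data-transfer timer would stop here.)

   ! default(present) asserts that every array used in the region is already
   ! on the device, so a missing transfer shows up as a runtime error instead
   ! of an implicit copy.
   !$acc parallel loop default(present)
   do i = 1, n
      uRecon(i) = coeffs(i) * u(i)
   end do

   ! (Timer start.)
   !$acc exit data copyout(uRecon) delete(coeffs, u)
   ! (Timer stop.)

   print *, 'uRecon(1) =', uRecon(1)
end program data_region_sketch
```

Under OpenACC's reference-counting rules, if the inputs had already been uploaded elsewhere (for example during an initialization step), the `copyin` here would not move any data and the `delete` would leave the earlier copy in place.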
5ed3990 to 607d858
@mgduda this should be good to go now!
Looks great -- thanks!
This PR slightly modifies mpas_reconstruct_2d and adds OpenACC directives to it so that it can execute on GPU(s). Timing for the OpenACC data transfers in this routine is captured in the log file by a new timer: mpas_reconstruct_2d [ACC_data_xfer].

NOTE two things about this PR:
1. Running the loops that call the sin and cos functions on the GPU causes answer differences from previous GPU results. You can merge the commit in gdicker1/MPAS-Model:framework/acc_mpas_reconstruct_2d-correctness-sincosloops to run these loops on the CPU and verify this.
2. Several variables (coeffs_reconstruct, nEdgesOnCell, edgesOnCell, latCell, and lonCell) are handled in both mpas_reconstruct_2d and mpas_atm_dynamics_{init,finalize}. As with #1216 (Enable GPU execution of MPAS-Atmosphere initialization of coupled diagnostic fields via OpenACC), this routine is called before mpas_atm_dynamics_init.