Enable GPU execution of loops in atm_srk3 involving module level variables #1314
Pushed some changes addressing the review comments so far, and also added acc directives for the two loops left behind in |
Nothing major from me. This is looking good!
(Just holding off on my review state while the PR is being iterated on)
Looks good to me. Thanks!
Force-pushed from 29e2476 to 764bd32.
Thanks for the review! Rebased and squashed to a single commit. |
Should this be reworded in the commit message? My current understanding is that the issue doesn't involve |
Thanks for the note! Yeah, definitely related to the implicit copy of the scalar pointers, but my understanding was that adding the … It might be better to reword this message to just say implicit copy and leave out |
Agreed on rewording; I remember these pointers also causing CUDA_ERROR_ILLEGAL_ADDRESS without … Simple option, saying less:
Something like the following would be really specific:
Another option:
…ables

This commit ports the loops in the mpas_atm_time_integration and mpas_atm_core modules, which initialize the garbage cells of module level variables belonging to the mpas_atm_time_integration module, to OpenACC in preparation for consolidating all data transfers between host and device to before and after each dynamics call. In order to do this, we also declare the allocatable module level variables in this scope using the OpenACC declare create directive, which instructs the nvhpc compiler to automatically create and delete the device copy of each variable whenever it encounters an allocate or deallocate statement, respectively. This commit also removes these variables from manual data movement statements as required. This commit also introduces integer loop bounds, so as to dereference scalar integer pointers which the OpenACC parallel regions do not correctly copy to device memory.
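As a rough illustration of what `declare create` provides for a Fortran allocatable, the sketch below spells out the equivalent bookkeeping in C, where `malloc` and `free` have no automatic OpenACC hook. Each host allocation of a file-scope array is paired with an `enter data create`, and each deallocation with an `exit data delete`. The names `work_array`, `work_array_allocate` and `work_array_deallocate` are hypothetical and not taken from MPAS. With an allocatable listed in `!$acc declare create`, the compiler is expected to perform these two device steps implicitly at `allocate` and `deallocate`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical file-scope work array, standing in for an allocatable
   module variable in mpas_atm_time_integration. */
float *work_array = NULL;
size_t work_array_len = 0;

/* C analogue of Fortran "allocate" on a variable listed in
   "!$acc declare create": allocate host memory, then immediately
   create the matching device copy. */
void work_array_allocate(size_t n)
{
    work_array = malloc(n * sizeof *work_array);
    assert(work_array != NULL);
    work_array_len = n;
#pragma acc enter data create(work_array[0:work_array_len])
}

/* C analogue of Fortran "deallocate": delete the device copy first,
   then release the host memory. */
void work_array_deallocate(void)
{
#pragma acc exit data delete(work_array[0:work_array_len])
    free(work_array);
    work_array = NULL;
    work_array_len = 0;
}
```

Because the device copy's lifetime is tied to the allocation, no separate `copyin`/`create` for these arrays is needed in the surrounding data regions. This is why the variables can be dropped from the manual data movement statements.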
Force-pushed from 764bd32 to aa6de2b.
This is useful, thanks! I don't think I have encountered the |
Looks good now
This PR ports some of the loops in the `mpas_atm_time_integration` and `mpas_atm_core` modules, which initialize module level variables belonging to the `mpas_atm_time_integration` module, to OpenACC in preparation for consolidating all data transfers between host and device to before and after each dynamics call.

In order to do this, we also declare the allocatable module level variables in this scope using the OpenACC `declare create` directive, which instructs the nvhpc compiler to automatically create and delete the device copy of each variable whenever it encounters an allocate or deallocate statement, respectively. This PR also removes these variables from manual data movement statements as required.

This PR also introduces some integers for loop bounds, so as to dereference scalar integer pointers which the OpenACC parallel regions do not correctly copy to device memory in the presence of a `default(present)` clause.
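The loop-bound change can be sketched in C. The sketch assumes a field stored as `nCells + 1` columns of `nVertLevels` values, with the last column acting as the garbage cell. `zero_garbage_cell` and its signature are hypothetical, and `nCellsPtr` stands in for one of the MPAS scalar integer pointers. The pointer is dereferenced into a plain `int` on the host before the compute region. Scalars referenced in a `parallel` construct without a data clause are firstprivate by default, and `default(present)` applies to arrays and other aggregates, not to such scalars. Unlike a Fortran array, a C pointer carries no extent, so the sketch names the array section in an explicit `present` clause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine: zero the garbage cell (0-based column nCells)
   of a field laid out as (nCells + 1) columns of nVertLevels values.
   nCellsPtr mimics an MPAS scalar integer pointer. */
void zero_garbage_cell(float *field, int nVertLevels, const int *nCellsPtr)
{
    assert(field != NULL && nCellsPtr != NULL);

    /* Dereference on the host: the plain int copies below are
       firstprivate in the compute region, so the device never has to
       read *nCellsPtr. */
    const int nCells = *nCellsPtr;
    const size_t n = (size_t)(nCells + 1) * (size_t)nVertLevels;
    const size_t offset = (size_t)nCells * (size_t)nVertLevels;

    /* field must already be present on the device (e.g. via enter data). */
#pragma acc parallel loop default(present) present(field[0:n])
    for (int k = 0; k < nVertLevels; k++) {
        field[offset + k] = 0.0f;
    }
}
```

Compiled without OpenACC support, the pragma is ignored and the loop simply runs on the host, which makes the routine easy to check in isolation.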