Skip to content

Enable GPU execution of loops in atm_srk3 involving module level variables #1314

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

abishekg7
Copy link
Collaborator

@abishekg7 abishekg7 commented May 13, 2025

This PR ports some of the loops in the mpas_atm_time_integration and mpas_atm_core modules, which initialize module level variables belonging to the mpas_atm_time_integration module, to OpenACC in preparation for the consolidating all data transfers between host and device to before and after each dynamics call.

In order to do this, we also declare the allocatable module level variables in this scope using the OpenACC declare create statement, which instructs the nvhpc compiler to automatically create and delete the variable whenever it encounters an allocate or deallocate statement, respectively. This commit also removes these variables from manual data movement statements as required.

This PR also introduces some integers for loop bounds, so as to dereference scalar integer pointers which the OpenACC parallel regions do not correctly copy to device memory in the presence of a default(present) clause.

@mgduda mgduda self-requested a review May 13, 2025 15:58
@mgduda mgduda added Atmosphere OpenACC Work related to OpenACC acceleration of code labels May 13, 2025
@mgduda mgduda requested a review from gdicker1 May 13, 2025 16:00
@abishekg7
Copy link
Collaborator Author

Push some changes addressing the review comments so far, and also added acc directives for the two loops left behind in atm_mpas_init_block .

Copy link
Collaborator

@gdicker1 gdicker1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing major from me. This is looking good!

(Just holding off on my review state while the PR is being iterated on)

Copy link
Contributor

@mgduda mgduda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks!

@abishekg7 abishekg7 force-pushed the atmosphere/port_atm_srk3_module_vars branch from 29e2476 to 764bd32 Compare May 14, 2025 12:13
@abishekg7
Copy link
Collaborator Author

Thanks for the review! Rebased and squashed to a single commit.

@gdicker1
Copy link
Collaborator

This commit also introduces some integers for loop bounds, so as to dereference
scalar integer pointers which the OpenACC parallel regions do not correctly
copy to device memory in the presence of a default(present) clause.

Should this be reworded in the commit message? My current understanding is that the issue doesn't involve default(present) clauses but rather the implicit copy of these scalar pointers. Or do you understand differently?

@abishekg7
Copy link
Collaborator Author

This commit also introduces some integers for loop bounds, so as to dereference
scalar integer pointers which the OpenACC parallel regions do not correctly
copy to device memory in the presence of a default(present) clause.

Should this be reworded in the commit message? My current understanding is that the issue doesn't involve default(present) clauses but rather the implicit copy of these scalar pointers. Or do you understand differently?

Thanks for the note! Yeah definitely related to the implicit copy of the scalar pointers, but my understanding was that adding the default(present) tells the compiler not to implicitly copy any non-scalar values, which also includes pointers to scalar values, and we get a resulting not found on device error for such variables.

It might be better to reword this message to just say implicit copy and leave out default(present) if there are other reasons why OpenACC doesn't like scalar pointers.

@gdicker1
Copy link
Collaborator

gdicker1 commented May 15, 2025

Agreed on rewording, I remember these pointers also causing CUDA_ERROR_ILLEGAL_ADDRESS without default(present) (see this comment in PR#1289). I think it would be best to not implicate default(present), and may be worth highlighting the implicit copy.

Simple option, say less:

This commit also introduces some integers for loop bounds, so as to dereference scalar integer pointers which the OpenACC parallel regions do not correctly copy to device memory.

Something like the following would be really specific:

This commit also introduces some integers for loop bounds, so as to dereference the pointers fetched by mpas_pool_get_dimension. Issues occur at run-time with OpenACC when scalar pointers are moved to the device by the implicit copy mechanism and used in a compute region.

Another option:

This commit also dereferences scalar integer pointers used as loop bounds in OpenACC parallel regions. This prevents run-time issues when non-array pointers in compute regions are moved to the device by an implicit copy.

…ables

This commit ports the loops in the mpas_atm_time_integration and mpas_atm_core
modules, which initialize the garbage cells of module level variables belonging
to the mpas_atm_time_integration module, to OpenACC in preparation for the
consolidating all data transfers between host and device to before and after
each dynamics call.

In order to do this, we also declare the allocatable module level variables in
this scope using the OpenACC declare create statement, which instructs the
nvhpc compiler to automatically create and delete the variable whenever it
encounters an allocate or deallocate statement, respectively. This commit also
removes these variables from manual data movement statements as required.

This commit also introduces integer loop bounds, so as to dereference scalar
integer pointers which the OpenACC parallel regions do not correctly copy to
device memory.
@abishekg7 abishekg7 force-pushed the atmosphere/port_atm_srk3_module_vars branch from 764bd32 to aa6de2b Compare May 15, 2025 01:23
@abishekg7
Copy link
Collaborator Author

This commit also introduces some integers for loop bounds, so as to dereference scalar integer pointers which the OpenACC parallel regions do not correctly copy to device memory.

This is useful - thanks! I don't think I have encountered the CUDA_ERROR_ILLEGAL_ADDRESS error in this context yet, but good to know. I've updated the commit message now (and kept it simple)

Copy link
Collaborator

@gdicker1 gdicker1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now

@mgduda mgduda merged commit 3efed62 into MPAS-Dev:develop May 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Atmosphere OpenACC Work related to OpenACC acceleration of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants