Enable GPU execution of atm_rk_integration_setup via OpenACC #1223
base: develop
@abishekg7 I tried the changes in this PR, everything works as expected. Though I wonder if we could still use I think mixing the levels of parallelism and the |
@gdicker1 I don't quite think I follow your comment. If my understanding is correct, we found that collapsing loops with different levels of parallelism leads to incorrect results, and the only place where we could collapse vector loops already has a |
I think it might be worth taking a fresh look at the commit message and PR description. The text about splitting up the loop might not make sense to anyone who doesn't know the history of the porting of the |
@mgduda, I agree that being prescriptive with the parallelism should be best practice. With what I'd call a fully collapsible loop (e.g. a 2-level loop nest, with code only in the innermost loop body and not inside another loop), I am simply unsure what level of parallelism the compiler assigns when the loops are collapsed. So, that's where my preference to let "the compiler get it right for me" comes from, and it applies just to the fully collapsible loops. I think the following example would be fine:
( I also would have expected |
- Removing the condition for obtaining num_scalars in subroutine atm_srk3. This condition introduced issues when running the Jablonowski-Williamson dycore case
This PR enables GPU execution of the `atm_rk_integration_setup` subroutine.

An initial OpenACC port of the array assignments in `atm_rk_integration_setup`. Needed to split up ACC loops with `gang vector collapse(2)` into separate ACC loop statements to achieve correct results.
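
For readers without the porting history, the loop split described above can be sketched roughly as follows. This is an illustrative stand-in, not the code from this PR: the module `rk_setup_sketch`, the subroutine `save_field`, the array names `field` and `field_save`, the explicit `copyin`/`copyout` clauses, and the choice of `gang worker` on the outer loop are all assumptions made to keep the example self-contained.

```fortran
module rk_setup_sketch
   implicit none
contains

   ! Hypothetical stand-in for one array assignment of the kind found in
   ! atm_rk_integration_setup: save field(k,iCell) into field_save(k,iCell).
   subroutine save_field(nVertLevels, nCells, field, field_save)
      integer, intent(in) :: nVertLevels, nCells
      real, dimension(nVertLevels, nCells), intent(in)  :: field
      real, dimension(nVertLevels, nCells), intent(out) :: field_save
      integer :: iCell, k

      ! Before the split, a single combined directive covered both loops:
      !    acc loop gang vector collapse(2)
      ! After the split, each loop gets its own directive with an explicit
      ! level of parallelism (the exact levels here are an assumption).
      !$acc parallel copyin(field) copyout(field_save)
      !$acc loop gang worker
      do iCell = 1, nCells
         !$acc loop vector
         do k = 1, nVertLevels
            field_save(k, iCell) = field(k, iCell)
         end do
      end do
      !$acc end parallel
   end subroutine save_field

end module rk_setup_sketch
```

Without OpenACC enabled at compile time, the `!$acc` lines are treated as ordinary comments and the loops run serially, so the sketch can be checked on a CPU before trying it on a GPU.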