-
Notifications
You must be signed in to change notification settings - Fork 0
Framework support for halo exchange data structure reuse #1453
Conversation
I have a fork branch with the ocean/develop changes to use this feature ready to go. By manually splicing this PR and the core changes together I was able to run and get time integration performance numbers. Before: Gain: 9.6% |
I keep the ocean core change here: |
If I change the configuration from 33x2 to 33x4 then it gets a few more % faster. Gain 12.5% |
I tested this PR plus ocean changes that test them, i.e. I merged in https://github.com/Larofeticus/MPAS/tree/exchange_reuse_ocean_core. That is here: That passed the MPAS-Ocean nightly regression suite with intel and gnu, including bit-for-bit with previous, bit-for-bit restart, partition, and threading tests. |
@ndkeen I added this PR to the framework on all three cores, as well as the ocean changes to use them, here: https://github.com/ACME-Climate/ACME/tree/mark-petersen/mpas/reuse_halo_exchange This compiles and completes:
Please test for performance when convenient. Note the timer to watch for comparison is the ocean timer: |
Mark: Are you saying there is a different branch I should try? Can you paste the branch? |
@ndkeen , it’s a bit confusing, so here is a summary. You’ve been testing 1. Now there is also 2. Bill, does 2 include the changes in 1? IMPROVEMENT 1: Reduce thread barriers in mpas_dmpar halo packing routines IMPROVEMENT 2: reuse halo exchange data structure Yes, 2 is a new one to test in ACME. |
OK, I will refer to them as the "reduce" and "reuse" branch. It's not intuitive how to get the actual branch name from the links above. Note, looking at other runs from last night, I am still not seeing any overall improvement using the "reduce" branch. As soon as I have time, will look at the raw timer files (or can point you to them?). |
This PR also includes the packing/thread changes from my earlier PR |
@Larofeticus: The atmosphere folks (@mgduda, @climbfuji) would like to test this. Could you comment some instructions here on how to take a single halo exchange call and convert it to the reusable buffer version? I.e., change in subroutine name, arguments, and additional init/finalize routine. If you could push your latest ocean and sea ice branches that include the framework commits in this PR plus the example changes for that core, and point us to those branches, that would also be a helpful template to follow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR passes all MPAS-O tests (see comment #1453 (comment)). In particular, it is bfb between threaded and non-threaded versions, and before/after this PR. The code passes my visual inspection, but others like @mgduda have more experience with the design of the halo exchange.
I won't have time until at least the weekend to test this, I think - if @mgduda can do that instead for the atmosphere core, that would be great ... apologies! |
@climbfuji no problem on the timing. I added your name as a notification only. Thanks for letting us know. |
@mark-petersen refer to PR 1458 I just put up for the modifications to ocean that use this code. Unfortunately it's mixed with some other cleanup I did, so also see the diff on the CICE prototype I sent to noel: . > character (len=*), parameter :: subcycleGroupName = 'subcycleFields' |
Ok, I don't know why that didn't work. The line break change to the pack barrier reduction is why it said there is a conflict. I used the resolve conflicts button, changed it, pushed a green button and it acted like it was fixed. That commit looks like it should fix it, but github says the conflicts are still present? |
@Larofeticus or @mark-petersen Would either of you be willing to clean up the commit history of this PR and to rebase it on the latest |
d177aef
to
85f2541
Compare
@Larofeticus I rebased your changes onto the |
src/framework/mpas_dmpar.F
Outdated
type (mpas_exchange_group), pointer :: exchGroupPtr | ||
integer :: nLen | ||
|
||
integer :: mpi_ierr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this variable is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to remove the unused mpi_ierr
variable and to modify the merge commit message before merging this PR.
@Larofeticus Thanks for the commit to remove the unused Regarding the changes in 80974b1 , I'd propose to move these to a separate PR, pending input from @mark-petersen who last modified the affected code in PR #1464 , since those changes are not related to the addition of the three new routines in this PR. Not having actually tested the changes in 80974b1 , though, I'm a little suspicious: it looks like |
Yeah I should have done it somewhere else earlier. As for the correctness, there are no entries to the set of interface calls that don't go through the pack_buffer_field. It strikes me as silly that anywhere in any code is permitted or able to call a halo exchange with unassociated comm lists; i'm suspicious that the real issue is a bug in the ocean init stream that caused the segfault in the first place. |
@Larofeticus I agree that a halo exchange should not be called with an unassociated comm list. But that behavior was there before #1459, and we have to deal with one issue at a time. I just made a new issue #1477 to record it. Let's remove 80974b1 from this PR to keep it clean. |
src/framework/mpas_dmpar.F
Outdated
@@ -8262,8 +8471,6 @@ subroutine mpas_dmpar_exch_group_pack_buffer_field1d_integer(exchangeGroup, fiel | |||
iErr = MPAS_DMPAR_NOERR | |||
end if | |||
|
|||
commListPtr => exchangeGroup % sendList | |||
if (.not. associated(commListPtr)) return | |||
commListSize = commListPtr % commListSize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Larofeticus and @mark-petersen When I mentioned an unassociated commListPtr
, I was referring to this line and others like it after the removal of the two lines above it: commListPtr
is a local pointer that needs to be set before we can use it to access commListPtr % commListSize
. PR #1464 added the if-test, but commit 80974b1 removes the if-test as well as the line above it.
80974b1
to
61cee75
Compare
61cee75
to
f1ea1af
Compare
Add three subroutines to support reusable halo exchange group data structures: mpas_dmpar_exch_group_build_reusable_buffers: after creating an exchange group and adding fields, build communication lists and allocate message buffers mpas_dmpar_exch_group_reuse_halo_exch: Use pre-allocated data structures to execute a halo exchange on the exchange group. does not destroy data structures when finished mpas_dmpar_exch_group_destroy_reusable_buffers: destroy reusable halo exchange data structures No impact on pre-existing functionality. By itself this also has no impact on performance.
Add 3 subroutines to support reusable halo exchange group data structures:
mpas_dmpar_exch_group_build_reusable_buffers:
mpas_dmpar_exch_group_reuse_halo_exch:
mpas_dmpar_exch_group_destroy_reusable_buffers:
No impact on pre-existing functionality. By itself this also has no impact on performance.