
Framework support for halo exchange data structure reuse #1453

Merged
merged 1 commit into MPAS-Dev:develop from exchange_reuse on Dec 20, 2017

Conversation

@Larofeticus
Contributor

Larofeticus commented Nov 14, 2017

Add three subroutines to support reusable halo exchange group data structures:

mpas_dmpar_exch_group_build_reusable_buffers:

  • After creating an exchange group and adding fields, build the communication lists and allocate the message buffers.

mpas_dmpar_exch_group_reuse_halo_exch:

  • Use the pre-allocated data structures to execute a halo exchange on the exchange group. Does not destroy the data structures when finished.

mpas_dmpar_exch_group_destroy_reusable_buffers:

  • Destroy the reusable halo exchange data structures.

No impact on pre-existing functionality. By itself, this PR also has no impact on performance.
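
A minimal usage sketch of the new routines (the group name, field names, declarations, and loop bounds below are illustrative, not from this PR; mpas_dmpar_exch_group_create and mpas_dmpar_exch_group_add_field are the existing exchange group routines, and the domain pointer is assumed to be initialized by the core):

```
   character (len=*), parameter :: velocityGroupName = 'velocityFields'   ! illustrative group name
   type (domain_type), pointer :: domain                                  ! assumed already initialized
   integer, parameter :: nSubcycles = 100                                 ! illustrative loop bound
   integer :: iter

   ! Build once: create the group, register its fields, then allocate the
   ! reusable communication lists and message buffers.
   call mpas_dmpar_exch_group_create(domain, velocityGroupName)
   call mpas_dmpar_exch_group_add_field(domain, velocityGroupName, 'uVelocity')
   call mpas_dmpar_exch_group_add_field(domain, velocityGroupName, 'vVelocity')
   call mpas_dmpar_exch_group_build_reusable_buffers(domain, velocityGroupName)

   ! Exchange as many times as needed; the buffers are reused, not rebuilt.
   do iter = 1, nSubcycles
      call mpas_dmpar_exch_group_reuse_halo_exch(domain, velocityGroupName)
   end do

   ! Free the reusable data structures when the group is no longer needed.
   call mpas_dmpar_exch_group_destroy_reusable_buffers(domain, velocityGroupName)
```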

@Larofeticus
Contributor Author

I have a fork branch ready to go with the ocean/develop changes that use this feature.

By manually splicing this PR and the core changes together, I was able to run and collect time integration performance numbers.

Before:
avg 13.782122s -- (13.75133s, 13.75635s, 13.82115s, 13.71621s, 13.86557s)
After:
avg 12.45950s -- (12.33658s, 12.57770s, 12.41332s, 12.48657s, 12.45473s)

Gain: 9.6%

@Larofeticus
Contributor Author

If I change the configuration from 33x2 to 33x4, it gets a few percent faster still.
avg 12.06371s -- (12.36817s, 12.05414s, 11.95627s, 11.94428s, 11.99571s)

Gain: 12.5%

@mark-petersen
Contributor

I tested this PR plus the ocean changes that exercise it, i.e. I merged in https://github.com/Larofeticus/MPAS/tree/exchange_reuse_ocean_core. The merged branch is here:
https://github.com/mark-petersen/MPAS/tree/merge_bill_exchange_reuse_ocean_core

It passed the MPAS-Ocean nightly regression suite with intel and gnu, including the bit-for-bit with previous, bit-for-bit restart, partition, and threading tests.

@mark-petersen
Contributor

mark-petersen commented Nov 16, 2017

@ndkeen I added this PR to the framework on all three cores, as well as the ocean changes to use them, here: https://github.com/ACME-Climate/ACME/tree/mark-petersen/mpas/reuse_halo_exchange

This compiles and completes:

./create_test SMS_Ln9.T62_oQU240.GMPAS-IAF.cori-knl_gnu
./create_test SMS_Ln9.T62_oQU240.GMPAS-IAF.cori-knl_intel

Please test for performance when convenient. The timer to watch for comparison is the ocean timer se btr vel (split explicit barotropic velocity). The sub-timers, like btr se subcycle loop and se halo subcycle, may also be helpful. If those portions of the code show a speed-up, it would be worth making the same changes elsewhere.

@ndkeen

ndkeen commented Nov 16, 2017

Mark: Are you saying there is a different branch I should try? Can you paste the branch?
mark-petersen/mpas/reuse_halo_exchange ?

@mark-petersen
Contributor

@ndkeen, it's a bit confusing, so here is a summary. You've been testing improvement 1; now there is also improvement 2. Bill, does 2 include the changes in 1?

IMPROVEMENT 1: Reduce thread barriers in mpas_dmpar halo packing routines
MPAS PR:
#1449
ACME branch:
https://github.com/ACME-Climate/ACME/tree/mark-petersen/mpas/reduce_thread_barriers

IMPROVEMENT 2: Reuse halo exchange data structures
MPAS framework PR (this one):
#1453
plus a forthcoming MPAS-Ocean PR to use them. Both are on an ACME branch here:
https://github.com/ACME-Climate/ACME/tree/mark-petersen/mpas/reuse_halo_exchange

Yes, 2 is a new one to test in ACME.

@ndkeen

ndkeen commented Nov 16, 2017

OK, I will refer to them as the "reduce" and "reuse" branches. It's not intuitive how to get the actual branch name from the links above; mark-petersen/mpas/reuse_halo_exchange is the "reuse" branch. I'm trying it now.

Note: looking at other runs from last night, I am still not seeing any overall improvement with the "reduce" branch. As soon as I have time, I will look at the raw timer files (or I can point you to them?).

@Larofeticus
Contributor Author

This PR also includes the packing/thread changes from my earlier PR.

@mgduda self-requested a review December 4, 2017 17:53
@mark-petersen
Contributor

@Larofeticus: The atmosphere folks (@mgduda, @climbfuji) would like to test this. Could you post instructions here on how to take a single halo exchange call and convert it to the reusable buffer version, i.e., the change in subroutine name and arguments, and any additional init/finalize routines?

If you could push your latest ocean and sea ice branches that include the framework commits in this PR plus the example changes for that core, and point us to those branches, that would also be a helpful template to follow.

@mark-petersen
Contributor

This PR passes all MPAS-O tests (see my earlier comment on #1453). In particular, it is bit-for-bit between threaded and non-threaded versions, and before and after this PR. The code passes my visual inspection, but others like @mgduda have more experience with the design of the halo exchange.

@climbfuji
Contributor

I won't have time to test this until at least the weekend, I think. If @mgduda can do that instead for the atmosphere core, that would be great ... apologies!

@mark-petersen
Contributor

@climbfuji no problem on the timing. I added your name as a notification only. Thanks for letting us know.

@Larofeticus
Contributor Author

Larofeticus commented Dec 4, 2017

@mark-petersen refer to PR #1458, which I just put up, for the modifications to the ocean core that use this code. Unfortunately it's mixed with some other cleanup I did, so also see the diff of the CICE prototype I sent to Noel:
```
cori05% diff components/mpas-cice/model/src/core_cice/shared/mpas_cice_velocity_solver.F mpas_cice_velocity_solver_BILL.F
37a38,39
>       character (len=*), parameter :: subcycleGroupName = 'subcycleFields'
>
1950a1953,1957
>       call mpas_dmpar_exch_group_create(domain, subcycleGroupName)
>       call mpas_dmpar_exch_group_add_field(domain, subcycleGroupName, 'uVelocity')
>       call mpas_dmpar_exch_group_add_field(domain, subcycleGroupName, 'vVelocity')
>       call mpas_dmpar_exch_group_build_reusable_buffers(domain, subcycleGroupName)
>
1959a1967,1968
>       call mpas_dmpar_exch_group_destroy_reusable_buffers(domain, subcycleGroupName)
>
2044,2045c2053,2055
<       call MPAS_dmpar_field_halo_exch(domain, 'uVelocity')
<       call MPAS_dmpar_field_halo_exch(domain, 'vVelocity')
---
>       !call MPAS_dmpar_field_halo_exch(domain, 'uVelocity')
>       !call MPAS_dmpar_field_halo_exch(domain, 'vVelocity')
>       call mpas_dmpar_exch_group_reuse_halo_exch(domain, subcycleGroupName)
```

@Larofeticus
Contributor Author

OK, I don't know why that didn't work. The line-break change from the pack barrier reduction is why it reported a conflict. I used the 'Resolve conflicts' button, changed it, clicked the green button, and it acted as if it was fixed. That commit looks like it should fix it, but GitHub says the conflicts are still present?

@mgduda
Contributor

mgduda commented Dec 13, 2017

@Larofeticus or @mark-petersen Would either of you be willing to clean up the commit history of this PR and to rebase it on the latest develop branch? Also, I think it would be worth including a "hello, world" example of using the reusable buffer routines in the merge commit message (i.e., the first comment in this PR), and to perhaps remove the remarks about needing to place these changes in ocean/develop to support future changes. In my opinion, only information that is of general interest should be included in the actual commit (and merge commit) messages.

@mark-petersen
Contributor

@Larofeticus I rebased your changes onto the develop branch and squashed them into a single commit. This is identical to your previous code, except for some whitespace cleanup.

type (mpas_exchange_group), pointer :: exchGroupPtr
integer :: nLen

integer :: mpi_ierr

It looks like this variable is not needed.

@mgduda
Contributor

I'd like to remove the unused mpi_ierr variable and to modify the merge commit message before merging this PR.

@mgduda
Contributor

mgduda commented Dec 18, 2017

@Larofeticus Thanks for the commit to remove the unused mpi_ierr variable -- I'll squash this into 85f2541 to keep the commit history tidy.

Regarding the changes in 80974b1, I'd propose moving these to a separate PR, pending input from @mark-petersen, who last modified the affected code in PR #1464, since those changes are not related to the addition of the three new routines in this PR. Not having actually tested the changes in 80974b1, though, I'm a little suspicious: it looks like commListPtr may not actually be associated in lines like commListSize = commListPtr % commListSize after your commit.

@Larofeticus
Contributor Author

Yeah, I should have done that somewhere else earlier.

As for correctness, there are no entry points into that set of interface calls that don't go through pack_buffer_field. It strikes me as silly that any code anywhere is permitted or able to call a halo exchange with unassociated comm lists; I'm suspicious that the real issue is a bug in the ocean init stream that caused the segfault in the first place.

@mark-petersen
Contributor

@Larofeticus I agree that a halo exchange should not be called with an unassociated comm list. But that behavior was there before #1459, and we have to deal with one issue at a time. I just made a new issue #1477 to record it. Let's remove 80974b1 from this PR to keep it clean.

@@ -8262,8 +8471,6 @@ subroutine mpas_dmpar_exch_group_pack_buffer_field1d_integer(exchangeGroup, fiel
iErr = MPAS_DMPAR_NOERR
end if

commListPtr => exchangeGroup % sendList
if (.not. associated(commListPtr)) return
commListSize = commListPtr % commListSize

@Larofeticus and @mark-petersen When I mentioned an unassociated commListPtr, I was referring to this line and others like it after the removal of the two lines above it: commListPtr is a local pointer that needs to be set before we can use it to access commListPtr % commListSize. PR #1464 added the if-test, but commit 80974b1 removes the if-test as well as the line above it.
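
For readers following along, a small sketch of the contrast described above; the "after 80974b1" form is inferred from this comment, not copied from the commit itself:

```
      ! As merged via PR #1464: set the local pointer, then guard it before use.
      commListPtr => exchangeGroup % sendList
      if (.not. associated(commListPtr)) return
      commListSize = commListPtr % commListSize

      ! As described for commit 80974b1 (inferred): with the assignment and the
      ! if-test removed, commListPtr is dereferenced here without ever being set.
      commListSize = commListPtr % commListSize
```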

@mark-petersen merged commit f1ea1af into MPAS-Dev:develop Dec 20, 2017
mark-petersen added a commit that referenced this pull request Dec 20, 2017
Add three subroutines to support reusable halo exchange group data
structures:
mpas_dmpar_exch_group_build_reusable_buffers:
  after creating an exchange group and adding fields, build
  communication lists and allocate message buffers
mpas_dmpar_exch_group_reuse_halo_exch:
  Use pre-allocated data structures to execute a halo exchange on the
  exchange group. does not destroy data structures when finished
mpas_dmpar_exch_group_destroy_reusable_buffers:
  destroy reusable halo exchange data structures

No impact on pre-existing functionality. By itself this also has no
impact on performance.
@mark-petersen deleted the exchange_reuse branch December 20, 2017 17:47