This repository has been archived by the owner on Oct 23, 2020. It is now read-only.

Threading optimization: NON-bfb ocean changes by Abhinav #1236


mark-petersen
Contributor

This PR replaces #1151 and includes only the non-bit-for-bit (non-bfb) changes to the ocean core:

  1. Reorganization of statements in the 'btr se subcycle loop' to avoid redundant computations, and fusion of initialization loops with the main loops.
  2. Removal of unnecessary threading barriers.
  3. Addition of threading to the mpas reconstruct routine.
  4. Reorganization of statements in 'diagnostic solve' to merge initializations with the main loops, plus removal of extra barriers, vectorization, and reordering.
  5. Change of the MPI threading level from multiple to funneled.
  6. Reorganization of buffer packing and unpacking in halo exchanges to minimize the use of barriers.
  7. Threaded initialization of memory buffers.
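The loop fusion described in items 1 and 4 can be illustrated with a minimal Python sketch. This is not the MPAS code: the names `edges_on_cell`, `flux`, `divergence_unfused`, and `divergence_fused` are hypothetical and stand in for a generic "zero the array, then accumulate into it" pattern.

```python
# Hypothetical illustration of fusing an initialization loop with the main loop.
# None of these names come from the MPAS source.

def divergence_unfused(edges_on_cell, flux):
    """Two passes: one loop zeroes the output, a second loop accumulates."""
    n_cells = len(edges_on_cell)
    div = [None] * n_cells
    for i in range(n_cells):          # initialization loop
        div[i] = 0.0
    for i in range(n_cells):          # main loop
        for e in edges_on_cell[i]:
            div[i] += flux[e]
    return div


def divergence_fused(edges_on_cell, flux):
    """One pass: the initialization is folded into the main loop."""
    n_cells = len(edges_on_cell)
    div = [None] * n_cells
    for i in range(n_cells):
        acc = 0.0                     # per-cell initialization, fused
        for e in edges_on_cell[i]:
            acc += flux[e]
        div[i] = acc
    return div


edges_on_cell = [[0, 1], [1, 2], [2, 0]]
flux = [0.5, -1.25, 2.0]
fused_matches = divergence_unfused(edges_on_cell, flux) == divergence_fused(edges_on_cell, flux)
```

In this sketch the two versions agree exactly because the per-cell summation order is unchanged. In a threaded code, a fused loop also needs one fewer pass over memory and, typically, one fewer synchronization point between the initialization and the main loop.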

@mark-petersen changed the title from "Threading optimization: NON-bfb ocean changes by Abhinov" to "Threading optimization: NON-bfb ocean changes by Abhinav" on Feb 9, 2017
@mark-petersen
Contributor Author

This was NOT bit-for-bit on the ocean model nightly regression suite using gnu, but it did pass all the tests for restart, decomposition, completion, etc. On QU240, using 16 cores on one node, it was 0.7% faster.
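As a general illustration of why reordered arithmetic can fail a bit-for-bit comparison while remaining numerically acceptable, here is a small Python sketch. It is not taken from the regression suite; the values and the tolerance are arbitrary choices made for the example.

```python
import math

# Floating-point addition is not associative, so regrouping a sum
# can change the last bits of the result.
values = [0.1, 0.2, 0.3]

original_order = (values[0] + values[1]) + values[2]
reordered = values[0] + (values[1] + values[2])

# Exact comparison: what a bit-for-bit test checks.
bit_for_bit = original_order == reordered

# Tolerance-based comparison (rel_tol is an arbitrary value for this sketch).
within_tolerance = math.isclose(original_order, reordered, rel_tol=1e-12)

print(bit_for_bit, within_tolerance)
```

Here the exact comparison fails while the tolerance-based one passes, which is the usual situation when statements or loops are reordered for performance.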

@mark-petersen
Contributor Author

Tested with QU120 for 90 days using all three changes, including #1235 and #1237, and compared against the branch point on ocean/develop. The changes are not bfb, as expected, but the simulation is acceptable for merging. Here the solid line is ocean/develop and the dashed line is the changes:
[Plots: kineticenergycellavg, kineticenergycellmax, salinity, temperature]
