MOM_hor_visc: horizontal_viscosity loop reorder #1287
Conversation
This patch reorders many of the loops in horizontal_viscosity in order to improve vectorization of the Laplacian and biharmonic viscosities. Specifically, a single loop containing many different computations was broken up into many loops of individual operations. This required the introduction of several new 2D arrays. On Gaea's Broadwell CPUs (E5-2697 v4), this gives a ~80% speedup on a 32x32x75 `benchmark` configuration. Smaller speedups were observed on AMD processors. On the Gaea nodes, performance appears to be limited by the very large number of variables in the function stack and the high degree of stack spill. Further loop reordering may cause slowdowns unless the stack usage is reduced. No answers should be changed by this patch, but it deserves extra scrutiny given the fundamental role of this function in nearly all simulations.
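As a rough illustration of the kind of restructuring described above, the sketch below splits one multi-purpose loop into per-array loops. All names (`split_loops_sketch`, `sh_xx`, `div_xx`) and the right-hand-side expressions are placeholders, not the actual MOM_hor_visc code.

```fortran
! Sketch only: sh_xx and div_xx stand in for the new 2D work arrays, and the
! right-hand sides are simplified stand-ins for the real stencil operations.
subroutine split_loops_sketch(is, ie, js, je, u, v, dx, dy, sh_xx, div_xx)
  implicit none
  integer, intent(in) :: is, ie, js, je
  real, dimension(is:ie+1,js:je), intent(in)  :: u        ! u-point field
  real, dimension(is:ie,js:je+1), intent(in)  :: v        ! v-point field
  real, dimension(is:ie,js:je),   intent(in)  :: dx, dy   ! grid metrics
  real, dimension(is:ie,js:je),   intent(out) :: sh_xx, div_xx
  integer :: i, j

  ! Before: one loop body carrying several unrelated computations, which
  ! limits vectorization and raises register pressure.
  !   do j=js,je ; do i=is,ie
  !     sh_xx(i,j)  = dy(i,j)*(u(i+1,j)-u(i,j)) - dx(i,j)*(v(i,j+1)-v(i,j))
  !     div_xx(i,j) = dy(i,j)*(u(i+1,j)-u(i,j)) + dx(i,j)*(v(i,j+1)-v(i,j))
  !   enddo ; enddo

  ! After: one tight loop per output array, each of which vectorizes cleanly.
  do j=js,je ; do i=is,ie
    sh_xx(i,j) = dy(i,j)*(u(i+1,j)-u(i,j)) - dx(i,j)*(v(i,j+1)-v(i,j))
  enddo ; enddo
  do j=js,je ; do i=is,ie
    div_xx(i,j) = dy(i,j)*(u(i+1,j)-u(i,j)) + dx(i,j)*(v(i,j+1)-v(i,j))
  enddo ; enddo
end subroutine split_loops_sketch
```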
Codecov Report
@@            Coverage Diff             @@
##           dev/gfdl    #1287      +/-   ##
============================================
+ Coverage     45.82%   45.90%   +0.07%
============================================
  Files           227      225       -2
  Lines         71552    71507      -45
============================================
+ Hits          32791    32825      +34
+ Misses        38761    38682      -79
Continue to review full report at Codecov.
@@ -519,22 +528,25 @@ subroutine horizontal_viscosity(u, v, h, diffu, diffv, MEKE, VarMix, G, GV, US,
      ! shearing strain advocated by Smagorinsky (1993) and discussed in
      ! Griffies and Hallberg (2000).

      ! Calculate horizontal tension
-     do j=Jsq-1,Jeq+2 ; do i=Isq-1,Ieq+2
+     do j=Jsq-2,Jeq+2 ; do i=Isq-2,Ieq+2
By fusing the two loops here (the do-loop blocks that were previously at lines 523 and 533), there is now a mix of grid locations within one loop, and the stencil of u and v values that are used is expanded by one point in the i- and j-directions respectively. Also, the case-sensitive indexing convention that we once considered using to automatically flip between a SW and a NE indexing convention is broken. Elsewhere in this commit, performance is improved by breaking up large loops, and the only variables shared between these two loops are u and v, not any of the metric arrays. Are the performance gains from fusing these two specific loops large enough to justify these potential costs? Would you consider undoing this particular loop fusion?
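A schematic of the two layouts under discussion, for reference; everything below is a placeholder sketch (simplified right-hand sides, array shapes, and the name `fusion_sketch`), not the MOM_hor_visc source.

```fortran
! Sketch only: the assignments are stand-ins for the tension (h-point) and
! shearing-strain (vertex/q-point) calculations.
subroutine fusion_sketch(Isq, Ieq, Jsq, Jeq, u, v, sh_xx, sh_xy)
  implicit none
  integer, intent(in) :: Isq, Ieq, Jsq, Jeq
  real, dimension(Isq-2:Ieq+2,Jsq-2:Jeq+2), intent(in)  :: u, v
  real, dimension(Isq-2:Ieq+2,Jsq-2:Jeq+2), intent(out) :: sh_xx  ! h-points
  real, dimension(Isq-2:Ieq+2,Jsq-2:Jeq+2), intent(out) :: sh_xy  ! q-points
  integer :: i, j

  ! Separate loops: each quantity keeps its own extent, and the style
  ! convention of lower-case i,j at h-points vs. upper-case I,J at vertex
  ! points can be kept (Fortran itself is case-insensitive, so this is
  ! purely a readability convention).
  do j=Jsq-1,Jeq+2 ; do i=Isq-1,Ieq+2
    sh_xx(i,j) = u(i,j) - v(i,j)      ! stand-in for the horizontal tension
  enddo ; enddo
  do J=Jsq-2,Jeq+1 ; do I=Isq-2,Ieq+1
    sh_xy(I,J) = u(I,J) + v(I,J)      ! stand-in for the shearing strain
  enddo ; enddo

  ! Fused loop: a single extent must cover both quantities, so the tension
  ! is evaluated over one extra row and column of points and the i/I, j/J
  ! distinction is lost.
  do j=Jsq-2,Jeq+2 ; do i=Isq-2,Ieq+2
    sh_xx(i,j) = u(i,j) - v(i,j)
    sh_xy(i,j) = u(i,j) + v(i,j)
  enddo ; enddo
end subroutine fusion_sketch
```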
Yes, I was a bit worried about this one for the reasons that you mention (potentially invalid range and loss of the IJ/ij syntax).
I fused these because there was a measurable speedup, and decided that was justification enough, but in the end it was one small improvement among many and could be sacrificed.
But we should probably discuss it a bit further, given that it is faster to fuse them.
On my home machine (AMD Ryzen 5 2600, 2133 MT/s RAM), my benchmark case is about 1-2% slower if I revert this change (0.998 s, 1.012 s vs. 1.026 s, 1.020 s).
Given that this is actually measurable, perhaps we (i.e. I) should try to work out a solution that preserves both the speedup and the loop separation.
Maybe the right answer here is to split the loops so we can accept this PR, despite the performance hit, and then look at options for recovering this speedup with a second PR. Alternatively, given that this is holding up a number of other PRs but does not seem to interact with them, you could green-light deferring action on this PR so we can move on, sort out the interactions between the next several commits, and get ready for the pre-FMS2 PR to main.
I'll revert the loop but add a comment explaining the potential speedup.
I am almost certain this is going to come up more and more as we dig into these loops, so at the least I want to remind myself of the issue.
if (CS%add_LES_viscosity) then
  if (CS%Smagorinsky_Kh) Kh = Kh + CS%Laplac2_const_xx(i,j) * Shear_mag
  if (CS%Leith_Kh) Kh = Kh + CS%Laplac3_const_xx(i,j) * vert_vort_mag*inv_PI3
  if (CS%Smagorinsky_Kh) &
Would the code in the 13-line block from the new lines 888 - 900 be a good candidate for restructuring to put the logical tests outside of the do-loops, as was found to be more efficient elsewhere in this PR?
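For reference, a hedged sketch of the restructuring being suggested; the names below (`hoist_flag_sketch`, `use_smag`, `smag_const`, `shear_mag`) are hypothetical stand-ins, not the actual CS% fields or loop bounds.

```fortran
! Sketch only: hoist a configuration flag out of the inner loops.
subroutine hoist_flag_sketch(is, ie, js, je, use_smag, smag_const, shear_mag, Kh)
  implicit none
  integer, intent(in) :: is, ie, js, je
  logical, intent(in) :: use_smag
  real, dimension(is:ie,js:je), intent(in)    :: smag_const, shear_mag
  real, dimension(is:ie,js:je), intent(inout) :: Kh
  integer :: i, j

  ! Before: the configuration flag is re-tested on every (i,j) iteration.
  !   do j=js,je ; do i=is,ie
  !     if (use_smag) Kh(i,j) = Kh(i,j) + smag_const(i,j)*shear_mag(i,j)
  !   enddo ; enddo

  ! After: test the flag once, then run a branch-free inner loop.
  if (use_smag) then
    do j=js,je ; do i=is,ie
      Kh(i,j) = Kh(i,j) + smag_const(i,j)*shear_mag(i,j)
    enddo ; enddo
  endif
end subroutine hoist_flag_sketch
```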
Yes, this is one example among several, but unfortunately I started to experience some of the "stack spill" issues that we discussed privately, where the function begins to slow down rather than speed up.
At this point, I am too worried to make further changes until we figure out how to reduce the function stack (and perhaps until we confirm a little more thoroughly that this is what is actually going on, and that reducing the function stack would help here).
This comment block appears later in a similar section. Perhaps another disclaimer is needed here...?
! NOTE: The following do-block can be decomposed and vectorized, but
! appears to cause slowdown on some machines. Evidence suggests that
! this is caused by excessive spilling of stack variables.
! TODO: Vectorize these loops after stack usage has been reduced.
The fusion of the tension and shearing-strain loops yields a 1-2% speedup, but it breaks the style convention of capitalized indices at vertex points and evaluates the tension terms over a slightly larger domain, so it has been reverted. A note has been added to investigate this later.
With the changes in this latest commit, all of my concerns have been addressed, and I think this PR is acceptable (and a very nice contribution).
This PR passed the pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/11881.