
Implementation of Zanna-Bolton-2020 equation discovery model of mesoscale momentum fluxes. Combined commits. #356

Merged
8 commits merged into dev/gfdl on Jun 26, 2023

Conversation

Pperezhogin

Hi @marshallward!

As you suggested, I created a new PR instead of #344.

One possible combination of the namelist parameters:

USE_ZB2020=True
STRESS_SMOOTH_PASS=4
ZB_SCALING=0.75

@codecov

codecov bot commented Apr 25, 2023

Codecov Report

Merging #356 (e424754) into dev/gfdl (dd1ee34) will decrease coverage by 0.16%.
The diff coverage is 1.77%.

❗ Current head e424754 differs from pull request most recent head a9f7e2a. Consider uploading reports for the commit a9f7e2a to get more accurate results

@@             Coverage Diff              @@
##           dev/gfdl     #356      +/-   ##
============================================
- Coverage     38.38%   38.22%   -0.16%     
============================================
  Files           268      269       +1     
  Lines         76021    76359     +338     
  Branches      13987    14025      +38     
============================================
+ Hits          29183    29191       +8     
- Misses        41601    41929     +328     
- Partials       5237     5239       +2     
Impacted Files Coverage Δ
src/parameterizations/lateral/MOM_Zanna_Bolton.F90 0.91% <0.91%> (ø)
src/parameterizations/lateral/MOM_hor_visc.F90 50.17% <33.33%> (-0.14%) ⬇️

... and 1 file with indirect coverage changes


@marshallward marshallward requested a review from adcroft May 3, 2023 15:11
Member

@Hallberg-NOAA Hallberg-NOAA left a comment


Thank you very much for this revised commit. After this passes our regression testing (which I am confident it will because this code is not yet being exercised by those tests), we will be able to merge this into dev/gfdl.

Everything here seems correct after my visual inspection of the code. However, it looks to me like this new contribution uses a very large number of halo updates and global reductions, which may make this code inefficient in some large PE-count applications. Specifically, for every model layer there are 44 calls to get global minimum or maximum values, and a total of (22 + 4 x ssd_iter + 6 x HPF_iter x HPF_order + 6 x LPF_iter x LPF_order) halo updates. If, for example, one of these filters is Laplacian and the other biharmonic, and we use 3 iterations on each, then in a 75-layer model this comes to (88 x 75) = 6600 blocking halo updates and (44 x 75) = 3300 global reductions per call to Zanna_Bolton_2020. For comparison, the barotropic solver is usually the part of the model with the worst scaling in MOM6, and it typically has about 50 blocking halo updates and perhaps 1 global reduction (to determine the maximum stable timestep) per call. These blocking halo update calls can be relatively expensive, depending on your configuration and on your computer.
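As a sanity check on the arithmetic above, the per-call totals can be reproduced with a short standalone sketch; the iteration counts and filter orders are the illustrative values from this comment, not model defaults:

```fortran
program count_halo_updates
  implicit none
  integer, parameter :: nk = 75                      ! layers in the example
  integer, parameter :: ssd_iter = 3                 ! example iteration counts
  integer, parameter :: hpf_iter = 3, hpf_order = 1  ! Laplacian high-pass filter
  integer, parameter :: lpf_iter = 3, lpf_order = 2  ! biharmonic low-pass filter
  integer :: per_layer

  ! Halo updates per layer, from the formula in the review comment:
  per_layer = 22 + 4*ssd_iter + 6*hpf_iter*hpf_order + 6*lpf_iter*lpf_order
  print *, 'halo updates per layer:', per_layer       ! 22 + 12 + 18 + 36 = 88
  print *, 'halo updates per call: ', per_layer * nk  ! 88 x 75 = 6600
  print *, 'global reductions:     ', 44 * nk         ! 44 x 75 = 3300
end program count_halo_updates
```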

The good news is that the answers would not change if we were to refactor this code (e.g., with some combination of 3-d arrays, wide halos, or grouped calculations) to reduce the number of blocking halo updates. The bad news is that such refactoring could take a lot of work. My recommendation is that we take in this code now, and defer this restructuring and refactoring until someone finds a case where their model becomes unacceptably slow when ZB2020 is turned on. The one thing that I think would be a good idea right now is to add cpu-clock timers (calls to cpu_clock_begin() and cpu_clock_end()) around the call to Zanna_Bolton_2020, with CLOCK_ROUTINE granularity, to help users detect if this routine starts taking up too large a fraction of the model's overall run-time.
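A minimal sketch of the suggested timers, assuming the usual MOM6/FMS clock interface (cpu_clock_id, cpu_clock_begin, cpu_clock_end, CLOCK_ROUTINE); the clock label and the argument list of Zanna_Bolton_2020 are illustrative, not the actual interface:

```fortran
use MOM_cpu_clock, only : cpu_clock_id, cpu_clock_begin, cpu_clock_end, CLOCK_ROUTINE

integer :: id_clock_ZB2020

! Once, during initialization: register a routine-granularity clock.
id_clock_ZB2020 = cpu_clock_id('(Ocean Zanna-Bolton-2020)', grain=CLOCK_ROUTINE)

! Around the call, so its cost shows up as a separate line in the timing report:
call cpu_clock_begin(id_clock_ZB2020)
call Zanna_Bolton_2020(u, v, h, diffu, diffv, G, GV, CS)  ! illustrative arguments
call cpu_clock_end(id_clock_ZB2020)
```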

@Pperezhogin
Author

Hi @Hallberg-NOAA!

Some refactoring of the code to improve computational efficiency will indeed be needed. Currently, it runs about as fast as the MEKE parameterization in the Double Gyre configuration. A few tests in NeverWorld2 show that enabling this parameterization can roughly double the runtime.

I implemented the filters in a very general form, but a typical configuration of filter parameters will require only around 4 filter passes per stress tensor element. A similar amount of filtering is used in the backscatter parameterization in FESOM (Juricke et al. 2019, 2020).

The reduction operations (min, max) are retained for debugging reasons and will likely be needed only for a single test run in each new configuration.

I will discuss the computational efficiency and timing of the code with Alistair...

Member

@Hallberg-NOAA Hallberg-NOAA left a comment


Given that the reduction operations are only used for debugging, they should be wrapped in a logical test of the runtime parameter DEBUG, stored in this module's control structure, as is done for other debugging code elsewhere in MOM6 (look for CS%debug). This should be added, along with the CPU timers around the call to Zanna_Bolton_2020, before this is merged into the MOM6 code, as it can be done with a modest number of added lines.
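A sketch of the suggested guard, following the CS%debug pattern used elsewhere in MOM6; the variable names and the in-place reduction routines (min_across_PEs / max_across_PEs from MOM_coms) are assumptions here, not the code's actual identifiers:

```fortran
use MOM_coms, only : min_across_PEs, max_across_PEs

real :: S_min, S_max  ! diagnostic extrema of a stress-tensor component (illustrative)

if (CS%debug) then
  ! The expensive global reductions run only when DEBUG = True at runtime.
  S_min = minval(S_11) ; S_max = maxval(S_11)
  call min_across_PEs(S_min) ; call max_across_PEs(S_max)
endif
```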

The refactoring to work with 3-d arrays to reduce the number of blocking pass_var() calls could be done in a future refactoring step, as it will take more work and it should not change answers.

@marshallward
Member

marshallward commented Jun 26, 2023

Despite some potential performance issues with excessive collective operations (per-layer halo updates, and global reductions associated with debug output), we'll go ahead and merge this into dev/gfdl.

A separate issue will be opened to address these problems in the future.

@marshallward marshallward removed the request for review from adcroft June 26, 2023 17:13
@marshallward marshallward dismissed Hallberg-NOAA’s stale review June 26, 2023 17:14

These issues will be resolved in a separate PR.

@marshallward
Member

Gaea regression: https://gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/19630 ✔️ 🟡
