Implementation of Zanna-Bolton-2020 equation discovery model of mesoscale momentum fluxes. Combined commits. #356
Conversation
Codecov Report
@@ Coverage Diff @@
## dev/gfdl #356 +/- ##
============================================
- Coverage 38.38% 38.22% -0.16%
============================================
Files 268 269 +1
Lines 76021 76359 +338
Branches 13987 14025 +38
============================================
+ Hits 29183 29191 +8
- Misses 41601 41929 +328
- Partials 5237 5239 +2
... and 1 file with indirect coverage changes
Thank you very much for this revised commit. After this passes our regression testing (which I am confident it will because this code is not yet being exercised by those tests), we will be able to merge this into dev/gfdl.
Everything here seems correct after my visual inspection of the code. However, it looks to me like this new contribution is using a very large number of halo updates and global reductions, which may make this code inefficient on some large PE-count applications. Specifically, for every model layer, there are 44 calls to get global minimum or maximum values and a total of (22 + 4 x ssd_iter + 6 x HPF_iter x HPF_order + 6 x LPF_iter x LPF_order) halo updates. If, for example, one of these is a Laplacian filter and the other biharmonic, and we use 3 iterations on each step, in a 75 layer model, this will be a total of (88 x 75) = 6600 blocking halo updates and (44 x 75) = 3300 global reductions per call to Zanna_Bolton_2020. For comparison, the barotropic solver is usually the part of the model that has the worst scaling in MOM6, and it typically has about 50 blocking halo updates and might have 1 global reduction (to determine the maximum stable timestep) per call. These blocking halo update calls can be relatively expensive, depending on your configuration and on your computer.
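(For concreteness, reading "3 iterations on each step" as ssd_iter = HPF_iter = LPF_iter = 3, with a first-order Laplacian filter and a second-order biharmonic filter, the per-layer count in that example works out to 22 + 4x3 + 6x3x1 + 6x3x2 = 22 + 12 + 18 + 36 = 88 halo updates per layer, which is where the factor of 88 above comes from.)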
The good news is that the answers would not change if we were to refactor this code (e.g., with some combination of 3-d arrays, wide halos or grouped calculations) to reduce the number of blocking halo updates. The bad news is that such refactoring could take a lot of work. My recommendation is that we should take in this code now, and defer this restructuring and refactoring until someone finds a case where their model becomes unacceptably slow when ZB2020 is turned on. The one thing, though, that I think would be a good idea right now is to add cpu-clock timers (calls to cpu_clock_begin() and cpu_clock_end()) around the call to Zanna_Bolton_2020 with a CLOCK_ROUTINE granularity, to help users detect if this routine starts taking up too large a fraction of the model's overall run-time.
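A minimal sketch of what such timers could look like, using the cpu-clock interfaces from MOM6's MOM_cpu_clock module (the clock name, the id variable, and the elided argument list below are placeholders, not the actual diff):

```fortran
use MOM_cpu_clock, only : cpu_clock_id, cpu_clock_begin, cpu_clock_end, CLOCK_ROUTINE

integer :: id_clock_ZB2020  ! placeholder module-level clock id

! During initialization of the module:
id_clock_ZB2020 = cpu_clock_id('(Ocean Zanna-Bolton-2020)', grain=CLOCK_ROUTINE)

! Around the call to the parameterization:
call cpu_clock_begin(id_clock_ZB2020)
call Zanna_Bolton_2020(...)  ! actual argument list omitted here
call cpu_clock_end(id_clock_ZB2020)
```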
Hi @Hallberg-NOAA! Some refactoring of the code to improve computational efficiency will indeed be needed. Currently it runs about as fast as the MEKE parameterization in the Double Gyre configuration. A few tests in NeverWorld2 show that using this parameterization can roughly double the runtime. I implemented the filters for a very general case, but a typical configuration of the filter parameters will require around 4 filter passes per stress tensor component. A similar amount of filtering is used in the backscatter parameterization in FESOM (Juricke 2019, Juricke 2020). The reduction operations (min, max) are retained for debugging purposes and will likely be needed only for one test run in each new configuration. I will discuss the computational efficiency and timing of the code with Alistair...
Given that the reduction operations are only used for debugging, they should be wrapped in a logical test of the runtime parameter DEBUG stored in the control structure of this module, as is done for other debugging code elsewhere in the MOM6 code (look for CS%debug). This should be added, along with the CPU timers around the call to Zanna_Bolton_2020, before this is merged into the MOM6 code, as it can be done with a modest number of added lines.
The refactoring to work with 3-d arrays to reduce the number of blocking pass_var() calls could be done in a future refactoring step, as it will take more work and it should not change answers.
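A minimal sketch of that guard, following the CS%debug convention used elsewhere in MOM6 (the array name, message string, and the choice of a checksum call to stand in for the existing min/max reductions are all placeholders):

```fortran
use MOM_debugging, only : hchksum

if (CS%debug) then
  ! Debug-only reductions and checksums live inside this block, so no global
  ! communication is triggered unless DEBUG = True for this module.
  call hchksum(Txx, "ZB2020: Txx before filtering", G%HI, haloshift=0)
endif
```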
Despite some potential performance issues with excessive halo updates and global reductions, this can be taken in now; a separate issue will be opened to address these problems in the future.
These issues will be resolved in a separate PR.
Gaea regression: https://gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/19630 ✔️ 🟡
Hi @marshallward!
As you suggested, I created a new PR instead of #344.
One possible combination of the namelist parameters: