Excess pass_var calls in ZB2020 #389
Comments
Hi @Hallberg-NOAA, @adcroft. I am working on accelerating and refactoring my code. It is probably too complicated, and simpler code would be easier to accelerate. During the implementation of the ZB model, I made a rather strange choice: I used the mask of outcropped points as a land mask for setting boundary conditions in the filters. As a result, even after many filter iterations, the predicted subgrid stress does not spread into the outcropped regions. This decision seemed natural to me a year ago, but today I am not sure it was a good choice. It is difficult to point to any physical, mathematical, or numerical benefit from it. For example, the filters in GME use a simple land mask, and the rest of the MOM6 code in most cases does not use any mask for stencil operations (for example, for interpolation). What do you both think: should I keep this feature, or should I use the GME filters instead? There are additional complicated parts: high-order hyperviscosity and highly scale-selective filters. I used these features in the early stages of my research, but I no longer use them. So, to make the code simpler to accelerate, these features will be removed.
Removing unused features is a good strategy. If you ever need to recover the code, it will still be in the history.
In an experimental branch I prepared a faster code implementation. It is 4 times faster on a single core compared to the previous code and orders of magnitude faster on many cores. Example of runtime in NW2 at 1/2-degree resolution:
Optimization of the filters includes a marching halo with non-blocking grouped MPI exchanges and an implementation of the filter with a minimal number of operations (a tensor product of 1D filters, which reduces the number of multiplications). I am wondering whether we can try this branch for a couple of model days in the OM4 model; if we are happy with the performance, I will prepare a formal PR. The optimal namelist parameters in NW2:
Runtime measurement works with clock_grain='ROUTINE'. UPD: See the pull request for the newest code.
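To illustrate the "tensor product of 1D filters" idea mentioned above, here is a minimal sketch in plain Python. It is not the ZB2020 code: the 1-2-1 weights, the function names, and the choice to leave boundary points untouched are all assumptions made for illustration. It only shows that two 1D passes reproduce the full 3x3 kernel on interior points while using 6 instead of 9 multiplications per point.

```python
# Hypothetical illustration, not MOM6 code: a 2D smoothing filter written as
# a tensor product of 1D filters. Boundary points are simply left unchanged.

W1D = [0.25, 0.5, 0.25]  # assumed 1-2-1 weights, chosen only for the example

# Full 3x3 kernel as the outer (tensor) product of the 1D weights.
K2D = [[wj * wi for wi in W1D] for wj in W1D]


def filter_2d_direct(f):
    """Apply the 3x3 kernel directly: 9 multiplications per interior point."""
    ny, nx = len(f), len(f[0])
    out = [row[:] for row in f]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j][i] = sum(
                K2D[dj + 1][di + 1] * f[j + dj][i + di]
                for dj in (-1, 0, 1)
                for di in (-1, 0, 1)
            )
    return out


def filter_2d_separable(f):
    """Apply the same filter as an x pass followed by a y pass: 3 + 3 multiplications."""
    ny, nx = len(f), len(f[0])
    # x pass on every row, including the first and last rows,
    # because the y pass reads them as neighbours.
    tmp = [row[:] for row in f]
    for j in range(ny):
        for i in range(1, nx - 1):
            tmp[j][i] = W1D[0] * f[j][i - 1] + W1D[1] * f[j][i] + W1D[2] * f[j][i + 1]
    # y pass on interior points only.
    out = [row[:] for row in f]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j][i] = W1D[0] * tmp[j - 1][i] + W1D[1] * tmp[j][i] + W1D[2] * tmp[j + 1][i]
    return out


def max_abs_diff(a, b):
    """Largest pointwise difference between two 2D lists of equal shape."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

For wider 1D stencils the saving grows, since the direct form costs the square of the stencil width per point and the separable form costs twice the width.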
Fixed by #484
The recently merged ZB2020 implementation is currently usable but appears to suffer from performance issues. The following changes have been suggested:
- Halo updates applied for individual 2D layers could be deferred and applied to the full 3D field.
- There are instances of halo updates applied before and after a computation. The halo should account for previous computation and only one should be required.
- Many individual halo updates could be bundled into a `do_group_pass`.
- Expensive collective `min_max` tests for monotonicity may be better suited under a debug-like flag (either the global MOM debug flag or a ZB2020-specific flag).
- CPU clocks around calls to ZB2020 would be useful for diagnosing future issues.
These are discussed in detail in #356.
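As a rough illustration of why a single halo update can serve several stencil passes (the "halo should account for previous computation" point above), here is a toy 1D sketch in plain Python. Everything in it is an assumption for illustration: the periodic domain, the 1-2-1 filter, and the names `exchange` and `smooth_marching` are hypothetical, and the list copies stand in for real MPI halo updates. Each filter pass consumes one valid halo point, so a halo of width `h` allows `h` passes per exchange.

```python
# Hypothetical 1D toy, not MOM6 code: repeated 1-2-1 smoothing on a periodic
# domain split into tiles, exchanging halos only when they are used up.


def smooth_global(u, npass):
    """Reference result: npass filter passes on the undivided periodic array."""
    n = len(u)
    for _ in range(npass):
        u = [0.25 * u[(i - 1) % n] + 0.5 * u[i] + 0.25 * u[(i + 1) % n] for i in range(n)]
    return u


def exchange(tiles, h):
    """Stand-in for a halo update: fill each tile's halos from its neighbours."""
    nt = len(tiles)
    n = len(tiles[0]) - 2 * h
    interiors = [t[h:h + n] for t in tiles]  # snapshot before overwriting halos
    for r, t in enumerate(tiles):
        t[:h] = interiors[(r - 1) % nt][n - h:]
        t[h + n:] = interiors[(r + 1) % nt][:h]


def smooth_marching(u, ntiles, h, npass):
    """Filter on tiles with halo width h; returns (result, number of exchanges)."""
    n = len(u) // ntiles  # interior points per tile; assumes n >= h
    tiles = [[0.0] * h + u[r * n:(r + 1) * n] + [0.0] * h for r in range(ntiles)]
    n_exchanges = 0
    valid = 0  # halo points that currently hold up-to-date data
    for _ in range(npass):
        if valid == 0:
            exchange(tiles, h)
            n_exchanges += 1
            valid = h
        # Update every point whose two neighbours are still valid.
        lo, hi = h - valid + 1, h + n + valid - 1
        for t in tiles:
            new = t[:]
            for i in range(lo, hi):
                new[i] = 0.25 * t[i - 1] + 0.5 * t[i] + 0.25 * t[i + 1]
            t[:] = new
        valid -= 1  # the outermost valid halo point is now stale
    result = []
    for t in tiles:
        result.extend(t[h:h + n])
    return result, n_exchanges
```

With `h = 4` and 8 passes this toy needs 2 exchanges instead of 8, at the cost of some redundant computation in the halo region.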