Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactors the EOS modules to use classes #522

Merged
merged 2 commits into from
Nov 22, 2023

Conversation

adcroft
Copy link
Member

@adcroft adcroft commented Nov 15, 2023

We've speculated that branching (select case or deeply nested if trees) are contributing to poor performance. The equation of state (EOS) is a part of MOM6 where the module functions are called many times from all over the code and we have multiple choices to select from. I did try out procedure pointers but the code got messy quickly. Writing each EOS as an extension of a base class allows selection of the EOS when allocating/constructing the instance. This does manage to deliver a branch-free code and seem to deliver a performance improvement with the caveat that I've only evaluated performance using a new micro test (see section below). I suggest some proper performance analysis by a professional before we commit to this approach.

Changes

  • Added a base class in MOM_EOS_base_type.F90
  • All EOS modules now extend this base class
    • This reduces some replicated code between the EOS modules
  • All existing APIs in MOM_EOS now avoid branching associated with the type of EOS and ultimately pass through to a low-level elemental function implementation of the actual EOS
  • Added a new elemental function exposed by MOM_EOS (currently not used in the main model but will be in a future regridding algorithm)
  • There is a speed up over the previous form of EOS due to the reduced branching
    • For some functions, a local implementation of the base class member is needed to gain performance. I deliberately did not implement this optimization for UNESCO or Jackett06 so that the generic implementation of the base class is utilized and we have code coverage.

Micro test results

The micro test in config_src/drivers/timing_tests/time_MOM_EOS.F90 was added previously. The perfmon workflow runs the tests using the new code and PR-target code and presents the results as a percentage change. We are somewhat skeptical that timings can be relied upon in a noisy and shared environment such as the cloud (GitHub actions) but the following screenshot of a results is surprisingly representative of what I obtained in a quiet environment on gaea. There is undoubtedly some variability so we should not heavily rely on GH actions performance but it looks like we could use the GH workflows as a soft indicator.

Screenshot of Github actions perfmon (using gnu compiler) (https://github.com/adcroft/MOM6/actions/runs/6881376496/job/18717587399?pr=2)
image

Screenshot of timing comparison on Gaea c5 (AMD using intel compiler)
image

In these tests I am using the base class functions for Jackett06 and UNESCO in order to exercise those generic functions (i.e. get code coverage) but it has the advantage of showing the advantage of inlining by the compiler when the elemental functions are in the same module as the wrapper function.

Copy link

codecov bot commented Nov 15, 2023

Codecov Report

Attention: 95 lines in your changes are missing coverage. Please review.

Comparison is base (cce4b3d) 37.46% compared to head (98b1c93) 37.49%.

Files Patch % Lines
src/equation_of_state/MOM_EOS.F90 38.09% 45 Missing and 7 partials ⚠️
src/equation_of_state/MOM_EOS_base_type.F90 54.09% 27 Missing and 1 partial ⚠️
src/equation_of_state/MOM_EOS_Roquet_SpV.F90 97.61% 2 Missing and 1 partial ⚠️
src/equation_of_state/MOM_EOS_Roquet_rho.F90 97.34% 2 Missing and 1 partial ⚠️
src/equation_of_state/MOM_EOS_TEOS10.F90 90.90% 3 Missing ⚠️
src/equation_of_state/MOM_EOS_Wright_full.F90 96.51% 2 Missing and 1 partial ⚠️
src/equation_of_state/MOM_EOS_Wright_red.F90 96.47% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           dev/gfdl     #522      +/-   ##
============================================
+ Coverage     37.46%   37.49%   +0.03%     
============================================
  Files           270      271       +1     
  Lines         79763    79533     -230     
  Branches      14830    14816      -14     
============================================
- Hits          29880    29818      -62     
+ Misses        44349    44175     -174     
- Partials       5534     5540       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Hallberg-NOAA Hallberg-NOAA added the refactor Code cleanup with no changes in functionality or results label Nov 16, 2023
Copy link
Member

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have examined all of these changes, and they do seem to make sense to me. Apart from noting that there are some units descriptions in repeated comments (highlighted in a separate comment attached to one specific example), I think that this is on the right track, assuming that the performance improvements are robust. I think that these unit descriptions need to be fixed, but otherwise this could be close to ready to go.

The one other comment that I have is that the variable name "this" that is used repeatedly for a variable described as "This EOS" strikes me as odd. In this context, "this" is effectively acting as an adjective on the noun "EOS", so wouldn't it make more sense to use "EOS" - the noun - as the variable name instead?

@marshallward
Copy link
Member

The one other comment that I have is that the variable name "this" that is used repeatedly for a variable described as "This EOS" strikes me as odd. In this context, "this" is effectively acting as an adjective on the noun "EOS", so wouldn't it make more sense to use "EOS" - the noun - as the variable name instead?

this% is fairly standard in object oriented programming, despite its grammatically odd usage.

- Added a base class in MOM_EOS_base_type.F90
- All EOS modules now extend this base class
  - This reduces replicated code between the EOS modules
- All existing APIs in MOM_EOS now avoid branching for the type of
  EOS and ultimately pass through to a low-level elemental function
  implementation of the actual EOS
- Added a new elemental function exposed by MOM_EOS
  (currently not used in the main model)
- There is a speed up over the previous form of EOS due to the
  reduced branching
  - For some functions, a local implementation of the base class member is
    needed to gain performance. I deliberately did not implement this
    optimization for UNESCO or Jackett06 so that the generic implementation
    of the base class is utilized and we have code coverage.
- Added rules to .testing/Makefile to invoke build.timing, run.timing for the
  "target" code checked out for regression tests
- Appended to existing GH "perfmon" workflow
@marshallward
Copy link
Member

Gaea regression: https://gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/21436 ✔️

@marshallward marshallward merged commit 7cc191a into NOAA-GFDL:dev/gfdl Nov 22, 2023
12 checks passed
@adcroft adcroft deleted the eos-class branch October 24, 2024 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Code cleanup with no changes in functionality or results
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants