Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev gfdl main candidate 2021-02-19 #1328

Merged
merged 559 commits into from
Feb 24, 2021
Merged

Conversation

marshallward
Copy link
Collaborator

@marshallward marshallward commented Feb 19, 2021

This update to main contains a significant rewrite of MOM6 calls to its
underlying framework (FMS), as well as many new features, bugfixes, updates,
and refactoring of several physical and system modules.

Reviews requested:

Abstract interface to framework

The largest change is the replacement of explicit FMS calls with an abstract
interface to the framework, denoted by the "infra" suffix. All internal MOM6
calls to FMS now pass through this interface, whose implementation is defined
in the config_src directory and is selected at compile time (currently with
mkmf).

(NOTE: The move to config_src is not yet implemented, and will be in a future PR.)

This change will allow MOM6 to continue supporting both the legacy FMS and
newer FMS2 interfaces, as well as future frameworks based on FMS or otherwise.

At this time, only the legacy FMS implementation is provided, but an interface
to the new FMS API ("FMS2") will be included in a future update.

Future contributions to MOM6 should use the API defined in src/framework
rather than native FMS calls.

Couplers and other external drivers are not required to use the MOM6 infra
layer, but may wish to consider switching to the MOM6 interface.

Framework interface abstraction:

New features and changes

The following new features have been added.

Coupler/Driver:

SPEAR:

Sponge:

Ice Shelf:

Shear driven diffusivity (kappa_shear):

Bottom boundary layer:

Diagnostics:


Several bugfixes have been identified and addressed in patches below. In most
cases, these fixes will cause answer changes, but flags have been provided in
order to restore the original behavior.

Bugfixes:


Infrastructure:

Several components have been refactored in order to reduce memory usage, remove
unnecessary operations, and improve performance. In some cases, this required
a change of the internal APIs.

Automated testing has been moved from Travis to GitHub Actions, due to new
resource limits set by Travis. Testing is now faster and more fine-grained.

Documentation has been greatly expanded, and the documentation infrastructure
has been updated.

Some global diagnostics were modified to preserve reproducibility. Numerical
solutions are unaffected.

Refactoring:

Verification testing:

Performance improvements:

Reproducibility:

Documentation:

Build automation:

Misc:


Contributors:

Hallberg-NOAA and others added 30 commits December 4, 2020 10:27
  Corrected non-standard spacing around semicolons separating do loops or if
tests on a single line, following MOM6 code standards.  (If Marshall wins his
planned argument against allowing such constructs, this enforcement of standards
will help in getting rid of them.)  All answers are bitwise identical.
  Added GV to the shared declaration in an openMP directive.  All answers are
bitwise identical.
- Remove "\n" from ALIASES as it tends to be passed through to latex and causes it to fail.
- Add/move notes around in README.md and details/.
Doxygen support for 1.8.13 through 1.8.20
This patch trims some of the netCDF flag management and adds more
explicit tests for the FMS and MOM6 flags.

- We only test for -I flags when compiling the FMS library, since links
  are not required at compile-time.

  (NOTE: This may need verification for static builds on Gaea).

- Further separation of C and Fortran flags.

- libnetcdf support is explicitly added to MOM6.  Previously it was not
  tested and linked.   This is in preparation for FMS 2020.04 support.
+Add option to fix z-file layer initialization bug
The --cflags/--fflags arguments of nc-config/nf-config can be volatile,
since they generally represent library install-time flags, so we instead
replace them with --includedir/--libdir flags, which explicitly denote
the installation paths.

The only real issue here is that nf-config does not have a --libdir
flag, so we use $(nf-config --prefix)/lib and hope that this is where
it's installed.
The verification tests in .testing are updated to run on FMS 2020.04.

Also, several documentation outputs in .gitignore were moved to
docs/.gitignore, as well as some older files and directories that were
interfering with content in ac/deps.
Eliminate unneeded variables in diabatic_driver
Update CPT fork of MOM6 to latest (12/10/2020) from NOAA-GFDL
Doxygen was not assigning some docstrings for multiline variable
declarations, so this patch splits those variables to one per line, with
updated docstrings.

Whitespace was also condensed in several of the declarations.
Reduction in line length of any lines exceeding 120 characters.
@codecov
Copy link

codecov bot commented Feb 19, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@fe5e605). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1328   +/-   ##
=======================================
  Coverage        ?   45.75%           
=======================================
  Files           ?      234           
  Lines           ?    72542           
  Branches        ?        0           
=======================================
  Hits            ?    33189           
  Misses          ?    39353           
  Partials        ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fe5e605...daee0bb. Read the comment docs.

@marshallward
Copy link
Collaborator Author

Regarding the regressions:

Diagnostic regressions are due to the removal of Kd_layer diagnostic. Our current definition of a diagnostic regression is when both diagnostic checksum outputs change. We will try to improve the diagnostic reporting in a future release.

The tc4 stats regression is genuine, and are likely a consequence of changes to the sponge boundary code.

@kshedstrom
Copy link
Collaborator

I approve this PR.

@gustavo-marques
Copy link
Collaborator

@alperaltuntas will evaluate the PR on NCAR's side.

@@ -267,6 +265,10 @@ subroutine ocean_model_init(Ocean_sfc, OS, Time_init, Time_in, wind_stagger, gas
endif
allocate(OS)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what is the intent of commented code. Perhaps it is needed for ...? It would help to have a single comment to state the purpose and why it is commented (right now).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question. I think that @MJHarrison-GFDL would be the right person to answer the question of why there are now commented out statements allocating the OS%fluxes, OS%forces and OS%flux_tmp elements of the ocean_state_type. These are actual types in the declaration, so they can not be allocated, but at one point last autumn there was the idea to make these into pointers, but I believe that we decided to backtrack on that to avoid introducing inconsistencies with the other drivers. I don't recall more of the details about this conversation than this. These commented-out lines could be deleted, but as they do not impact the code directly, perhaps this could be done in a subsequent commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanAkel Thanks for pointing out this remnant comment related to a redacted version of the code mentioned by @Hallberg-NOAA which was an attempt to handle the rotation of the fluxes and forcing types as part of the tc regressions. These comments can be safely deleted.

@@ -726,7 +722,7 @@ end subroutine ocean_model_end
subroutine ocean_model_save_restart(OS, Time, directory, filename_suffix)
type(ocean_state_type), pointer :: OS !< A pointer to the structure containing the
!! internal ocean state (in).
type(time_type), intent(in) :: Time !< The model time at this call, needed for mpp_write calls.
type(time_type), intent(in) :: Time !< The model time at this call, needed for writing files.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writing restart files?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment could be revised as suggested.

@marshallward
Copy link
Collaborator Author

I have confirmed that the tc4 regression is due to #1244 and #1216.

#1244 fixes an explicit bug in the sponge indexing, and #1216 adds parentheses to the kinetic energy calculation in ocean.stats to restore reproducibility.

If anyone is unable to restore their answers, then they should try reverting these changes.

Copy link
Collaborator

@sanAkel sanAkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments/suggestions. This PR is good; comments can be handled in a future PR.

@@ -887,6 +877,18 @@ subroutine convert_state_to_ocean_type(sfc_state, Ocean_sfc, G, US, patm, press_
enddo ; enddo
endif

if (allocated(sfc_state%melt_potential)) then
do j=jsc_bnd,jec_bnd ; do i=isc_bnd,iec_bnd
Ocean_sfc%melt_potential(i,j) = US%Q_to_J_kg*US%RZ_to_kg_m2 * sfc_state%melt_potential(i+i0,j+j0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 to see the units, Thank you!

integer, intent(in) :: jsd !< Data start index in j direction
integer, intent(in) :: jed !< Data end index in j direction
integer, intent(in) :: nk !< Number of levels in k direction
integer, intent(in) :: ntau !< Unknown
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow Unknown! After all, it is an intent in argument, so there must be some expectation, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, except the code this was copied from is not documented so we don't know what these arguments are. The BGC API needs documenting.

@@ -61,7 +71,7 @@ subroutine generic_tracer_source(Temp,Salt,rho_dzt,dzt,hblt_depth,ilb,jlb,tau,dt
frunoff,grid_ht, current_wave_stress, sosga)
real, dimension(ilb:,jlb:,:), intent(in) :: Temp !< Potential temperature [deg C]
real, dimension(ilb:,jlb:,:), intent(in) :: Salt !< Salinity [psu]
real, dimension(ilb:,jlb:,:), intent(in) :: rho_dzt
real, dimension(ilb:,jlb:,:), intent(in) :: rho_dzt !< Unknown
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please expand/clarify Unknown dimension for rho_dzt and max_wavelength_band

!> Example type for ocean ensemble DA state
type, public :: OCEAN_CONTROL_STRUCT
integer :: ensemble_size
integer :: ensemble_size !< ensemble size
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as it stands, variable name is same as in documentation how does that help the reader?

suggest: number of ensemble members?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goog point @sanAkel

end type ocean_profile_type

!> Example forward operator type.
type, public :: forward_operator_type
integer :: num
integer :: num !< how many?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how many of what?

@alperaltuntas
Copy link
Collaborator

@marshallward, looks like the infra interfaces are defined in src/framework in this candidate, and not in config_src. Is the following comment has a typo (i.e., config_src instead of src/framework), or am I misinterpreting it?

All internal MOM6 calls to FMS now pass through [an abstract interface to the framework, denoted by the "infra" suffix], whose implementation is defined in the config_src directory and is selected at compile time (currently with
mkmf).

@marshallward
Copy link
Collaborator Author

@marshallward, looks like the infra interfaces are defined in src/framework in this candidate, and not in config_src. Is the following comment has a typo (i.e., config_src instead of src/framework), or am I misinterpreting it?

You are right, I was getting ahead of myself. This PR introduces the implementation, which will be moved to config_src in a later PR. I'll update the commit notes, thanks for noticing this.

@alperaltuntas
Copy link
Collaborator

We approve this PR. All NCAR tests are passing with no answer changes.

@jiandewang
Copy link
Collaborator

works fine in EMC UFS, passed all regression test

i = CS%col_i(c) ; j = CS%col_j(c)
damp = dt * CS%Iresttime_col(c)
I1pdamp = 1.0 / (1.0 + damp)
tmp_val2(1:nz_data) = CS%Ref_val(m)%p(1:nz_data,c)
if (CS%time_varying_sponges) then
call remapping_core_h(CS%remap_cs, nz_data, CS%Ref_val(m)%h(1:nz_data,c), tmp_val2, &
CS%nz, h(i,j,:), tmp_val1, h_neglect, h_neglect_edge)
CS%nz, h_col, tmp_val1, h_neglect, h_neglect_edge)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h_col needs to be defined as h(i,j,:) before this line, otherwise it is always 0.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, good spotting @abozec. I agree, this needs a patch ASAP, and will need re-testing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, and good catch. I'm not sure why h_col was introduced here. The array notation that was replaced looks correct and valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not change answers for tc4, which is confusing. This oversight applies to both the time_varying and static sponge targets at L967.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only matter if DEFAULT_2018_ANSWERS is set to False (it is True in all the tc cases) due to a masking issue with the ALE sponge target which rendered this bug inconsequential for tc4. Hazards of not exercising code in production runs where we actually look at the results.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @abozec and @MJHarrison-GFDL , this fix has been applied.

MJHarrison-GFDL and others added 4 commits February 23, 2021 13:31
	- this addresses a bug in the apply_ALE_sponge routine which
  	had been passing zero values to the remapping routine for the
	target grid thicknesses.
@marshallward
Copy link
Collaborator Author

Could the others review the bugfix to the ALE sponge?

Copy link
Collaborator

@abozec abozec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the bug fix on the sponge. COAPS approves the release now.

@kshedstrom
Copy link
Collaborator

I approve as well.

@jiandewang
Copy link
Collaborator

@marshallward the fixing has no impact on EMC UFS

@alperaltuntas
Copy link
Collaborator

No impact on NCAR test suite as well. (We didn't have sponge tests, but will add after this gets merged.)

@marshallward
Copy link
Collaborator Author

All groups have accepted, so this is now getting merged. Thanks to everyone.

@marshallward marshallward merged commit 00c2819 into main Feb 24, 2021
@adcroft adcroft deleted the dev-gfdl-main-candidate-2021-02-19 branch November 16, 2021 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.