Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Depth list repro #914

Merged
merged 7 commits into from
Apr 26, 2019
Merged

Depth list repro #914

merged 7 commits into from
Apr 26, 2019

Conversation

marshallward
Copy link
Collaborator

@marshallward marshallward commented Apr 24, 2019

Pull request for the Depth_list.nc code changes. Primary commit log attached below.


Checksum support for Depth_list.nc

This patch appends a checksum for the dependencies of the depth and area lists stored in the Depth_list.nc file, which are used to compute diagnostics based on APE.

The data in Depth_list.nc depends on the grid fields, and may not be reproducible when such grids are constructed internally using compiled code within the executable. This issue was observed in the double_gyre experiment when a PGI-compiled executable was tested using a Depth_list.nc file generated by a GNU-compiled executable.

By appending a checksum for the grid fields used to compute Depth_list.nc, we can ensure that the data is consistent with the experiment grid data. Grid data which is read from external files, such as mosaic or topography fields, are unaffected by this issue.

This patch improves the reproducibilty of standard diagnostics, such as total energy, but has no impact on the reproducibility of the internal model dynamics, which does not depend on Depth_list.nc.

Checksums are computed for the G%bathyT and masked G%areaT grid fields using the FMS mpp_checksum subroutine, which require collective operations, and are stored as hex strings in global attributes of the netCDF file. Attribute names are based on the grid variable names.

Two flags have been introduced to control this behavior:

REQUIRE_DEPTH_LIST_CHECKSUM (default: True)

This flag will abort the run if the Depth_list.nc file is present and checksums are absent from the file. Although this could impose greater restrictions on existing runs, few runs are configured to save the depth list file (READ_DEPTH_LIST) and the default behavior is to reconstruct these lists on every run.

UPDATE_DEPTH_LIST_CHECKSUM (default: False)

When REQUIRE_DEPTH_LIST_CHECKSUM is set to false, this flag will automatically update the checksums of the Depth_list.nc file. While this can affect the reproducibility of APE diagnostics, it will ensure the reproducibility of such diagnostics in subsequent runs.

This variable defaults to false, since we do not want to deliberately modify an existing file without user consent. As newer files are constructed with the checksums, this will presumably be less of an issue.

This patch appends a checksum for the dependencies of the depth and area
lists stored in the Depth_list.nc file, which are used to compute
diagnostics based on APE.

The data in Depth_list.nc depends on the grid fields, and may not be
reproducible when such grids are constructed internally using compiled
code within the executable.  This issue was observed in the
'double_gyre' experiment when a PGI-compiled executable was tested using
a Depth_list.nc file generated by a GNU-compiled executable.

By appending a checksum for the grid fields used to compute
Depth_list.nc, we can ensure that the data is consistent with the
experiment grid data.  Grid data which is read from external files, such
as mosaic or topography fields, are unaffected by this issue.

This patch improves the reproducibilty of standard diagnostics, such as
total energy, but has no impact on the reproducibility of the internal
model dynamics, which does not depend on Depth_list.nc.

Checksums are computed for the G%bathyT and masked G%areaT grid fields
using the FMS mpp_checksum subroutine, which require collective
operations, and are stored as hex strings in global attributes of the
netCDF file.  Strings are used to remain consistent with FMS restart
checksums, and to avoid an observed re-casting of 8-byte integers to
4-bytes by the netCDF library.  Attribute names are based on the grid
variable names.

Two flags have been introduced to control this behavior:

REQUIRE_DEPTH_LIST_CHECKSUMS (default: True)
    This flag will abort the run if the Depth_list.nc file is present
    and checksums are absent from the file.  Although this could impose
    greater restrictions on existing runs, few runs are configured to
    save the depth list file (READ_DEPTH_LIST) and the default behavior
    is to reconstruct these lists on every run.

UPDATE_DEPTH_LIST_CHECKSUMS (default: False)
    When REQUIRE_DEPTH_LIST_CHECKSUMS is set to false, this flag will
    automatically update the checksums of the Depth_list.nc file.  While
    this can affect the reproducibility of APE diagnostics, it will
    ensure the reproducibility of such diagnostics in subsequent runs.
Additional documentation of the parameters used to store Depth_list.nc
attribute names was added.
@Hallberg-NOAA
Copy link
Collaborator

This PR is being tested with https://gitlab.gfdl.noaa.gov/ogrp/MOM6/pipelines/7818

@Hallberg-NOAA
Copy link
Collaborator

Thanks for this contribution @marshallward. I like the improved error checking with this new capability.

Looking over the code, I have a couple of suggestions:

  1. I think that it would be preferrable to do the checksum on the product of G%bathyT and G%mask2dT, so that the checksums are unaltered by missing out any all-land PEs in the virtual layout.

  2. The default values of REQUIRE_DEPTH_LIST_CHECKSUMS and UPDATE_DEPTH_LIST_CHECKSUMS might warrant some discussion. My preference would be for the default to have existing configurations work as they do now, while gracefully updating any existing depth_list file to use the new checks. This would suggest defaults of REQUIRE_DEPTH_LIST_CHECKSUMS = False and UPDATE_DEPTH_LIST_CHECKSUMS = True for the transition; in this case we are not checking that the same topography is being used by default, but the answers are always the same as if there were no depth list file being carried over between run segments. Other opinions may differ on this point.

The depth checksum is now replaced with masked depth, mask2dT * bathyT,
and the calculation of the depth list has also been updated to use the
masked depth.

Various style conformance changes, such as contraction of do and if
terminations (enddo, endif) and reduction of whitespace in various
multiline function call, has also been applied.

Finally, the attribute name docstrings were updated for clarity.
@marshallward
Copy link
Collaborator Author

PR has been updated, including the use of masked depth. Style has also been updated to better match the codebase.

As discussed offline, we will leave the default parameter values as they are, and update the double_gyre test to use non-default values.

@Hallberg-NOAA
Copy link
Collaborator

This updated PR is being tested with https://gitlab.gfdl.noaa.gov/ogrp/MOM6/pipelines/7824.

@marshallward
Copy link
Collaborator Author

marshallward commented Apr 25, 2019

Confirmed that changing

     Dlist(list_pos) = G%bathyT(i,j)

to

     Dlist(list_pos) = G%mask2dT(i,j) * G%bathyT(i,j)

in create_depth_list (L1131 for me) has changed the answers. For the Baltic test:

[nid00699:Marshall.Ward/Gaea-stats-MOM6-examples] $ diff output/ice_ocean_SIS2/Baltic/ocean.stats.pgi regressions/ice_ocean_SIS2/Baltic/ocean.stats.pgi 
3,23c3,23
<      0,       0.000,     0, En 2.440512189549E+00, CFL  0.00000, SL  5.2047E-14, M 6.39565E+16, S 23.7473, T  5.4079, Me  0.00E+00, Se  0.00E+00, Te  0.00E+00
<      3,       0.250,     0, En 1.935570266967E+00, CFL  0.02411, SL  7.4261E-04, M 6.39571E+16, S 23.7471, T  5.4024, Me  6.89E-14, Se  2.19E-12, Te  8.95E-13
<      6,       0.500,     0, En 1.811627537745E+00, CFL  0.02328, SL  7.2648E-04, M 6.39571E+16, S 23.7471, T  5.3936, Me  9.24E-14, Se  3.05E-12, Te  1.30E-12
<      9,       0.750,     0, En 1.782352687401E+00, CFL  0.02238, SL  6.6340E-04, M 6.39570E+16, S 23.7471, T  5.3860, Me  6.77E-14, Se  2.23E-12, Te  9.36E-13
<     12,       1.000,     0, En 1.743474828893E+00, CFL  0.03460, SL  5.9632E-04, M 6.39570E+16, S 23.7471, T  5.3779, Me  5.77E-14, Se  1.87E-12, Te  7.64E-13
...
---
>      0,       0.000,     0, En 2.440512190050E+00, CFL  0.00000, SL -1.0267E-04, M 6.39565E+16, S 23.7473, T  5.4079, Me  0.00E+00, Se  0.00E+00, Te  0.00E+00
>      3,       0.250,     0, En 1.935570266965E+00, CFL  0.02411, SL  7.4286E-04, M 6.39571E+16, S 23.7471, T  5.4024, Me  6.89E-14, Se  2.19E-12, Te  8.95E-13
>      6,       0.500,     0, En 1.811627537742E+00, CFL  0.02328, SL  7.2673E-04, M 6.39571E+16, S 23.7471, T  5.3936, Me  9.24E-14, Se  3.05E-12, Te  1.30E-12
>      9,       0.750,     0, En 1.782352687399E+00, CFL  0.02238, SL  6.6365E-04, M 6.39570E+16, S 23.7471, T  5.3860, Me  6.77E-14, Se  2.23E-12, Te  9.36E-13
>     12,       1.000,     0, En 1.743474828891E+00, CFL  0.03460, SL  5.9657E-04, M 6.39570E+16, S 23.7471, T  5.3779, Me  5.77E-14, Se  1.87E-12, Te  7.64E-13
...

and reverting the change restores the old results.

Using the masked depth (mask2dT * bathyT) was observed to change energy
values within floating point precision, so the changes have been
reverted.  This may be revised at a later time, when we are prepared to
update the energy stats to the new values in the regression tests.

The depth checksum attribute has also been renamed to reflect this
change.  This will allow us to re-define the variable as masked at some
later date, and can distinguish between the masked and unmasked
checksums during testing.
@adcroft
Copy link
Collaborator

adcroft commented Apr 26, 2019

Last commit passed on pipeline https://gitlab.gfdl.noaa.gov/ogrp/MOM6/pipelines/7840

@adcroft adcroft merged commit a91c0a5 into mom-ocean:dev/gfdl Apr 26, 2019
@marshallward marshallward deleted the depth_list_repro branch June 17, 2019 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants