Depth list repro #914

marshallward · 2019-04-24T15:08:19Z

Pull request for the Depth_list.nc code changes. Primary commit log attached below.

Checksum support for `Depth_list.nc`

This patch appends a checksum for the dependencies of the depth and area lists stored in the Depth_list.nc file, which are used to compute diagnostics based on APE.

The data in Depth_list.nc depends on the grid fields, and may not be reproducible when such grids are constructed internally using compiled code within the executable. This issue was observed in the double_gyre experiment when a PGI-compiled executable was tested using a Depth_list.nc file generated by a GNU-compiled executable.

By appending a checksum for the grid fields used to compute Depth_list.nc, we can ensure that the data is consistent with the experiment grid data. Grid data which is read from external files, such as mosaic or topography fields, are unaffected by this issue.

This patch improves the reproducibilty of standard diagnostics, such as total energy, but has no impact on the reproducibility of the internal model dynamics, which does not depend on Depth_list.nc.

Checksums are computed for the G%bathyT and masked G%areaT grid fields using the FMS mpp_checksum subroutine, which require collective operations, and are stored as hex strings in global attributes of the netCDF file. Attribute names are based on the grid variable names.

Two flags have been introduced to control this behavior:

`REQUIRE_DEPTH_LIST_CHECKSUM` (default: True)

This flag will abort the run if the Depth_list.nc file is present and checksums are absent from the file. Although this could impose greater restrictions on existing runs, few runs are configured to save the depth list file (READ_DEPTH_LIST) and the default behavior is to reconstruct these lists on every run.

`UPDATE_DEPTH_LIST_CHECKSUM` (default: False)

When REQUIRE_DEPTH_LIST_CHECKSUM is set to false, this flag will automatically update the checksums of the Depth_list.nc file. While this can affect the reproducibility of APE diagnostics, it will ensure the reproducibility of such diagnostics in subsequent runs.

This variable defaults to false, since we do not want to deliberately modify an existing file without user consent. As newer files are constructed with the checksums, this will presumably be less of an issue.

This patch appends a checksum for the dependencies of the depth and area lists stored in the Depth_list.nc file, which are used to compute diagnostics based on APE. The data in Depth_list.nc depends on the grid fields, and may not be reproducible when such grids are constructed internally using compiled code within the executable. This issue was observed in the 'double_gyre' experiment when a PGI-compiled executable was tested using a Depth_list.nc file generated by a GNU-compiled executable. By appending a checksum for the grid fields used to compute Depth_list.nc, we can ensure that the data is consistent with the experiment grid data. Grid data which is read from external files, such as mosaic or topography fields, are unaffected by this issue. This patch improves the reproducibilty of standard diagnostics, such as total energy, but has no impact on the reproducibility of the internal model dynamics, which does not depend on Depth_list.nc. Checksums are computed for the G%bathyT and masked G%areaT grid fields using the FMS mpp_checksum subroutine, which require collective operations, and are stored as hex strings in global attributes of the netCDF file. Strings are used to remain consistent with FMS restart checksums, and to avoid an observed re-casting of 8-byte integers to 4-bytes by the netCDF library. Attribute names are based on the grid variable names. Two flags have been introduced to control this behavior: REQUIRE_DEPTH_LIST_CHECKSUMS (default: True) This flag will abort the run if the Depth_list.nc file is present and checksums are absent from the file. Although this could impose greater restrictions on existing runs, few runs are configured to save the depth list file (READ_DEPTH_LIST) and the default behavior is to reconstruct these lists on every run. UPDATE_DEPTH_LIST_CHECKSUMS (default: False) When REQUIRE_DEPTH_LIST_CHECKSUMS is set to false, this flag will automatically update the checksums of the Depth_list.nc file. While this can affect the reproducibility of APE diagnostics, it will ensure the reproducibility of such diagnostics in subsequent runs.

Additional documentation of the parameters used to store Depth_list.nc attribute names was added.

Hallberg-NOAA · 2019-04-24T17:21:06Z

This PR is being tested with https://gitlab.gfdl.noaa.gov/ogrp/MOM6/pipelines/7818

Hallberg-NOAA · 2019-04-24T17:38:41Z

Thanks for this contribution @marshallward. I like the improved error checking with this new capability.

Looking over the code, I have a couple of suggestions:

I think that it would be preferrable to do the checksum on the product of G%bathyT and G%mask2dT, so that the checksums are unaltered by missing out any all-land PEs in the virtual layout.
The default values of REQUIRE_DEPTH_LIST_CHECKSUMS and UPDATE_DEPTH_LIST_CHECKSUMS might warrant some discussion. My preference would be for the default to have existing configurations work as they do now, while gracefully updating any existing depth_list file to use the new checks. This would suggest defaults of REQUIRE_DEPTH_LIST_CHECKSUMS = False and UPDATE_DEPTH_LIST_CHECKSUMS = True for the transition; in this case we are not checking that the same topography is being used by default, but the answers are always the same as if there were no depth list file being carried over between run segments. Other opinions may differ on this point.

The depth checksum is now replaced with masked depth, mask2dT * bathyT, and the calculation of the depth list has also been updated to use the masked depth. Various style conformance changes, such as contraction of do and if terminations (enddo, endif) and reduction of whitespace in various multiline function call, has also been applied. Finally, the attribute name docstrings were updated for clarity.

marshallward · 2019-04-24T20:22:59Z

PR has been updated, including the use of masked depth. Style has also been updated to better match the codebase.

As discussed offline, we will leave the default parameter values as they are, and update the double_gyre test to use non-default values.

Hallberg-NOAA · 2019-04-24T20:39:42Z

This updated PR is being tested with https://gitlab.gfdl.noaa.gov/ogrp/MOM6/pipelines/7824.

marshallward · 2019-04-25T07:29:00Z

Confirmed that changing

     Dlist(list_pos) = G%bathyT(i,j)

to

     Dlist(list_pos) = G%mask2dT(i,j) * G%bathyT(i,j)

in create_depth_list (L1131 for me) has changed the answers. For the Baltic test:

[nid00699:Marshall.Ward/Gaea-stats-MOM6-examples] $ diff output/ice_ocean_SIS2/Baltic/ocean.stats.pgi regressions/ice_ocean_SIS2/Baltic/ocean.stats.pgi 
3,23c3,23
<      0,       0.000,     0, En 2.440512189549E+00, CFL  0.00000, SL  5.2047E-14, M 6.39565E+16, S 23.7473, T  5.4079, Me  0.00E+00, Se  0.00E+00, Te  0.00E+00
<      3,       0.250,     0, En 1.935570266967E+00, CFL  0.02411, SL  7.4261E-04, M 6.39571E+16, S 23.7471, T  5.4024, Me  6.89E-14, Se  2.19E-12, Te  8.95E-13
<      6,       0.500,     0, En 1.811627537745E+00, CFL  0.02328, SL  7.2648E-04, M 6.39571E+16, S 23.7471, T  5.3936, Me  9.24E-14, Se  3.05E-12, Te  1.30E-12
<      9,       0.750,     0, En 1.782352687401E+00, CFL  0.02238, SL  6.6340E-04, M 6.39570E+16, S 23.7471, T  5.3860, Me  6.77E-14, Se  2.23E-12, Te  9.36E-13
<     12,       1.000,     0, En 1.743474828893E+00, CFL  0.03460, SL  5.9632E-04, M 6.39570E+16, S 23.7471, T  5.3779, Me  5.77E-14, Se  1.87E-12, Te  7.64E-13
...
---
>      0,       0.000,     0, En 2.440512190050E+00, CFL  0.00000, SL -1.0267E-04, M 6.39565E+16, S 23.7473, T  5.4079, Me  0.00E+00, Se  0.00E+00, Te  0.00E+00
>      3,       0.250,     0, En 1.935570266965E+00, CFL  0.02411, SL  7.4286E-04, M 6.39571E+16, S 23.7471, T  5.4024, Me  6.89E-14, Se  2.19E-12, Te  8.95E-13
>      6,       0.500,     0, En 1.811627537742E+00, CFL  0.02328, SL  7.2673E-04, M 6.39571E+16, S 23.7471, T  5.3936, Me  9.24E-14, Se  3.05E-12, Te  1.30E-12
>      9,       0.750,     0, En 1.782352687399E+00, CFL  0.02238, SL  6.6365E-04, M 6.39570E+16, S 23.7471, T  5.3860, Me  6.77E-14, Se  2.23E-12, Te  9.36E-13
>     12,       1.000,     0, En 1.743474828891E+00, CFL  0.03460, SL  5.9657E-04, M 6.39570E+16, S 23.7471, T  5.3779, Me  5.77E-14, Se  1.87E-12, Te  7.64E-13
...

and reverting the change restores the old results.

Using the masked depth (mask2dT * bathyT) was observed to change energy values within floating point precision, so the changes have been reverted. This may be revised at a later time, when we are prepared to update the energy stats to the new values in the regression tests. The depth checksum attribute has also been renamed to reflect this change. This will allow us to re-define the variable as masked at some later date, and can distinguish between the masked and unmasked checksums during testing.

adcroft · 2019-04-26T14:23:01Z

Last commit passed on pipeline https://gitlab.gfdl.noaa.gov/ogrp/MOM6/pipelines/7840

marshallward added 3 commits April 24, 2019 10:53

Further documentation of Depth_list.nc checksums

f95d9eb

Additional documentation of the parameters used to store Depth_list.nc attribute names was added.

Depth list documentation typo fix

ce93243

Token bugfix

61a96a8

marshallward added 2 commits April 25, 2019 15:02

Bugfix: Checksum the unmasked depth

0b33904

adcroft merged commit a91c0a5 into mom-ocean:dev/gfdl Apr 26, 2019

marshallward deleted the depth_list_repro branch June 17, 2019 13:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Depth list repro #914

Depth list repro #914

marshallward commented Apr 24, 2019 •

edited

Loading

Hallberg-NOAA commented Apr 24, 2019

Hallberg-NOAA commented Apr 24, 2019

marshallward commented Apr 24, 2019

Hallberg-NOAA commented Apr 24, 2019

marshallward commented Apr 25, 2019 •

edited

Loading

adcroft commented Apr 26, 2019

Depth list repro #914

Depth list repro #914

Conversation

marshallward commented Apr 24, 2019 • edited Loading

Checksum support for Depth_list.nc

REQUIRE_DEPTH_LIST_CHECKSUM (default: True)

UPDATE_DEPTH_LIST_CHECKSUM (default: False)

Hallberg-NOAA commented Apr 24, 2019

Hallberg-NOAA commented Apr 24, 2019

marshallward commented Apr 24, 2019

Hallberg-NOAA commented Apr 24, 2019

marshallward commented Apr 25, 2019 • edited Loading

adcroft commented Apr 26, 2019

marshallward commented Apr 24, 2019 •

edited

Loading

Checksum support for `Depth_list.nc`

`REQUIRE_DEPTH_LIST_CHECKSUM` (default: True)

`UPDATE_DEPTH_LIST_CHECKSUM` (default: False)

marshallward commented Apr 25, 2019 •

edited

Loading