release/public-v2: b4b reproducibility for restart runs #417

climbfuji · 2021-02-11T23:56:01Z

Description

This PR and the associated PRs listed below fix the bit-for-bit (b4b) reproducibility issues for the release/public-v2 branches. With these changes, restart runs are b4b identical to continuous runs for regional and global applications with both GFS v15p2 and RRFS v1alpha physics.
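Whether a restart run is b4b identical to a continuous run can be checked by comparing the two runs' output files byte for byte. The following is a minimal sketch, not the regression-test harness used in this PR: the helper name `b4b_identical`, the directory arguments and the `*.nc` file pattern are hypothetical. A byte-level comparison is stricter than a field-by-field comparison with a NetCDF-aware tool such as nccmp, because any metadata written into the file headers must match as well.

```python
import filecmp
from pathlib import Path


def b4b_identical(continuous_dir, restart_dir, pattern="*.nc"):
    """Compare output files of a continuous run and a restart run byte for byte.

    Hypothetical helper: the directory layout and the file pattern are
    assumptions, not the ufs-weather-model regression-test implementation.
    Returns (identical, mismatches), where mismatches lists file names that
    are missing from restart_dir or differ in content.
    """
    reference_files = sorted(Path(continuous_dir).glob(pattern))
    if not reference_files:
        # An empty comparison would trivially "pass", so treat it as an error.
        raise FileNotFoundError(f"no files matching {pattern!r} in {continuous_dir}")

    mismatches = []
    for ref in reference_files:
        other = Path(restart_dir) / ref.name
        # shallow=False compares file contents instead of returning True
        # as soon as the os.stat() signatures (type, size, mtime) match.
        if not other.is_file() or not filecmp.cmp(ref, other, shallow=False):
            mismatches.append(ref.name)
    return not mismatches, mismatches
```

For example, `b4b_identical("run_continuous", "run_restart")` (hypothetical directory names) returns `(True, [])` only if every matching file from the continuous run exists in the restart run and is bitwise identical.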

Changes:

Add global and regional restart regression tests for both suites (GFS v15p2 and RRFS v1alpha)

I expect/hope that this is the last PR for the release/public-v2 branches of the ufs-weather-model and its submodules.

Issue(s) addressed

Fixes #288

Yay! Finally.

Testing

New baselines are required because of the additional fields in the restart files and because of the bugfixes in FV3/io/FV3GFS_io.F90 (no physics changes).

Regression tests passed on all tier-1 platforms for the UFS SRW App release 1.0: hera.intel, orion.intel, cheyenne.gnu, cheyenne.intel, jet.intel, gaea.intel.

Dependencies

NCAR/ccpp-physics#519
NOAA-EMC/fv3atm#246
#417


climbfuji · 2021-02-16T14:46:14Z

This PR is ready to merge.

tests/fv3_conf/ccpp_regional_run.IN

## DESCRIPTION OF CHANGES:
* In set_extrn_mdl_params.sh, add stanzas for the "NAM" external model as well as for the "CHEYENNE" platform.
* In run_experiments.sh, add stanzas for "NAM".
* WE2E tests:
  * Add 3 new WE2E tests for the NAM on the RRFS_CONUS_25km grid. These use NAM for both the ICs and LBCs, run 24-hr forecasts, and use an LBC update interval of 3 hours. They all run the 2015060212 case, which is currently the only date for which NAM data is available (provided by Bill Gallus and Jonathan Thielen). The three tests differ only in the suite they run: FV3_GSD_SAR, FV3_HRRR, and FV3_RRFS_v1beta, respectively.
  * Modify the four WE2E tests that run on the RRFS_CONUS_25km grid with HRRR/RAP ICs/LBCs so that they all use the 2020 Derecho case (2020081000) and run out to 24 hours (with a boundary update every 3 hours). The Derecho case is of more interest to the community than the previous, randomly chosen case (2020080100). The four tests use the GSD_SAR, HRRR, RRFS_v1alpha, and RRFS_v1beta suites, respectively.
  * For comparison between the NAM/NAM, HRRR/RAP, and HRRR/HRRR IC/LBC combinations, add 2 WE2E tests on the RRFS_CONUS_25km grid that use HRRR for both ICs and LBCs (and run the 2020 Derecho case for 24 hrs with a 3-hour boundary update).
* Modify the script that gets the experiments' workflow status so that it ignores all non-experiment directories as well as all "inactive" (i.e. renamed) experiment directories under the specified experiments base directory.
* Change the default value of DT_ATMOS on the RRFS_CONUS_25km grid to 40 sec (based on the test results below).

## TESTS CONDUCTED:
On Cheyenne, ran the following tests on the RRFS_CONUS_25km grid with both DT_ATMOS=300sec and DT_ATMOS=40sec. All of the 40sec tests succeeded (although some with error messages), while some of the 300sec tests succeeded and others failed. Full results are as follows:
* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_GSD_SAR
  * DT_ATMOS = 300sec: succeeded
  * DT_ATMOS = 40sec: succeeded
* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
  * DT_ATMOS = 300sec: succeeded
  * DT_ATMOS = 40sec: succeeded
* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
  * DT_ATMOS = 300sec: succeeded
  * DT_ATMOS = 40sec: succeeded
* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GSD_SAR
  * DT_ATMOS = 300sec: succeeded
  * DT_ATMOS = 40sec: succeeded
* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
  * DT_ATMOS = 300sec: succeeded
  * DT_ATMOS = 40sec: succeeded
* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
  * DT_ATMOS = 300sec: succeeded
  * DT_ATMOS = 40sec: succeeded
* grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GSD_SAR (CDATE = 2015060212)
  * DT_ATMOS = 300sec: failed
  * DT_ATMOS = 40sec: succeeded
* grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_HRRR (CDATE = 2015060212)
  * DT_ATMOS = 300sec: failed
  * DT_ATMOS = 40sec: succeeded
* grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta (CDATE = 2015060212)
  * DT_ATMOS = 300sec: failed
  * DT_ATMOS = 40sec: succeeded

It is not yet clear what causes the failures with the NAM experiments, but since these succeed with DT_ATMOS=40sec, we change the default value of DT_ATMOS to 40 sec for the RRFS_CONUS_25km grid.

## CONTRIBUTORS (optional):
Bill Gallus

…nce again. (#417)
* Update modulefiles/build_hera_intel and modulefiles/srw_common to allow the SRW to build and run on Hera following the update to HPC-stack.
* Update modulefiles/build_jet_intel and modulefiles/build_orion_intel so that NetCDF is loaded before nccmp.

climbfuji added 3 commits February 11, 2021 11:00

Add coldstart and warmstart test for global fv3_ccpp_gfs_v15p2

4b042d8

coldstart and warmstart test for global fv3_ccpp_rrfs_v1alpha

b28d218

Update .gitmodules and submodule pointer for fv3atm for code review and testing

fe4cb79

This was referenced Feb 11, 2021

release/public-v2: b4b reproducibility for restart runs NOAA-EMC/fv3atm#246

Merged

release/public-v5: update of scientific documentation NCAR/ccpp-physics#519

Merged

Add regional restart tests and update regression test baseline date tag

70222c8

climbfuji changed the title ~~WORK IN PROGRESS release/public-v2: b4b reproducibility for restart runs~~ release/public-v2: b4b reproducibility for restart runs Feb 12, 2021

Regression test log for orion.intel

45afb3e

climbfuji marked this pull request as ready for review February 12, 2021 23:54

climbfuji requested review from junwang-noaa, DusanJovic-NOAA, jwolff-ncar and mkavulich February 12, 2021 23:54

climbfuji added 4 commits February 15, 2021 08:29

Regression test logs for hera.intel, cheyenne.intel, cheyenne.gnu

0ddae3e

Update submodule pointer for fv3atm

c6f95d7

Regression test log for jet.intel

aea570b

Regression test log for gaea.intel

e346a32

climbfuji mentioned this pull request Feb 16, 2021

Issues with model runs / regression tests for release/public-v2 branch #288

Closed

Revert change to .gitmodules and update submodule pointer for fv3atm

2c073e0

DusanJovic-NOAA approved these changes Feb 16, 2021


junwang-noaa reviewed Feb 16, 2021


tests/fv3_conf/ccpp_regional_run.IN

junwang-noaa approved these changes Feb 16, 2021


junwang-noaa merged commit 71b9974 into ufs-community:release/public-v2 Feb 16, 2021

This was referenced Feb 16, 2021

NoahMP restart runs likely not b4b identical NCAR/ccpp-physics#367

Closed

Restarting model changes final result NOAA-EMC/fv3atm#42

Closed

jwolff-ncar mentioned this pull request Feb 17, 2021

Regional restart run is creating different results. #408

Closed
