release/public-v2: b4b reproducibility for restart runs #417

climbfuji
Collaborator

@climbfuji climbfuji commented Feb 11, 2021

Description

This PR and the associated PRs below fix the b4b (bit-for-bit) reproducibility issues for the release/public-v2 branches. With these changes, restart runs are b4b identical to continuous runs for regional and global applications with both GFS v15p2 and RRFS v1 alpha physics.

Changes:

  • add global and regional restart regression tests for both suites (GFS v15p2 and RRFS v1 alpha)

I expect/hope that this is the last PR for the release/public-v2 branches of the ufs-weather-model and its submodules.

Issue(s) addressed

Fixes #288

Yay! Finally.

Testing

New baselines are required because of the additional fields in the restart files and because of the bugfixes in FV3/io/FV3GFS_io.F90 (no physics changes).

Regression tests passed on all tier-1 platforms for the UFS SRW App release 1.0: hera.intel, orion.intel, cheyenne.gnu, cheyenne.intel, jet.intel, gaea.intel.
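
For readers unfamiliar with the term, the b4b check this PR is about can be sketched as a byte-level comparison between the output of a continuous run and the output of a restart run. This is only an illustration and not the regression-test code of the ufs-weather-model: the function names, the two directory arguments, and the `*.nc` pattern are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(continuous_dir, restart_dir, pattern="*.nc"):
    """Return names of files that are missing from, or differ in, the restart run.

    Both directory layouts and the file pattern are hypothetical.
    """
    continuous_dir = Path(continuous_dir)
    restart_dir = Path(restart_dir)
    mismatches = []
    for ref in sorted(continuous_dir.glob(pattern)):
        candidate = restart_dir / ref.name
        # A file counts as a mismatch if it is absent or its bytes differ.
        if not candidate.is_file() or file_digest(ref) != file_digest(candidate):
            mismatches.append(ref.name)
    return mismatches
```

An empty list from `compare_runs` would mean the two runs are b4b identical for the files checked. A byte-level hash is stricter than a field-by-field comparison, so files that carry run-specific metadata could be flagged even when the model fields agree.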

Dependencies

NCAR/ccpp-physics#519
NOAA-EMC/fv3atm#246
#417

@climbfuji climbfuji changed the title from "WORK IN PROGRESS release/public-v2: b4b reproducibility for restart runs" to "release/public-v2: b4b reproducibility for restart runs" Feb 12, 2021
@climbfuji climbfuji marked this pull request as ready for review February 12, 2021 23:54
@climbfuji
Collaborator Author

This PR is ready to merge.

@junwang-noaa junwang-noaa merged commit 71b9974 into ufs-community:release/public-v2 Feb 16, 2021
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
## DESCRIPTION OF CHANGES:
* In set_extrn_mdl_params.sh, add stanzas for the "NAM" external model as well as for the "CHEYENNE" platform.
* In run_experiments.sh, add stanzas for "NAM".
* WE2E tests:
  * Add 3 new WE2E tests for the NAM on the RRFS_CONUS_25km grid.  These use NAM for both the ICs and LBCs, run 24hr forecasts, and use an LBC update interval of 3 hours.  They all run the 2015060212 case, which is currently the only date for which NAM data is available (provided by Bill Gallus and Jonathan Thielen).  The difference between the three tests is that they run the FV3_GSD_SAR, FV3_HRRR, and FV3_RRFS_v1beta suites, respectively.
  * Modify the four WE2E tests that run on the RRFS_CONUS_25km grid with HRRR/RAP ICs/LBCs so that they all use the 2020 Derecho case (2020081000) and go out to 24 hours (with a boundary update every 3 hours).  This is because the Derecho case is of more interest to the community than the (randomly chosen) previous case 2020080100.  The four tests run using the GSD_SAR, HRRR, RRFS_v1alpha, and RRFS_v1beta suites, respectively.  
  * For comparison purposes between the ICs/LBCs combinations of NAM/NAM, HRRR/RAP, and HRRR/HRRR, add 2 WE2E tests on the RRFS_CONUS_25km grid that use HRRR for both ICs and LBCs (and run the 2020 Derecho case for 24 hrs with a 3-hour boundary update).
  * Modify script that gets experiments' workflow status so that it ignores all non-experiment directories as well as all "inactive" (i.e. renamed) experiment directories under the specified experiments base directory.
* Modify the default value of DT_ATMOS on the RRFS_CONUS_25km grid to be 40sec (due to test results below).

## TESTS CONDUCTED:
On Cheyenne, ran the following tests on the RRFS_CONUS_25km grid with both DT_ATMOS=300sec and DT_ATMOS=40sec.  All of the 40sec tests succeeded (although some with error messages).  With 300sec, the six tests that use HRRR ICs succeeded and the three NAM tests failed.  Full results are as follows:

* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_GSD_SAR
  * DT_ATMOS = 300sec:  succeeded
  * DT_ATMOS = 40sec:   succeeded
* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
  * DT_ATMOS = 300sec:  succeeded
  * DT_ATMOS = 40sec:   succeeded
* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
  * DT_ATMOS = 300sec:  succeeded
  * DT_ATMOS = 40sec:   succeeded
* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GSD_SAR
  * DT_ATMOS = 300sec:  succeeded
  * DT_ATMOS = 40sec:   succeeded
* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
  * DT_ATMOS = 300sec:  succeeded
  * DT_ATMOS = 40sec:   succeeded
* grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
  * DT_ATMOS = 300sec:  succeeded
  * DT_ATMOS = 40sec:   succeeded
* grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GSD_SAR (CDATE = 2015060212)
  * DT_ATMOS = 300sec:  failed
  * DT_ATMOS = 40sec:   succeeded
* grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_HRRR (CDATE = 2015060212)
  * DT_ATMOS = 300sec:  failed
  * DT_ATMOS = 40sec:   succeeded
* grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta (CDATE = 2015060212)
  * DT_ATMOS = 300sec:  failed
  * DT_ATMOS = 40sec:   succeeded

It is not yet clear what causes the failures with the NAM experiments, but since these succeed with DT_ATMOS=40sec, we change the default value of DT_ATMOS to 40 sec for the RRFS_CONUS_25km grid.
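
The pattern in the results above can be checked with a short Python sketch. The outcomes are transcribed from the list; the dictionary layout, the helper names, and the shortened test names (with the common `grid_RRFS_CONUS_25km_` prefix dropped) are choices made for this illustration only and are not part of the workflow scripts.

```python
# Outcomes transcribed from the test list: True = succeeded, False = failed.
# Keys are the test names without the shared "grid_RRFS_CONUS_25km_" prefix.
RESULTS = {
    "ics_HRRR_lbcs_HRRR_suite_GSD_SAR": {300: True, 40: True},
    "ics_HRRR_lbcs_HRRR_suite_HRRR": {300: True, 40: True},
    "ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta": {300: True, 40: True},
    "ics_HRRR_lbcs_RAP_suite_GSD_SAR": {300: True, 40: True},
    "ics_HRRR_lbcs_RAP_suite_HRRR": {300: True, 40: True},
    "ics_HRRR_lbcs_RAP_suite_RRFS_v1beta": {300: True, 40: True},
    "ics_NAM_lbcs_NAM_suite_GSD_SAR": {300: False, 40: True},
    "ics_NAM_lbcs_NAM_suite_HRRR": {300: False, 40: True},
    "ics_NAM_lbcs_NAM_suite_RRFS_v1beta": {300: False, 40: True},
}


def failures(results, dt_atmos):
    """Return the sorted names of tests that failed for the given DT_ATMOS (seconds)."""
    return sorted(name for name, outcome in results.items() if not outcome[dt_atmos])


def common_ic_source(names):
    """Return the ICs model shared by all test names, or None if they differ."""
    sources = {name.split("_")[1] for name in names}
    return sources.pop() if len(sources) == 1 else None


failed_300 = failures(RESULTS, 300)
print(len(failed_300), common_ic_source(failed_300))  # 3 NAM
print(failures(RESULTS, 40))  # []
```

All failures at 300 sec share NAM as the source of ICs and LBCs, and no test fails at 40 sec, which matches the reasoning given for the new default.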


## CONTRIBUTORS (optional):
Bill Gallus
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
…nce again. (#417)

* Update modulefiles/build_hera_intel and modulefiles/srw_common to allow the SRW to build and run on Hera following update to HPC-stack.

* Update modulefiles/build_jet_intel and modulefiles/build_orion_intel so that NetCDF will be loaded before nccmp.