
Updated FV3 with RRTMGP improvements. #178

Merged
merged 24 commits into from
Aug 8, 2020


@dustinswales dustinswales commented Jul 29, 2020

Description

This PR adds the ability to use the GFS suite definition files (SDFs), v15p2 and v16beta, with the RRTMGP radiation scheme.
Additional changes are described in the ccpp-physics PR (link below).

Issue(s) addressed

N/A

Testing

Regression tests passed on Hera using Intel.

Dependencies


@climbfuji climbfuji left a comment


Thanks for adding the regression test files - almost there!

@@ -0,0 +1,85 @@
###############################################################################
#
# FV3 CCPP GFS v15.2 w/ RRTMGP compiled with 32-bit dynamics test

I know this is really picky, but the dynamics are not compiled in 32bit for your tests (see COMPILE lines in rt.conf, they don't have 32BIT=Y). Best to simply remove "compiled with 32-bit dynamics"
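
One quick way to confirm which builds actually request 32-bit dynamics is to filter the COMPILE lines of rt.conf for 32BIT=Y. The sketch below assumes each build entry is a single line starting with COMPILE, with its options on the same line. The function name and the sample entries are invented for illustration and are not taken from the real rt.conf.

```shell
# List the COMPILE lines of an rt.conf-style file that request 32-bit dynamics.
# $1: path to the conf file. Returns non-zero when no such line exists.
compile_lines_with_32bit() {
  grep '^COMPILE' "$1" | grep '32BIT=Y'
}

# Demo on a throwaway file with invented entries (NOT the real rt.conf).
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
COMPILE | CCPP=Y SUITES=suite_a | standard | hera.intel | fv3 |
RUN     | test_a                | standard |            | fv3 |
COMPILE | CCPP=Y SUITES=suite_b 32BIT=Y | standard | hera.intel | fv3 |
EOF
compile_lines_with_32bit "$demo_conf"
rm -f "$demo_conf"
```

If the function prints nothing for the COMPILE entry a test runs under, the test description should not mention 32-bit dynamics.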

#
###############################################################################

export TEST_DESCR="Compare FV3 32bit CCPP GFS v15.2 w/ RRTMGP results with previous trunk version"

Same here, simply remove "32bit"

@@ -0,0 +1,85 @@
###############################################################################
#
# FV3 CCPP GFS v15.2 w/ RRTMGP compiled with 32-bit dynamics test in DEBUG mode

as above

#
###############################################################################

export TEST_DESCR="Run FV3 32bit CCPP GFS v15.2 w/ RRTMGP in DEBUG mode"

as above

@@ -0,0 +1,85 @@
###############################################################################

The name of this file should be tests/tests/fv3_ccpp_gfs_v15p2_RRTMGP_debug and not tests/tests/fv3_ccpp_gfs_v15p2_debug_RRTMGP?

#
###############################################################################

export TEST_DESCR="Compare FV3 32bit CCPP GFS v16beta w/ RRTMGP results with previous trunk version"

32bit

@@ -0,0 +1,85 @@
###############################################################################

The name of this file should be tests/tests/fv3_ccpp_gfs_v16beta_RRTMGP_debug and not tests/tests/fv3_ccpp_gfs_v16beta_debug_RRTMGP?

@@ -0,0 +1,85 @@
###############################################################################
#
# FV3 CCPP GFS v16beta w/ RRTMGP compiled with 32-bit dynamics test in DEBUG mode

32bit

#
###############################################################################

export TEST_DESCR="Run FV3 32bit CCPP GFS v16beta w/ RRTMGP in DEBUG mode"

32bit


export TEST_DESCR="Run FV3 32bit CCPP GFS v16beta w/ RRTMGP in DEBUG mode"

export CNTL_DIR=fv3_gfs_v16beta_debug_RRTMGP

_debug comes last

#
###############################################################################

-export TEST_DESCR="Run FV3 32bit CCPP GFS v16beta w/ RRTMGP in DEBUG mode"
+export TEST_DESCR="Run FV3 CCPP GFS v16beta w/ RRTMGP in DEBUG mode"

export CNTL_DIR=fv3_gfs_v16beta_debug_RRTMGP

Last ones: change export CNTL_DIR=fv3_gfs_v16beta_debug_RRTMGP to export CNTL_DIR=fv3_gfs_v16beta_RRTMGP_debug (and similar for GFS v15p2 debug) so that the run directories are consistent with the test names, please - avoids confusion.
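
The naming rule requested here can be checked mechanically. The sketch below assumes, based only on the two names quoted in this review, that the run directory equals the test name with the "ccpp_" segment dropped. Both function names are hypothetical helpers and are not part of the test scripts.

```shell
# Map a test name to the run-directory name requested in this review,
# ASSUMING the two differ only by the dropped "ccpp_" segment.
expected_cntl_dir() {
  printf '%s\n' "$1" | sed 's/^fv3_ccpp_/fv3_/'
}

# Succeeds when the CNTL_DIR value ($2) matches the name derived from the test ($1).
cntl_dir_matches() {
  [ "$(expected_cntl_dir "$1")" = "$2" ]
}

# The pair flagged in the review: test name vs. the CNTL_DIR currently in the file.
if cntl_dir_matches fv3_ccpp_gfs_v16beta_RRTMGP_debug fv3_gfs_v16beta_debug_RRTMGP; then
  echo "consistent"
else
  echo "mismatch: expected $(expected_cntl_dir fv3_ccpp_gfs_v16beta_RRTMGP_debug)"
fi
```

For the pair above this reports a mismatch and prints fv3_gfs_v16beta_RRTMGP_debug as the expected directory name.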


@climbfuji climbfuji left a comment


This looks great to me, thanks for accommodating all my requests for changes. I'll get started with testing this, approval will follow after the current commit is in and this PR is updated.

@climbfuji
Collaborator

climbfuji commented Aug 5, 2020

I checked out the code using

git clone -b develop --recursive https://github.com/dustinswales/ufs-weather-model

On hera.intel, I get failures for the following regression tests:

fv3_ccpp_wrtGlatlon_netcdf 008 failed in check_result
fv3_ccpp_wrtGauss_netcdf_esmf 006 failed in check_result
fv3_ccpp_control 001 failed in check_result
fv3_ccpp_lheatstrg 014 failed in check_result
fv3_ccpp_wrtGauss_nemsio 009 failed in check_result
fv3_ccpp_satmedmf 033 failed in check_result
fv3_ccpp_2threads 003 failed in check_result
fv3_ccpp_satmedmfq 034 failed in check_result
fv3_ccpp_wrtGauss_netcdf 007 failed in check_result
fv3_ccpp_control_32bit 019 failed in check_result
fv3_ccpp_stochy 011 failed in check_result
fv3_ccpp_decomp 002 failed in check_result
fv3_ccpp_restart 004 failed in check_result
fv3_ccpp_iau 012 failed in check_result
fv3_ccpp_read_inc 005 failed in check_result
fv3_ccpp_wrtGauss_nemsio_c192 010 failed in check_result
fv3_ccpp_wrtGauss_nemsio_c768 017 failed in check_result

I checked two of them, both fail for the same reason (this is from the err output file):

  0:  in fcst comp init, ntasks=         144
  0:  num_atmos_calls=          72 time_init=        2016          10           3
  0:            0           0           0 time_atmos=        2016          10
  0:            3           0           0           0 time_end=        2016
  0:           10           3          12           0           0 dt_atmos=
  0:          600 Run_length=       43200
  0:  INPUT source not found  F  set source=No Source Attribute
118: forrtl: severe (19): invalid reference to variable in NAMELIST input, unit -5, file Internal Formatted NML Read, line -1, position 26
118: Image              PC                Routine            Line        Source
118: fv3.exe            000000000304F142  Unknown               Unknown  Unknown
118: fv3.exe            0000000003083B3F  Unknown               Unknown  Unknown
118: fv3.exe            0000000003081A6B  Unknown               Unknown  Unknown
118: fv3.exe            00000000025E9437  Unknown               Unknown  Unknown
118: fv3.exe            00000000025055C3  Unknown               Unknown  Unknown
118: fv3.exe            00000000023CACE5  Unknown               Unknown  Unknown
118: fv3.exe            0000000001937D61  Unknown               Unknown  Unknown
118: fv3.exe            00000000018FCBEB  Unknown               Unknown  Unknown
118: fv3.exe            0000000000797CA1  _ZN5ESMCI6FTable1        2010  ESMCI_FTable.C
118: fv3.exe            000000000079B886  ESMCI_FTableCallE         746  ESMCI_FTable.C
118: fv3.exe            0000000000AA15AA  _ZN5ESMCI2VM5ente        1178  ESMCI_VM.C
118: fv3.exe            00000000007992D7  c_esmc_ftablecall         898  ESMCI_FTable.C
118: fv3.exe            0000000000634041  esmf_compmod_mp_e        1209  ESMF_Comp.F90
118: fv3.exe            000000000059C644  esmf_gridcompmod_        1405  ESMF_GridComp.F90
118: fv3.exe            00000000018F2E66  Unknown               Unknown  Unknown
118: fv3.exe            0000000000797CA1  _ZN5ESMCI6FTable1        2010  ESMCI_FTable.C
118: fv3.exe            000000000079B886  ESMCI_FTableCallE         746  ESMCI_FTable.C
118: fv3.exe            0000000000AA15AA  _ZN5ESMCI2VM5ente        1178  ESMCI_VM.C
118: fv3.exe            00000000007992D7  c_esmc_ftablecall         898  ESMCI_FTable.C
118: fv3.exe            0000000000634041  esmf_compmod_mp_e        1209  ESMF_Comp.F90
118: fv3.exe            000000000059C644  esmf_gridcompmod_        1405  ESMF_GridComp.F90
118: fv3.exe            0000000000AD652D  nuopc_driver_mp_l        2201  NUOPC_Driver.F90
118: fv3.exe            0000000000AF45BE  nuopc_driver_mp_i        1078  NUOPC_Driver.F90
118: fv3.exe            0000000000AFBC22  nuopc_driver_mp_i         383  NUOPC_Driver.F90
118: fv3.exe            0000000000797CA1  _ZN5ESMCI6FTable1        2010  ESMCI_FTable.C
118: fv3.exe            000000000079B886  ESMCI_FTableCallE         746  ESMCI_FTable.C
118: fv3.exe            0000000000AA15AA  _ZN5ESMCI2VM5ente        1178  ESMCI_VM.C
118: fv3.exe            00000000007992D7  c_esmc_ftablecall         898  ESMCI_FTable.C
118: fv3.exe            0000000000634041  esmf_compmod_mp_e        1209  ESMF_Comp.F90
118: fv3.exe            000000000059C644  esmf_gridcompmod_        1405  ESMF_GridComp.F90
118: fv3.exe            00000000004293E7  Unknown               Unknown  Unknown
118: fv3.exe            0000000000797CA1  _ZN5ESMCI6FTable1        2010  ESMCI_FTable.C
118: fv3.exe            000000000079B886  ESMCI_FTableCallE         746  ESMCI_FTable.C
118: fv3.exe            0000000000AA15AA  _ZN5ESMCI2VM5ente        1178  ESMCI_VM.C

Note: when I grep for failed tests using

grep -le FAIL log_hera.intel/*

I get more failures:

log_hera.intel/rt_001_fv3_ccpp_control_prod.log
log_hera.intel/rt_002_fv3_ccpp_decomp_prod.log
log_hera.intel/rt_003_fv3_ccpp_2threads_prod.log
log_hera.intel/rt_004_fv3_ccpp_restart_prod.log
log_hera.intel/rt_005_fv3_ccpp_read_inc_prod.log
log_hera.intel/rt_006_fv3_ccpp_wrtGauss_netcdf_esmf_prod.log
log_hera.intel/rt_007_fv3_ccpp_wrtGauss_netcdf_prod.log
log_hera.intel/rt_008_fv3_ccpp_wrtGlatlon_netcdf_prod.log
log_hera.intel/rt_009_fv3_ccpp_wrtGauss_nemsio_prod.log
log_hera.intel/rt_010_fv3_ccpp_wrtGauss_nemsio_c192_prod.log
log_hera.intel/rt_011_fv3_ccpp_stochy_prod.log
log_hera.intel/rt_012_fv3_ccpp_iau_prod.log
log_hera.intel/rt_014_fv3_ccpp_lheatstrg_prod.log
log_hera.intel/rt_017_fv3_ccpp_wrtGauss_nemsio_c768_prod.log
log_hera.intel/rt_019_fv3_ccpp_control_32bit_prod.log
log_hera.intel/rt_033_fv3_ccpp_satmedmf_prod.log
log_hera.intel/rt_034_fv3_ccpp_satmedmfq_prod.log
log_hera.intel/run_001_fv3_ccpp_control_prod.log
log_hera.intel/run_002_fv3_ccpp_decomp_prod.log
log_hera.intel/run_003_fv3_ccpp_2threads_prod.log
log_hera.intel/run_004_fv3_ccpp_restart_prod.log
log_hera.intel/run_005_fv3_ccpp_read_inc_prod.log
log_hera.intel/run_006_fv3_ccpp_wrtGauss_netcdf_esmf_prod.log
log_hera.intel/run_007_fv3_ccpp_wrtGauss_netcdf_prod.log
log_hera.intel/run_008_fv3_ccpp_wrtGlatlon_netcdf_prod.log
log_hera.intel/run_009_fv3_ccpp_wrtGauss_nemsio_prod.log
log_hera.intel/run_010_fv3_ccpp_wrtGauss_nemsio_c192_prod.log
log_hera.intel/run_011_fv3_ccpp_stochy_prod.log
log_hera.intel/run_012_fv3_ccpp_iau_prod.log
log_hera.intel/run_014_fv3_ccpp_lheatstrg_prod.log
log_hera.intel/run_017_fv3_ccpp_wrtGauss_nemsio_c768_prod.log
log_hera.intel/run_019_fv3_ccpp_control_32bit_prod.log
log_hera.intel/run_026_fv3_ccpp_control_debug_prod.log
log_hera.intel/run_033_fv3_ccpp_satmedmf_prod.log
log_hera.intel/run_034_fv3_ccpp_satmedmfq_prod.log

@DusanJovic-NOAA @junwang-noaa Important. Using rt_gnu.conf I found that errors are not reported, and jobs fail silently. The same thing happens for Intel, where some of the failed tests are simply not reported in fail_test!

...
+ echo REGRESSION TEST WAS SUCCESSFUL
...

but:

[Dom.Heinzeller@hfe08 tests]$ grep -e FAIL log_hera.gnu/*
log_hera.gnu/run_011_fv3_ccpp_control_debug_prod.log:10208723                   FAILED        rt_114896_011
log_hera.gnu/run_011_fv3_ccpp_control_debug_prod.log:10208723.ba+               FAILED                batch
log_hera.gnu/run_011_fv3_ccpp_control_debug_prod.log:10208723.0                 FAILED              fv3.exe
log_hera.gnu/run_011_fv3_ccpp_control_debug_prod.log:3 min. TEST 011 fv3_ccpp_control_debug is FAILED,  status: - jobid 10208723
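
The manual cross-check above can be scripted. The sketch below assumes run logs are named run_NNN_<test>_prod.log, as in the listings in this thread, and that each line of fail_test starts with the test name followed by a space. Both function names are hypothetical helpers and are not part of rt.sh.

```shell
# List the test names whose run log contains the string FAIL.
# $1: log directory
failed_in_logs() {
  grep -l FAIL "$1"/run_*.log 2>/dev/null |
    sed -e 's|.*/run_[0-9]*_||' -e 's|_prod\.log$||' | sort -u
}

# Print tests that failed according to the run logs but are absent from fail_test.
# $1: log directory, $2: fail_test file
unreported_failures() {
  failed_in_logs "$1" | while read -r name; do
    grep -q "^$name " "$2" 2>/dev/null || printf '%s\n' "$name"
  done
}
```

Calling, for example, `unreported_failures log_hera.gnu fail_test` from the tests directory would print every test that the logs mark as failed but that rt.sh did not record.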

@climbfuji
Collaborator

Here is the error with traceback:

  0:  INPUT source not found  F  set source=No Source Attribute
 14: forrtl: severe (19): invalid reference to variable in NAMELIST input, unit -5, file Internal Formatted NML Read, line -1, position 26
 14: Image              PC                Routine            Line        Source
 14: fv3.exe            00000000089CC292  Unknown               Unknown  Unknown
 14: fv3.exe            0000000008A00C8F  Unknown               Unknown  Unknown
 14: fv3.exe            00000000089FEBBB  Unknown               Unknown  Unknown
 14: fv3.exe            0000000006B575EF  gfs_typedefs_mp_c        3449  GFS_typedefs.F90
 14: fv3.exe            0000000006911F27  gfs_driver_mp_gfs         187  GFS_driver.F90
 14: fv3.exe            00000000065B216E  ipd_driver_mp_ipd          57  IPD_driver.F90
 14: fv3.exe            0000000001AB7BC0  atmos_model_mod_m         646  atmos_model.F90
 14: fv3.exe            0000000001A7C1A5  module_fcst_grid_         380  module_fcst_grid_comp.F90
 14: fv3.exe            0000000000656A89  Unknown               Unknown  Unknown
 14: fv3.exe            000000000065A65B  Unknown               Unknown  Unknown
 14: fv3.exe            00000000009433F5  Unknown               Unknown  Unknown
 14: fv3.exe            00000000006580EA  Unknown               Unknown  Unknown
 14: fv3.exe            0000000000F3C261  Unknown               Unknown  Unknown
 14: fv3.exe            0000000000D3E27F  Unknown               Unknown  Unknown
 14: fv3.exe            0000000001A535CA  fv3gfs_cap_mod_mp         576  fv3_cap.F90
 14: fv3.exe            0000000000656A89  Unknown               Unknown  Unknown
 14: fv3.exe            000000000065A65B  Unknown               Unknown  Unknown
 14: fv3.exe            00000000009433F5  Unknown               Unknown  Unknown

and the offending line 3449 from GFS_typedefs.F90:

    read(Model%input_nml_file, nml=gfs_physics_nml)

Thus, the problem is something in the gfs_physics_nml section.
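
To inspect that section in isolation, the group can be cut out of the namelist file a test actually ran with. The sketch below assumes the group opens with a line starting with &gfs_physics_nml and closes with a line starting with "/". The function name is a hypothetical helper, and input.nml in the usage note is an assumed file name.

```shell
# Print a single namelist group from a Fortran namelist file.
# $1: group name (without the leading &), $2: namelist file
print_nml_group() {
  sed -n "/^ *&$1/,/^ *\//p" "$2"
}
```

For example, `print_nml_group gfs_physics_nml input.nml` prints only that group, so its variable names can be compared against the namelist declaration in GFS_typedefs.F90.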

@climbfuji
Collaborator

I removed all the old RRTMGP stuff from parm/ccpp_control.nml.IN, that file had a few old RRTMGP namelist parameters that no longer exist. I guess this will do the trick; if it works, I'll send @dustinswales a PR to update his ufs-weather-model PR (and add/update the tests in tests/rt_gnu.conf and tests/rt_orion.conf)

@dustinswales
Collaborator Author

> I removed all the old RRTMGP stuff from parm/ccpp_control.nml.IN, that file had a few old RRTMGP namelist parameters that no longer exist. I guess this will do the trick; if it works, I'll send @dustinswales a PR to update his ufs-weather-model PR (and add/update the tests in tests/rt_gnu.conf and tests/rt_orion.conf)

@climbfuji
Yes, there were some namelist changes. I wasn't sure if I set up the testing stuff correctly to pull in the correct nml. My bad.

@climbfuji
Collaborator

climbfuji commented Aug 6, 2020

I got the above tests fixed. The major problem of rt.sh not reporting failures correctly still exists, however.

@DusanJovic-NOAA @junwang-noaa Please have a look at /work/noaa/gmtb/dheinzel/ufs-weather-model/ufs-weather-model-dustin-rrtmgp-gfdlmp-20200805/tests on orion or /scratch1/BMC/gmtb/Dom.Heinzeller/ufs-weather-model/ufs-weather-model-dustin-rrtmgp-gfdlmp-20200805/{intel,gnu}/tests on hera. The new RRTMGP tests fail, but silently (the output of rt.sh is REGRESSION TEST WAS SUCCESSFUL).

I assume the failures go unreported because they occur in the run scripts that set up the regression test directories and submit the job. For example, on orion: cat log_orion.intel/run_039_fv3_ccpp_gfs_v15p2_RRTMGP_prod.log

...
+ cp /work/noaa/gmtb/dheinzel/ufs-weather-model/ufs-weather-model-dustin-rrtmgp-gfdlmp-20200805/parm/params_grib2_tbl_new params_grib2_tbl_new
+ SRCD=/work/noaa/gmtb/dheinzel/ufs-weather-model/ufs-weather-model-dustin-rrtmgp-gfdlmp-20200805
+ RUND=/work/noaa/stmp/dheinzel/stmp/dheinzel/FV3_RT/rt_21291/fv3_ccpp_gfs_v15p2_RRTMGP_prod
+ atparse
/work/noaa/gmtb/dheinzel/ufs-weather-model/ufs-weather-model-dustin-rrtmgp-gfdlmp-20200805/tests/run_test.sh: line 86: /work/noaa/gmtb/dheinzel/ufs-weather-model/ufs-weather-model-dustin-rrtmgp-gfdlmp-20200805/tests/fv3_conf/ccpp_gfs_v15_rrtmgp_run.IN: No such file or directory
+ '[' 1 -eq 0 ']'
+ write_fail_test
+ [[ false == true ]]

Of course, @dustinswales needs to fix the actual error by creating tests/fv3_conf/ccpp_gfs_v15_rrtmgp_run.IN and tests/fv3_conf/ccpp_gfs_v16_rrtmgp_run.IN, but rt.sh should have reported this error.
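
One way a run script could surface this kind of error before it is lost is to test for the template explicitly and stop with a non-zero status. This is only a sketch of the idea, not how run_test.sh or rt.sh is implemented. The function name and the `$template` variable in the usage note are hypothetical.

```shell
# Fail loudly when a required template file is missing.
# $1: path to the template. Returns 1 and writes to stderr if it is not a regular file.
require_template() {
  if [ ! -f "$1" ]; then
    echo "ERROR: missing template: $1" >&2
    return 1
  fi
}
```

A caller would run `require_template "$template" || exit 1` before feeding the file to atparse, so that a missing .IN file ends the job with an error status instead of continuing.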

@dustinswales
Collaborator Author

@climbfuji
I don't know what you did, but it did the trick!

@climbfuji
Collaborator

Now it looks like everything is up to date. I think we should get started by running the tests on hera.intel and hera.gnu against the existing baseline (to make sure all non-RRTMGP tests still pass). Should I do that?

@dustinswales
Collaborator Author

> Now it looks like everything is up to date. I think we should get started by running the tests on hera.intel and hera.gnu against the existing baseline (to make sure all non-RRTMGP tests still pass). Should I do that?

Yes

@climbfuji
Collaborator

They are running now on hera for intel and gnu.

@climbfuji
Collaborator

climbfuji commented Aug 6, 2020

Regression testing on hera.intel and hera.gnu against existing baselines. All non-RRTMGP tests pass, all RRTMGP tests fail because of missing baselines (but they all run to completion).

rt_hera_gnu.log
rt_hera_gnu_against_existing_baseline_fail_test.log
rt_hera_intel.log
rt_hera_intel_against_existing_baseline_fail_test.log

Creating new baselines now on hera.gnu, hera.intel, orion.intel.

@climbfuji
Collaborator

Created new baselines on orion.intel, hera.intel, hera.gnu; all tests passed.

rt_orion_intel_create.log
rt_hera_intel_create.log
rt_hera_gnu_create.log

@climbfuji
Collaborator

Regression tests passed on all tier-1 platforms. Sent a PR to @dustinswales to update his branch used for this PR: https://github.com/dustinswales/ufs-weather-model/pull/6

@SMoorthi-emc
Contributor

SMoorthi-emc commented Aug 7, 2020 via email

@DusanJovic-NOAA DusanJovic-NOAA merged commit 7e29c33 into ufs-community:develop Aug 8, 2020
pjpegion pushed a commit to NOAA-PSL/ufs-weather-model.p7b that referenced this pull request Jul 20, 2021
…ry update in dycore (ufs-community#178)

* contributions from @SMoorthi-emc to fix the global restart reproducibility and to keep compiling without CCPP
* updates the submodule pointers for GFDL_atmos_cubed_sphere and ccpp-physics
* bugfix in ccpp/CMakeLists.txt to correctly set AVX2 flags or not (discovered by Yunheng)
* changes mod_name of non-phys tendencies in GFS_diagnostics.F90 to gfs_dyn from gfs_phys (from @grantfirl)