Coupled run using benchmark at C384 crashed when frac_grid=T #268

Closed
ShanSunNOAA opened this issue Nov 10, 2020 · 30 comments

@ShanSunNOAA
Collaborator

Description

The coupled model runs well in the benchmark case at C384 with frac_grid=F & frac_grid_input=T. With frac_grid=T it crashes at line 778 of module_gfdl_cloud_microphys.F90, where dz (i, j, k) is zero at some point:
776 dz0 (k) = dz (i, j, k)
777
778 den0 (k) = - dp1 (k) / (grav * dz0 (k)) ! density of dry air

To Reproduce:

This can be reproduced in https://github.com/shansun6/ufs-weather-model, -b frac_bm_20201108. To run it, do
rt.sh -l rt.conf_bmark

Hera has been frozen today, so I don't have output yet. Once it is back to normal, I will add the output directory here.

@ShanSunNOAA ShanSunNOAA added the bug Something isn't working label Nov 10, 2020
@yangfanglin
Collaborator

Shan, can you repeat this run in debug mode to get more information? How soon did the model crash?

@ShanSunNOAA
Collaborator Author

ShanSunNOAA commented Nov 10, 2020 via email

@ShanSunNOAA
Collaborator Author

I made one modification today: slmsk=floor(landfrac) when frac_grid=T, to be consistent with ICs.

However, the model still crashed at the same place (line 778 of module_gfdl_cloud_microphys.F90) during the first time step. The error output with debug=Y is at /scratch2/BMC/gsd-fv3-dev/Shan.Sun/FV3_RT/rt_94187/cpld_bmark_frac_prod/ on Hera.
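
For readers unfamiliar with the mask convention mentioned above, here is a minimal sketch of slmsk=floor(landfrac). It is only an illustration, not the code used in FV3 or in the ICs: the module name mask_sketch and the function slmsk_from_landfrac are hypothetical, and it assumes landfrac lies in [0,1], so that only fully land-covered cells get mask value 1 and every partly wet cell gets 0 (the ice value 2 would be assigned elsewhere).

```fortran
module mask_sketch
  implicit none
  private
  public :: slmsk_from_landfrac
contains
  ! Hypothetical helper: derive a real-valued land mask from land fraction.
  ! Assumes 0 <= landfrac <= 1, so the result is 1.0 only where landfrac == 1.
  elemental function slmsk_from_landfrac(landfrac) result(slmsk)
    real, intent(in) :: landfrac
    real :: slmsk
    slmsk = real(floor(landfrac))
  end function slmsk_from_landfrac
end module mask_sketch
```

Because the function is elemental, it can be applied to a whole array of land fractions in one call.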

@junwang-noaa
Collaborator

Shan, the dz is computed from the interface geopotential phii in module_gfdl_cloud_microphys.F90. The phii is updated in get_phi_fv3, and it should not have the same value at two consecutive interfaces unless tmp (gt0) is zero or two levels have the same pressure in the model physics state. I'd suggest finding the (i,j) location where dz=0 in dz(i,k) = (phii(i,kk)-phii(i,kk+1))*onebg in module_gfdl_cloud_microphys.F90, then checking that point in get_phi_fv3_run to see where tmp becomes 0.
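
A check of this kind could be sketched roughly as below. This is not code from ccpp-physics: the module dz_check_sketch and the subroutine find_bad_dz are hypothetical names, and the sketch assumes a bottom-up index (phii(i,1) at the surface) with positive thickness, which differs from the flipped kk index and sign used in the microphysics driver.

```fortran
module dz_check_sketch
  implicit none
  private
  public :: find_bad_dz
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical diagnostic: compute layer thickness from interface
  ! geopotential and return the first (i,k) where it is not positive.
  ! Assumes phii(i,1) is the surface interface, so phii should increase with k.
  subroutine find_bad_dz(phii, grav, ibad, kbad)
    real(dp), intent(in)  :: phii(:,:)    ! (horizontal points, interfaces)
    real(dp), intent(in)  :: grav         ! gravitational acceleration
    integer,  intent(out) :: ibad, kbad   ! both 0 if every layer is fine
    integer  :: i, k
    real(dp) :: dz

    ibad = 0
    kbad = 0
    do k = 1, size(phii, 2) - 1
      do i = 1, size(phii, 1)
        dz = (phii(i, k+1) - phii(i, k)) / grav
        ! The negated comparison also flags NaN, e.g. from Inf - Inf.
        if (.not. (dz > 0.0_dp)) then
          ibad = i
          kbad = k
          return
        end if
      end do
    end do
  end subroutine find_bad_dz
end module dz_check_sketch
```

Returning the first offending index, rather than printing every point, keeps the output short when many columns are affected at once.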

@ShanSunNOAA
Collaborator Author

Jun, thanks for your suggestion. I inserted a print statement after dz(i,k) = (phii(i,kk)-phii(i,kk+1))*onebg in gfdl_cloud_microphys.F90:
if (abs(dz(i,k))<1.e-12) write(*,'(a,2i4,a,2es10.2,2(a,es10.2))') 'warning1 dz=0 at i,k=',i,k,' phii=',phii(i,kk),phii(i,kk+1),' dz=',dz(i,k)

Here is the output. phii is bad at many points and at different k:

109: warning1 dz=0 at i,k= 8 1 phii= 2.55E+71 2.55E+71 dz= 0.00E+00
109: warning1 dz=0 at i,k= 10 1 phii= 1.16E+73 1.16E+73 dz= 0.00E+00
109: warning1 dz=0 at i,k= 14 1 phii= 1.98E+72 1.98E+72 dz= 0.00E+00
109: warning1 dz=0 at i,k= 15 1 phii= 3.17E+72 3.17E+72 dz= 0.00E+00
109: warning1 dz=0 at i,k= 8 2 phii= 2.55E+71 2.55E+71 dz= 0.00E+00
109: warning1 dz=0 at i,k= 10 2 phii= 1.16E+73 1.16E+73 dz= 0.00E+00
109: warning1 dz=0 at i,k= 14 2 phii= 1.98E+72 1.98E+72 dz= 0.00E+00
109: warning1 dz=0 at i,k= 15 2 phii= 3.17E+72 3.17E+72 dz= 0.00E+00
109: warning1 dz=0 at i,k= 8 3 phii= 2.55E+71 2.55E+71 dz= 0.00E+00
109: warning1 dz=0 at i,k= 10 3 phii= 1.16E+73 1.16E+73 dz= 0.00E+00
. . .

see /scratch2/BMC/gsd-fv3-dev/Shan.Sun/FV3_RT/rt_73625/cpld_bmark_frac_prod/out

Also the error in "err" has switched to a different routine (it is no longer Line 778 of module_gfdl_cloud_microphys.F90):

139: forrtl: error (72): floating overflow
139: Image PC Routine Line Source
139: fv3.exe 000000000D25E6BF Unknown Unknown Unknown
139: libpthread-2.17.s 00002B7DB5698630 Unknown Unknown Unknown
139: fv3.exe 0000000005EE862F samfshalcnv_mp_sa 776 samfshalcnv.f

where Line 776 in samfshalcnv.f is the calculation of eta:
774 dz = zi(i,k) - zi(i,k-1)
775 ptem = 0.5*(xlamue(i,k)+xlamue(i,k-1))-xlamud(i)
776 eta(i,k) = eta(i,k-1) * (1 + ptem * dz)

Any suggestions on where to go from here? Thanks,

@SMoorthi-emc
Contributor

SMoorthi-emc commented Nov 11, 2020 via email

@ShanSunNOAA
Collaborator Author

Thank you, Moorthi, for your info. Should I try your ccpp-physics branch SM/SM_Oct102020, or is there one routine that I can cherry-pick? I just want to get the "frac+gfdl" run to complete first. Please advise. Thanks!

@SMoorthi-emc
Contributor

SMoorthi-emc commented Nov 12, 2020 via email

@SMoorthi-emc
Contributor

SMoorthi-emc commented Nov 12, 2020 via email

@junwang-noaa
Collaborator

junwang-noaa commented Nov 12, 2020 via email

@SMoorthi-emc
Contributor

SMoorthi-emc commented Nov 12, 2020 via email

@ShanSunNOAA
Collaborator Author

ShanSunNOAA commented Nov 12, 2020 via email

@ShanSunNOAA
Collaborator Author

It appears that the coupled model benchmark case can run successfully with the combination of "frac_grid=T and gfdl MP" after a minimal change: taking GFS_surface_composites.F90 & GFS_surface_composites.meta from Moorthi's ccpp-physics branch SM_Oct102020!

Moorthi, thank you so much for your help! If you don't plan to create a PR just for these two files, may I create one for you, and what comments would you like to go with it? Thanks again.

@SMoorthi-emc
Contributor

SMoorthi-emc commented Nov 12, 2020 via email

@ShanSunNOAA
Collaborator Author

ShanSunNOAA commented Nov 12, 2020 via email

@ShanSunNOAA
Collaborator Author

I ran GFS_surface_composites.F90/meta & sfc_sice.f/meta from SMoorthi-emc/ccpp-physics with the latest develop of ufs-weather-model in the coupled model setup by Denise, and the restart failed at a few lake points on tile3 and 1 lake point on tile2. Most of these lake points are covered with ice to begin with and have no ice left by the time of the restart (hr12). I use slmsk=floor(landfrac) consistently in the ICs and in FV3, so these lake points have slmsk of either 0 or 2. However, the run still cannot restart reproducibly. Will keep looking. Thanks.

@SMoorthi-emc
Contributor

SMoorthi-emc commented Nov 12, 2020 via email

@ShanSunNOAA
Collaborator Author

Moorthi, thanks for your info. Let me see if I understand you correctly.

(1) I checked out your branch SM_Oct102020 of ccpp-physics. It has 6 files modified since Dom's commit of f3e6761 on Oct. 9:

physics/GFS_surface_composites.F90
physics/GFS_surface_composites.meta
physics/micro_mg3_0.F90
physics/sfc_sice.f
physics/sfc_sice.meta
physics/GFS_surface_generic.F90

(2) I checked out your branch SM_Oct102020 of FV3. Changes in FV3GFS_io.F90 & GFS_typedefs.F90 seem unrelated to restart, and the rest are IPD related, since Dom's commit on Oct. 9. So I skipped this.

(3) I used these 6 files from (1) above to run with the develop branch of ufs-weather-model in the coupled mode. The run does not reproduce after restart, and the differences remain at the icy lake points.

Any suggestions? Thanks,
Shan

@SMoorthi-emc
Contributor

SMoorthi-emc commented Nov 13, 2020 via email

@ShanSunNOAA
Collaborator Author

Thanks, Moorthi, for fixing the bug of setting lake ice to zero in atmos_model.F90.
I still suspect this restart issue with fractional grid and CCPP has something to do with slmsk, as most of these failed points (icy lake points) also failed in the nonfrac case before slmsk was updated, as shown by Denise. The exception is 1 failed lake point on tile 2, which had no ice to begin with. But the chase after slmsk went nowhere. I need to find some new clues. Thanks,
Shan

@SMoorthi-emc
Contributor

SMoorthi-emc commented Nov 13, 2020 via email

@ShanSunNOAA
Collaborator Author

ShanSunNOAA commented Nov 14, 2020 via email

@SMoorthi-emc
Contributor

SMoorthi-emc commented Nov 14, 2020 via email

@DeniseWorthen
Collaborator

Shouldn't we try to figure out why the tsfco temperatures are below freezing?

@ShanSunNOAA
Collaborator Author

Denise, good point. These below-freezing water temperatures occurred over lake points that started without ice. Without a lake model, no new lake ice can form at lake points with 100% open water, since sfc_sice.f will skip points without ice. Maybe gcycle can introduce ice at these cold lake points. How about limiting the water temperature to not-below-freezing only at the initial time, and not at restart, to guarantee restart reproducibility?
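
The proposal in the last sentence could look something like the sketch below. It is an illustration under stated assumptions, not the change that went into the model: the module tsfco_filter_sketch, the subroutine filter_water_temp, and the is_restart flag are hypothetical names, the freezing threshold is left to the caller, and "filtering" is assumed here to mean clamping to that threshold.

```fortran
module tsfco_filter_sketch
  implicit none
  private
  public :: filter_water_temp
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical filter: raise below-freezing water temperatures to the
  ! freezing point on a cold start only. On restart the field is left
  ! untouched so that the restarted run sees exactly what was written.
  subroutine filter_water_temp(tsfco, tfreeze, is_restart)
    real(dp), intent(inout) :: tsfco(:)    ! water surface temperature (K)
    real(dp), intent(in)    :: tfreeze     ! caller-supplied threshold (K)
    logical,  intent(in)    :: is_restart  ! .true. when starting from a restart

    if (is_restart) return
    tsfco = max(tsfco, tfreeze)
  end subroutine filter_water_temp
end module tsfco_filter_sketch
```

Skipping the clamp on restart matters because any adjustment applied a second time, at hour 12 for example, would change the state relative to the uninterrupted run.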

@junwang-noaa
Collaborator

junwang-noaa commented Nov 15, 2020 via email

@SMoorthi-emc
Contributor

SMoorthi-emc commented Nov 15, 2020 via email

@ShanSunNOAA
Collaborator Author

ShanSunNOAA commented Nov 15, 2020 via email

@SMoorthi-emc
Contributor

SMoorthi-emc commented Nov 15, 2020 via email

ShanSunNOAA added a commit to ShanSunNOAA/ufs-weather-model that referenced this issue Nov 19, 2020
…was fixed by Moorthi's modifications in ccpp/physics;

-- Restart in the coupled mode with the default physics is reproducible, when
   (1) bad water temperature is only filtered at the initial time;
   (2) stop initializing fice to zero everywhere in the FV3 cap in order to keep lake ice untouched;

    These address Issues ufs-community#268, ufs-community#285 & ufs-community#286.

    Co-authored-by: Shrinivas Moorthi <shrinivas.moorthi@noaa.gov>
    Co-authored-by: Denise Worthen <Denise.Worthen@noaa.gov>
@ShanSunNOAA
Collaborator Author

This issue is resolved in today's commit. Thanks, Moorthi, for fixing this bug.

pjpegion pushed a commit to NOAA-PSL/ufs-weather-model.p7b that referenced this issue Jul 20, 2021
… zorl interstitial, ocn -> wat, merra2 threading (ufs-community#279)

* changed .gitmodules to point to merra2 ccpp/physics
* remove GFDL_atmos_cubed_sphere and ccpp-framework from .git module
* remove IPD gfsphysics
* Update .gitmodules and submodule pointer for ccpp-physics for code review and testing
* Remove interstitial zorl composites
* Update .gitmodules and submodule pointer for ccpp-physics for code review and testing
* Remove or replace references to IPD in comments in atmos_model.F90
* Initialize Sfcprop%zorlx to clear_val instead of huge
* Update submodule pointer for ccpp-physics
* Rename Fortran variables and CCPP standard names / long names of surface composites from ocean to water
* Rename Sfcprop%zorlw to Sfcprop%zorlwav
* Rename Sfcprop%zorlo to Sfcprop%zorlw
* update submodule pointer for ccpp-physics
* Revert change to .gitmodules and update submodule pointer for ccpp-physics
Co-authored-by: anning.cheng <anning.cheng@noaa.gov>
epic-cicd-jenkins pushed a commit that referenced this issue Apr 17, 2023
* Remove all references to /lfs3 on Jet

* Add Ben and Ratko to the CODEOWNERS file

* Replace hard-coded make_orog module file with build-level module file in UFS_UTILS

* Remove hard-coded make_sfc_climo module file

* Rename all FV3-SAR and SAR-FV3 to FV3-LAM, rename all JPgrid to ESGgrid.  Remove fix files in regional_workflow and source from fix_am and EMC_post.

* Add alpha/kappa parameter back in exregional_make_grid.sh

* Remove dash from FV3LAM_wflow.xml

* Change FIXam to FIXgsm to source Thompson CCN file

* Remove old, unused grid stanza from exregional_run_post.sh

* Change Jet locations of fix_am/fix_orog to EMC paths
epic-cicd-jenkins pushed a commit that referenced this issue Apr 17, 2023
Enables the build and test workflow to run on the Hera and Jet
platforms.