Simulations failed due to unrealistic high soil temperature #5803

Open
jingtao-lbl opened this issue Jul 12, 2023 · 11 comments

Comments

@jingtao-lbl

I got the error below, caused by an unrealistically high soil temperature in May during AD spinup. I'm using the latest version (fc9f903) on pm-cpu.

138: lnd2atm_vars%t_soisno_grc(g, 1) is 401.120579771595
138: ENDRUN:
138: lnd2atm ERROR: lnd2atm_vars%t_soisno_grc > 400 Kelvin degree.ERROR in lnd2atm
138: Mod.F90 at line 468

The error occurs in simulations with both GSWP3v1 and CRUNCEP forcing. I don't think it is related to the climate forcing, since the problem did not arise with these forcings in previous versions of the model. Have there been any recent changes to the model that might cause it?
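For comparing failures across runs, the log lines above can be scanned automatically. The following is a minimal Python sketch, not part of E3SM: the function name `find_hot_soil_reports` is hypothetical, the line format is taken only from the excerpt in this issue, and the leading number (`138`) is assumed to be the MPI task label.

```python
import re

# Matches lines such as:
#   138: lnd2atm_vars%t_soisno_grc(g, 1) is 401.120579771595
# Assumption: the leading integer is the MPI task label added by the launcher.
PATTERN = re.compile(
    r"^\s*(?P<rank>\d+):\s*lnd2atm_vars%t_soisno_grc\(.*?\)\s+is\s+"
    r"(?P<temp>[-+0-9.eE]+)"
)


def find_hot_soil_reports(log_text, limit_k=400.0):
    """Return (task, temperature_K) pairs whose reported value exceeds limit_k.

    The 400 K default mirrors the threshold quoted in the error message.
    """
    hits = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match is None:
            continue
        temp = float(match.group("temp"))
        if temp > limit_k:
            hits.append((int(match.group("rank")), temp))
    return hits


sample = """138: lnd2atm_vars%t_soisno_grc(g, 1) is 401.120579771595
138: ENDRUN:"""
print(find_hot_soil_reports(sample))
```

Running this over the logs of repeated runs would give one list of tasks per run, which makes it easier to check whether the failing task really moves between runs.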

image

@jinyuntang
Contributor

jinyuntang commented Jul 12, 2023 via email

@jingtao-lbl
Author

Some updates on this issue. I tested global runs with compset 1850_DATM%CRU_ELM%CNPECACNTBC (I1850CRUCNPECACNTBC) at two resolutions, r05_r05 and f19_g16. The f19_g16 simulation ran normally, but the r05_r05 run stopped because of the high soil temperature error.

Neither simulation uses a customized domain, and everything else (e.g., fsurdata, paramfile, etc.) is left at its default. Both simulations use the same forcing and the intel compiler on pm-cpu. The model version is fc9f903.

Also, the error occurs at different locations when the same 0.5x0.5 deg simulation is repeated, as shown below.
image
image

@jingtao-lbl
Author

Recent tests show that global runs at 0.5x0.5 deg resolution all fail with this high ground temperature problem on the current master, regardless of compset (CNPECACNTBC, CNPRDCTCBC, and BGC-FATES were all tested). The grid cells that trigger the problem differ with the forcing (e.g., GSWP3v1 vs. CRUNCEP_qianFill), and they also change location when a different NPROCS is used.

However, the same simulations at other resolutions, e.g., 1.9x2.5 or 4x5, work fine.

@rljacob @bishtgautam @peterdschwartz @glemieux

@rljacob
Member

rljacob commented Aug 23, 2023

Does this only happen on pm-cpu? You might try the same case on another platform to see if it's a compiler issue.

@jingtao-lbl
Author

> Does this only happen on pm-cpu? You might try the same case on another platform to see if it's a compiler issue.

Thank you! Yes, I only tested it on pm-cpu. Jess helped me test it with gnu and she also got the error. Will try it on LRC.

@rljacob
Member

rljacob commented Aug 23, 2023

I checked our testing and we do run tests at r05 but for 5 days or less and with only a few BGC options. You said a previous version ran fine. What version was it exactly? The git hash of the code that ran would be best.

@rljacob
Member

rljacob commented Aug 23, 2023

These tests are passing fine:

  • SMS.r05_r05.I1850ELMCN._.elm-qian_1948
  • SMS.r05_r05.IELM._.elm-topounit
  • SMS_Ld2.ne30pg2_r05_EC30to60E2r2.BGCEXP_CNTL_CNPECACNT_1850._.elm-bgcexp
  • SMS_Ld2.ne30pg2_r05_EC30to60E2r2.BGCEXP_CNTL_CNPRDCTC_1850._.elm-bgcexp
  • SMS_Ln5.ne30pg2_r05_EC30to60E2r2.BGCEXP_LNDATM_CNPRDCTC_1850
  • SMS_Ln5.ne30pg2_r05_EC30to60E2r2.BGCEXP_LNDATM_CNPRDCTC_20TR

@jingtao-lbl
Author

Oh, great to know these tests are passing! How long did the test runs last? I usually get the error after a couple of months, and Jess said the error appeared immediately in her simulation (when using FATES). Would you mind sharing the script for SMS_Ld2.ne30pg2_r05_EC30to60E2r2.BGCEXP_CNTL_CNPECACNT_1850._.elm-bgcexp here? Thank you so much!

@rljacob
Member

rljacob commented Aug 23, 2023

By default the tests run for 5 days, but "Ld2" in the names above means run for 2 days. We don't use run scripts for the tests; everything is done with a single "create_test" command. Go to E3SM/cime/scripts and type "./create_test SMS_Ld2.ne30pg2_r05_EC30to60E2r2.BGCEXP_CNTL_CNPECACNT_1850.pm-cpu_intel.elm-bgcexp". Replace the machine and compiler string as needed.
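Since the reported failure shows up only after a couple of months, a longer variant of the same test may be worth trying. The sketch below only rewrites the test name; the helper `with_run_length` is hypothetical, and the reading of `Lm6` as "six months" rests on the CIME convention that the `_L` modifier takes a unit letter plus an integer, which should be checked against the CIME documentation before use. A longer run may also need more wall-clock time than the test default.

```python
import re


def with_run_length(test_name, length):
    """Replace (or add) the _L run-length modifier on the test-type field.

    test_name: full test name, fields separated by dots.
    length:    modifier without the underscore, e.g. "Ld2" or "Lm6".
    """
    fields = test_name.split(".")
    # Drop an existing run-length modifier such as _Ld2 or _Ln5.
    test_type = re.sub(r"_L[a-z]\d+", "", fields[0])
    fields[0] = f"{test_type}_{length}"
    return ".".join(fields)


name = (
    "SMS_Ld2.ne30pg2_r05_EC30to60E2r2."
    "BGCEXP_CNTL_CNPECACNT_1850.pm-cpu_intel.elm-bgcexp"
)
# Prints the command line to run from E3SM/cime/scripts.
print("./create_test " + with_run_length(name, "Lm6"))
```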

@jingtao-lbl
Author

> I checked our testing and we do run tests at r05 but for 5 days or less and with only a few BGC options. You said a previous version ran fine. What version was it exactly? The git hash of the code that ran would be best.

I found that my previous version is quite old. I used to run it on Cori without any problems, but when I moved it to Perlmutter I ran into some dependency problems during compilation...

@jingtao-lbl
Author

> By default the tests run for 5 days, but "Ld2" in the names above means run for 2 days. We don't use run scripts for the tests; everything is done with a single "create_test" command. Go to E3SM/cime/scripts and type "./create_test SMS_Ld2.ne30pg2_r05_EC30to60E2r2.BGCEXP_CNTL_CNPECACNT_1850.pm-cpu_intel.elm-bgcexp". Replace the machine and compiler string as needed.

Thank you! I will check the test run and see if it can pass if running a bit longer. Will keep you posted!
