UFS-dev PR#53 #91

Merged
merged 18 commits into from
Apr 6, 2023

Conversation

grantfirl
Collaborator

binli2337 and others added 9 commits February 22, 2023 16:14
…ces and turning off output; update FAQ documentation (was 1608); update drag suite intent mods (was 1612) (ufs-community#1597)

* update cdeps

* use fv3atm from PR 1612

* Changed UGWP diagnostic variable declaration intents from out to inout

* Docs/faqupdate (NCAR#8)

Co-authored-by: Denise Worthen <denise.worthen@noaa.gov>
Co-authored-by: jkbk2004 <jong.kim@noaa.gov>
Co-authored-by: Brian Curtis <brian.curtis@noaa.gov>
* MOM6 writes restarts in YYYYMMDD.HHMMSS.MOM*nc format
* the final restart is timestamped
* make changes in scripts to remove 'suffix-hours' strings

* remove STORE_CORIOLIS from MOM_input templates

* add timestamp in the end forecast restart filenames for GFS

* FV3atm restart filename format
* Update FV3

* Merge pull request NCAR#67 from dustinswales/accumulated_cleanup

* Updated physics

* Reverting DISKNM/gaea to original

* add new BL_DATE
…1578)

* Faster Compile option turned on for a compile and test per project

* added sample tests from each project to be tested with faster compile

* HAFS app to be compiled with 64bit
…ble usage of shared pio (ufs-community#1645)

* Gaea system: change in hpc-stack location, miniconda3 (EPIC-managed)

* move DISABLE_FMA to one if at the end

* only disable fma on wcoss if FASTER=ON

* Update CMakeModules to develop; remove STATIC requirement from PIO find.

Co-authored-by: Natalie Perlin <Natalie.Perlin@noaa.gov>
Co-authored-by: ulmononian <cameron_book@alumni.brown.edu>
* fix clock initialization for a restart in WW3

* update WW3 submodule

* update to develop WW3
…le pointer update for ufs-community#462 (ufs-community#1654)

* update FV3 submodule and .gitmodules for testing of 20230313_combo

* turn off cpld_control_p8_faster cheyenne
@grantfirl grantfirl mentioned this pull request Mar 31, 2023
@grantfirl
Collaborator Author

No changes to baselines expected due to ufs-community#1658 directly.

@grantfirl
Collaborator Author

Failed tests expected from ufs-community#1599

FAILED TESTS:
Test cpld_control_p8_mixedmode 001 failed in check_result failed
Test cpld_control_p8_mixedmode 001 failed in run_test failed
Test cpld_control_gfsv17 002 failed in check_result failed
Test cpld_control_gfsv17 002 failed in run_test failed
Test cpld_control_p8 003 failed in check_result failed
Test cpld_control_p8 003 failed in run_test failed
Test cpld_2threads_p8 005 failed in check_result failed
Test cpld_2threads_p8 005 failed in run_test failed
Test cpld_esmfthreads_p8 006 failed in check_result failed
Test cpld_esmfthreads_p8 006 failed in run_test failed
Test cpld_decomp_p8 007 failed in check_result failed
Test cpld_decomp_p8 007 failed in run_test failed
Test cpld_mpi_p8 008 failed in check_result failed
Test cpld_mpi_p8 008 failed in run_test failed
Test cpld_control_ciceC_p8 009 failed in check_result failed
Test cpld_control_ciceC_p8 009 failed in run_test failed
Test cpld_control_c192_p8 010 failed in check_result failed
Test cpld_control_c192_p8 010 failed in run_test failed
Test cpld_bmark_p8 012 failed in check_result failed
Test cpld_bmark_p8 012 failed in run_test failed
Test cpld_control_noaero_p8 014 failed in check_result failed
Test cpld_control_noaero_p8 014 failed in run_test failed
Test cpld_control_nowave_noaero_p8 015 failed in check_result failed
Test cpld_control_nowave_noaero_p8 015 failed in run_test failed
Test cpld_debug_p8 016 failed in check_result failed
Test cpld_debug_p8 016 failed in run_test failed
Test cpld_debug_noaero_p8 017 failed in check_result failed
Test cpld_debug_noaero_p8 017 failed in run_test failed
Test cpld_control_noaero_p8_agrid 018 failed in check_result failed
Test cpld_control_noaero_p8_agrid 018 failed in run_test failed
Test cpld_control_c48 019 failed in check_result failed
Test cpld_control_c48 019 failed in run_test failed
Test cpld_warmstart_c48 020 failed in check_result failed
Test cpld_warmstart_c48 020 failed in run_test failed
Test datm_cdeps_control_cfsr 131 failed in check_result failed
Test datm_cdeps_control_cfsr 131 failed in run_test failed
Test datm_cdeps_control_gefs 133 failed in check_result failed
Test datm_cdeps_control_gefs 133 failed in run_test failed
Test datm_cdeps_iau_gefs 134 failed in check_result failed
Test datm_cdeps_iau_gefs 134 failed in run_test failed
Test datm_cdeps_stochy_gefs 135 failed in check_result failed
Test datm_cdeps_stochy_gefs 135 failed in run_test failed
Test datm_cdeps_ciceC_cfsr 136 failed in check_result failed
Test datm_cdeps_ciceC_cfsr 136 failed in run_test failed
Test datm_cdeps_bulk_cfsr 137 failed in check_result failed
Test datm_cdeps_bulk_cfsr 137 failed in run_test failed
Test datm_cdeps_bulk_gefs 138 failed in check_result failed
Test datm_cdeps_bulk_gefs 138 failed in run_test failed
Test datm_cdeps_mx025_cfsr 139 failed in check_result failed
Test datm_cdeps_mx025_cfsr 139 failed in run_test failed
Test datm_cdeps_mx025_gefs 140 failed in check_result failed
Test datm_cdeps_mx025_gefs 140 failed in run_test failed
Test datm_cdeps_3072x1536_cfsr 142 failed in check_result failed
Test datm_cdeps_3072x1536_cfsr 142 failed in run_test failed
Test datm_cdeps_gfs 143 failed in check_result failed
Test datm_cdeps_gfs 143 failed in run_test failed
Test datm_cdeps_debug_cfsr 144 failed in check_result failed
Test datm_cdeps_debug_cfsr 144 failed in run_test failed
hera.gnu

FAILED TESTS:
Test cpld_control_p8 046 failed in run_test failed
Test cpld_control_nowave_noaero_p8 047 failed in check_result failed
Test cpld_control_nowave_noaero_p8 047 failed in run_test failed
Test cpld_debug_p8 048 failed in check_result failed
Test cpld_debug_p8 048 failed in run_test failed
Test datm_cdeps_control_cfsr 049 failed in check_result failed
Test datm_cdeps_control_cfsr 049 failed in run_test failed
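Lists like the ones above are easier to compare across machines once they are grouped by test. The following is a minimal sketch, assuming every failure line has the shape `Test <name> <number> failed in <stage> failed` as quoted above; the function name `summarize_failures` is hypothetical and not part of the regression-test scripts.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Test cpld_control_p8 003 failed in check_result failed
# The layout is assumed from the failure lists quoted in this conversation.
FAILED_LINE = re.compile(r"^Test (\S+) (\d+) failed in (\S+) failed$")


def summarize_failures(log_text):
    """Map each (test name, test number) to the set of stages that failed."""
    failures = defaultdict(set)
    for line in log_text.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            name, number, stage = match.groups()
            failures[(name, number)].add(stage)
    return dict(failures)


sample = """FAILED TESTS:
Test cpld_control_p8 046 failed in run_test failed
Test cpld_debug_p8 048 failed in check_result failed
Test cpld_debug_p8 048 failed in run_test failed
"""

summary = summarize_failures(sample)
for (name, number), stages in sorted(summary.items()):
    print(name, number, sorted(stages))
```

Grouping by test makes it easy to see which tests failed only in `run_test` and which also failed in `check_result`.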

@grantfirl
Collaborator Author

For ufs-community#1645 (shouldn't show up on Hera and Cheyenne RTs):

Changes are expected to the following tests: hafs_regional_storm_following_1nest_atm_ocn_wav on WCOSS2 and Acorn only.

@grantfirl
Collaborator Author

@dustinswales Can I go ahead and add RT labels for this to verify failed tests?

@dustinswales
Collaborator

> @dustinswales Can I go ahead and add RT labels for this to verify failed tests?

Do it.

@dustinswales
Collaborator

Automated RT Failure Notification
Machine: cheyenne
Compiler: gnu
Job: BL
[BL] Repo location: /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/auto_RT/Pull_Requests/1298023507/20230405103013/ufs-weather-model
[BL] Baseline creation and move successful
[RT] Repo location: /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/auto_RT/Pull_Requests/1298023507/20230405105958/ufs-weather-model
Please make changes and add the following label back: cheyenne-gnu-BL

@dustinswales
Collaborator

Automated RT Failure Notification
Machine: cheyenne
Compiler: intel
Job: BL
[BL] Repo location: /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/auto_RT/Pull_Requests/1298023507/20230405113544/ufs-weather-model
[BL] Baseline creation and move successful
[RT] Repo location: /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/auto_RT/Pull_Requests/1298023507/20230405132802/ufs-weather-model
[RT] Error: Test hafs_regional_atm_ocn 113 failed in check_result failed
[RT] Error: Test hafs_regional_atm_ocn 113 failed in run_test failed
[RT] Error: Test hafs_global_1nest_atm 118 failed in check_result failed
[RT] Error: Test hafs_global_1nest_atm 118 failed in run_test failed
Please make changes and add the following label back: cheyenne-intel-BL

@grantfirl
Collaborator Author

@dustinswales I'm not sure what to make of the cheyenne/intel failures. Does this message mean that the new baselines were created but that the tests against the new baseline failed (which might signal a reproducibility problem)?

@dustinswales
Collaborator

@grantfirl Ugh, I'm not sure what's going on here...

In /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/auto_RT/Pull_Requests/1298023507/20230405113544/ufs-weather-model/tests/RegressionTests_cheyenne.intel.log there are some conflicting pieces of information. The end of the file says the REGRESSION TEST WAS SUCCESSFUL, but the top of the file shows a compilation failure for compile 008.
Despite all that, the baselines (BL) were still created and moved! (see /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/NCAR/main-20230403/INTEL/)

Then in /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/auto_RT/Pull_Requests/1298023507/20230405132802/ufs-weather-model/tests/RegressionTests_cheyenne.intel.log there are failures when comparing to the baselines?

As you pointed out, this suggests reproducibility problems, but there shouldn't be?
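A quick way to spot this kind of contradiction is to scan a log for both the success banner and any failure lines. The sketch below assumes the banner text `REGRESSION TEST WAS SUCCESSFUL` quoted above and assumes failure lines contain ` failed in `, as in the notifications in this thread; the exact wording inside the real log file may differ, and `find_log_conflicts` is a hypothetical helper.

```python
def find_log_conflicts(log_text):
    """Return failure lines if the log also claims overall success, else []."""
    lines = [line.strip() for line in log_text.splitlines()]
    # Banner text as quoted in the conversation.
    claims_success = any("REGRESSION TEST WAS SUCCESSFUL" in line for line in lines)
    # Assumed failure marker, taken from the failure lines quoted in this thread.
    failure_lines = [line for line in lines if " failed in " in line]
    return failure_lines if claims_success else []


conflicting_log = """Test compile_008 failed in run_compile failed
... other output ...
REGRESSION TEST WAS SUCCESSFUL
"""
clean_log = "REGRESSION TEST WAS SUCCESSFUL\n"

conflicts = find_log_conflicts(conflicting_log)
print(conflicts)
print(find_log_conflicts(clean_log))
```

A non-empty result means the log reports success and failures at the same time, so the baseline move should probably not be trusted without a closer look.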

@dustinswales
Collaborator

Automated RT Failure Notification
Machine: cheyenne
Compiler: intel
Job: BL
[BL] Repo location: /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/auto_RT/Pull_Requests/1298023507/20230405190011/ufs-weather-model
[BL] ERROR: Baseline location exists before creation:
/glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/NCAR/main-20230403/INTEL
Please make changes and add the following label back: cheyenne-intel-BL
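The "Baseline location exists before creation" error is the behavior one would expect from a guard that refuses to overwrite an existing baseline directory. A minimal sketch of such a guard follows; it is not the auto-RT implementation, and the function name `prepare_baseline_dir` and its `overwrite` flag are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_baseline_dir(baseline_dir, overwrite=False):
    """Create an empty baseline directory, refusing to clobber an existing one."""
    path = Path(baseline_dir)
    if path.exists():
        if not overwrite:
            raise FileExistsError(f"Baseline location exists before creation: {path}")
        # Explicit opt-in: discard the stale baseline before recreating it.
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path


with tempfile.TemporaryDirectory() as scratch:
    # Directory names mirror the path in the notification; the parent is a temp dir.
    target = Path(scratch) / "main-20230403" / "INTEL"
    prepare_baseline_dir(target)
    try:
        prepare_baseline_dir(target)
        second_attempt = "created"
    except FileExistsError:
        second_attempt = "refused"
    prepare_baseline_dir(target, overwrite=True)
    recreated = target.is_dir()

print(second_attempt, recreated)
```

With a guard like this, a rerun of the BL job fails fast until someone deliberately removes or overwrites the old baseline.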

@grantfirl
Collaborator Author

@dustinswales I manually ran the hafs_regional_atm_ocn test against the baseline and it failed again. I'd sorta like to recreate the cheyenne intel baselines and see if the same error happens. Could you delete /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/NCAR/main-20230403/INTEL please?

@dustinswales
Collaborator

@grantfirl I just deleted /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/NCAR/main-20230403/INTEL.

@dustinswales
Collaborator

Automated RT Failure Notification
Machine: cheyenne
Compiler: intel
Job: BL
[BL] Repo location: /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/auto_RT/Pull_Requests/1298023507/20230406091511/ufs-weather-model
[BL] Error: Test compile_008 failed in run_compile failed
Please make changes and add the following label back: cheyenne-intel-BL

@grantfirl
Collaborator Author

@dustinswales FWIW, I manually ran rt.ncar.sh on the hafs_regional_atm_ocn test using the -c and then the -m option, creating a new local baseline and checking against it, and it was successful. So, I'm guessing that this is a glitch? It looks like the compile_008 error above is a time-out. I'm fairly confident that we should be able to merge this, but we do need the BL creation to succeed so that we can test future PRs.
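The create-then-verify check described above boils down to comparing a fresh run against a baseline file by file. The following sketch illustrates that idea with SHA-256 digests; it is not how rt.ncar.sh compares results, and the file names and the helper `compare_to_baseline` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_to_baseline(baseline_dir, run_dir):
    """Return relative paths that are missing from run_dir or differ from the baseline."""
    baseline_dir, run_dir = Path(baseline_dir), Path(run_dir)
    mismatches = []
    for base_file in sorted(p for p in baseline_dir.rglob("*") if p.is_file()):
        rel = base_file.relative_to(baseline_dir)
        candidate = run_dir / rel
        if not candidate.is_file() or file_digest(candidate) != file_digest(base_file):
            mismatches.append(rel.as_posix())
    return mismatches


with tempfile.TemporaryDirectory() as scratch:
    baseline = Path(scratch) / "baseline"
    rerun = Path(scratch) / "rerun"
    for directory in (baseline, rerun):
        directory.mkdir()
        (directory / "atm.nc").write_bytes(b"identical")  # hypothetical output file
    (baseline / "ocn.nc").write_bytes(b"old")
    (rerun / "ocn.nc").write_bytes(b"new")
    mismatches = compare_to_baseline(baseline, rerun)

print(mismatches)
```

An empty list from two runs of the same code is the outcome reported above for the local test; a non-empty list would point to a real reproducibility problem.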

@grantfirl
Collaborator Author

The compilation that times out is defined by this line in rt.conf:
COMPILE | -DAPP=S2SWA -DCCPP_SUITES=FV3_GFS_v17_coupled_p8,FV3_GFS_cpld_rasmgshocnsstnoahmp_ugwp -DFASTER=ON | | fv3 |
RUN | cpld_control_p8_faster | - cheyenne.intel | fv3 |

I'm going to set the walltime in compile_qsub.IN_cheyenne to a larger number temporarily to see if it will pass.
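For anyone scripting around rt.conf, the two entries quoted above can be split on the `|` separator. This is a minimal sketch that assumes only what those two lines show: pipe-delimited fields with a leading keyword and space-separated `-DNAME=VALUE` tokens in the first COMPILE field. The meaning of the remaining fields is not interpreted, and `parse_rt_conf_line` is a hypothetical helper.

```python
def parse_rt_conf_line(line):
    """Split one rt.conf entry into its keyword, raw fields, and -D definitions."""
    fields = [field.strip() for field in line.split("|")]
    kind, rest = fields[0], fields[1:]
    # A trailing "|" yields one empty field at the end; drop it.
    if rest and rest[-1] == "":
        rest = rest[:-1]
    entry = {"kind": kind, "fields": rest}
    if kind == "COMPILE" and rest:
        definitions = {}
        for token in rest[0].split():
            if token.startswith("-D") and "=" in token:
                key, value = token[2:].split("=", 1)
                definitions[key] = value
        entry["definitions"] = definitions
    return entry


compile_line = (
    "COMPILE | -DAPP=S2SWA -DCCPP_SUITES=FV3_GFS_v17_coupled_p8,"
    "FV3_GFS_cpld_rasmgshocnsstnoahmp_ugwp -DFASTER=ON | | fv3 |"
)
run_line = "RUN | cpld_control_p8_faster | - cheyenne.intel | fv3 |"

compile_entry = parse_rt_conf_line(compile_line)
run_entry = parse_rt_conf_line(run_line)
print(compile_entry["definitions"])
print(run_entry["fields"])
```

Filtering parsed COMPILE entries on `FASTER == "ON"` would be one way to find the slow compiles that need a longer walltime.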

@grantfirl
Collaborator Author

For future reference, compile_008 takes ~33 minutes to run.
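When picking a new walltime from a measured compile time such as the ~33 minutes reported above, it helps to add headroom and format the result as `HH:MM:SS`, the form PBS accepts for walltime. The sketch below assumes a 1.5x safety margin, which is an arbitrary illustration and not a value taken from compile_qsub.IN_cheyenne; `walltime_with_margin` is a hypothetical helper.

```python
import math


def walltime_with_margin(observed_minutes, margin=1.5):
    """Scale an observed runtime by a safety margin and format it as HH:MM:SS."""
    total_minutes = math.ceil(observed_minutes * margin)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:00"


requested = walltime_with_margin(33)
print(requested)
```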

@dustinswales
Collaborator

@grantfirl Good sleuthing! I wonder why we are having timeout issues on cheyenne intel, but not on the UWM side?
Also, there is the error we get after a successful BL creation? I will investigate how I set up the autoRTs, something may be off.

@dustinswales
Collaborator

Automated RT Failure Notification
Machine: cheyenne
Compiler: intel
Job: BL
[BL] Repo location: /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/auto_RT/Pull_Requests/1298023507/20230406111508/ufs-weather-model
[BL] Baseline creation and move successful
[RT] Repo location: /glade/scratch/epicufsrt/GMTB/ufs-weather-model/RT/auto_RT/Pull_Requests/1298023507/20230406123505/ufs-weather-model
Please make changes and add the following label back: cheyenne-intel-BL

@grantfirl
Collaborator Author

@dustinswales Please review/approve NCAR/ccpp-physics#1006, NCAR/fv3atm#88, and this so that we can merge. All tests completed. The failed tests on cheyenne.intel were a wild goose chase.

@dustinswales dustinswales self-requested a review April 6, 2023 20:32
@grantfirl grantfirl merged commit a72d438 into NCAR:main Apr 6, 2023
SamuelTrahanNOAA added a commit to SamuelTrahanNOAA/ufs-weather-model that referenced this pull request Aug 19, 2023
SamuelTrahanNOAA added a commit to SamuelTrahanNOAA/ufs-weather-model that referenced this pull request Aug 24, 2023
…ng PR#1863) (ufs-community#1844)

* Changes to logging and initialization of the CLM Lake Model.
* merge ccpp-physics NCAR#91 (UFS-SRW v3.0.0 SciDoc updates)

1. Use ice thickness hice(i) to find the level in the lake where ice is
   zero.
2. Do not allow lake temperature to be below freezing point if there is
   no ice.
3. If there is no snow or ice, do not allow surface lake temperature to
   be below freezing point.
   These changes fixed the problem with large errors in the energy budget
   at the beginning of the cold-start run with lakes.
4. Added flag to turn on debug print statements in the CLM lake model.
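Items 2 and 3 above describe temperature limits that apply when there is no ice. A heavily simplified sketch of that logic follows; it is not the CLM Lake Model code, which is Fortran and works on per-column arrays. All names are hypothetical, and 273.15 K is used as the freshwater freezing point.

```python
TFRZ = 273.15  # freezing point of fresh water in kelvin


def clamp_lake_temperatures(t_lake, t_surface, ice_thickness, snow_depth):
    """Apply the no-ice temperature limits from items 2 and 3 (simplified)."""
    if ice_thickness <= 0.0:
        # Item 2: with no ice, lake layers may not be colder than freezing.
        t_lake = [max(t, TFRZ) for t in t_lake]
        if snow_depth <= 0.0:
            # Item 3: with neither snow nor ice, the surface may not be
            # colder than freezing either.
            t_surface = max(t_surface, TFRZ)
    return t_lake, t_surface


no_ice = clamp_lake_temperatures([272.0, 275.0], 271.0, 0.0, 0.0)
with_ice = clamp_lake_temperatures([272.0, 275.0], 271.0, 0.3, 0.0)
print(no_ice)
print(with_ice)
```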

* explicitly turn off frac_ice for flake

* t_grnd(i) should be t_grnd(c)
-------------------------------------------------------------------
Co-authored-by: Samuel Trahan <samuel.trahan@noaa.gov>
Co-authored-by: Grant Firl <grant.firl@noaa.gov>