Update WW3 for PIO/netCDF restarts #2445

Merged

DeniseWorthen
Collaborator

@DeniseWorthen DeniseWorthen commented Sep 24, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

Commit Message:

* UFSWM - add PIO settings to WAV attributes in ufs.configure templates
* UFSWM - update ww3_shel.nml to allow the ice field to be written to the restart file when required (i.e., waves in the slow loop)
* UFSWM - add WW3 restart files to comparison lists
  * WW3 - Add netCDF PIO capability for restarts and run-time history

Priority:

  • High - required for GFSv17

Git Tracking

UFSWM:

Sub component Pull Requests:

UFSWM Blocking Dependencies:


Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Updates/Changes Baselines.

New Baselines are required for all tests which include the WAV component. Answers do not change, but the comparison lists will now include a WW3 netCDF restart file. Note we do not currently compare the WW3 binary restart files for any global coupled test because they don't in general reproduce themselves.

To verify no answer changes, the WW3 restarts were temporarily removed from the comparison lists, while netCDF restarts were still written and used for the restart tests. All baselines passed against develop-20240909 on Hercules at 0b0a048.

I've continued to test this PR against the current develop branch using the method of temporarily removing the netCDF WW3 restart files from the comparison lists. This feature branch has continued to pass as the final changes were made to the WW3 feature branch, most recently using 79cfd42.
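
As a rough illustration of that method (not the actual RT scripts), the sketch below filters WW3 netCDF restart files out of a comparison list. The regular expression is an assumption modeled on file names such as `ufs.cpld.ww3.r.2021-03-23-21600.nc` seen in the RT logs; `drop_ww3_restarts` and `other_output.nc` are hypothetical names used only for this example.

```python
import re

# Assumed naming pattern: <case>.ww3.r.<YYYY-MM-DD>-<SSSSS>.nc
WW3_NC_RESTART = re.compile(r"\.ww3\.r\.\d{4}-\d{2}-\d{2}-\d{5}\.nc$")


def drop_ww3_restarts(files):
    """Return the comparison list without WW3 netCDF restart files."""
    return [name for name in files if not WW3_NC_RESTART.search(name)]


# Hypothetical comparison list for a single test.
comparison_list = [
    "other_output.nc",                     # placeholder for any non-WW3 file
    "ufs.cpld.ww3.r.2021-03-23-21600.nc",  # WW3 netCDF restart
]

print(drop_ww3_restarts(comparison_list))
```

With the restart files excluded from the list, a passing run indicates that the remaining outputs are unchanged.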

I've also created a baseline using this PR at the above hash and verified against it. In this case, the netCDF restart files are being compared. All baselines pass.

In testing, it was found that Hercules+GNU failed for the subset rearranger, but worked for box. The relevant tests were switched to box only for Hercules+GNU tests. To verify that the problem is a platform (Hercules) issue, GNU tests were then run on Derecho against a self-baseline and all tests passed at 677cfd9.

rt_cpld_control_nowave_noaero_p8_gnu.log:Test cpld_control_nowave_noaero_p8_gnu PASS
rt_cpld_control_p8_gnu.log:Test cpld_control_p8_gnu PASS
rt_cpld_control_pdlib_p8_gnu.log:Test cpld_control_pdlib_p8_gnu PASS
rt_cpld_debug_p8_gnu.log:Test cpld_debug_p8_gnu PASS
rt_cpld_debug_pdlib_p8_gnu.log:Test cpld_debug_pdlib_p8_gnu PASS

On Hercules, a full RT test_changes.list has been committed. Examining the log files shows that the test failures are due to missing netCDF WW3 restarts. For these tests, no files were found to 'not compare'.

rt_atmwav_control_noaero_p8_intel.log: Comparing ufs.atmw.ww3.r.2021-03-22-64800.nc ............MISSING baseline
rt_cpld_2threads_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_bmark_p8_intel.log: Comparing ufs.cpld.ww3.r.2013-04-01-21600.nc ............MISSING baseline
rt_cpld_control_c192_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-43200.nc ............MISSING baseline
rt_cpld_control_ciceC_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_control_gfsv17_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_control_noaero_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_control_p8.v2.sfc_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_control_p8_faster_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_control_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_control_p8_mixedmode_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_control_pdlib_p8_gnu.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_control_pdlib_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_control_qr_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_debug_gfsv17_intel.log: Comparing ufs.cpld.ww3.r.2021-03-22-32400.nc ............MISSING baseline
rt_cpld_debug_noaero_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-22-32400.nc ............MISSING baseline
rt_cpld_debug_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-22-32400.nc ............MISSING baseline
rt_cpld_debug_pdlib_p8_gnu.log: Comparing ufs.cpld.ww3.r.2021-03-22-32400.nc ............MISSING baseline
rt_cpld_debug_pdlib_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-22-32400.nc ............MISSING baseline
rt_cpld_decomp_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_mpi_gfsv17_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_cpld_mpi_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline
rt_hafs_regional_atm_ocn_wav_intel.log: Comparing ufs.hafs.ww3.r.2019-08-29-21600.nc ............MISSING baseline
rt_hafs_regional_atm_wav_intel.log: Comparing ufs.hafs.ww3.r.2019-08-29-21600.nc ............MISSING baseline
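
The check described above (failures caused only by missing baselines, with no files that fail to compare) can be sketched as a small log classifier. This is a minimal sketch that assumes the `grep`-style line format shown in this PR (`<log>: Comparing <file> ....<status>`); the function name `classify` and the result keys are hypothetical, and the real RT logs may differ in spacing.

```python
import re

# Assumed line format, taken from the excerpts quoted in this PR.
LINE = re.compile(
    r"^(?P<log>rt_\S+\.log):\s*Comparing\s+(?P<file>\S+)\s+\.+"
    r"(?P<status>MISSING baseline|USING CMP\.+NOT IDENTICAL)\s*$"
)


def classify(lines):
    """Split comparison failures into missing baselines and real differences."""
    result = {"missing": [], "differs": []}
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # not a comparison-failure line
        key = "missing" if match["status"].startswith("MISSING") else "differs"
        result[key].append((match["log"], match["file"]))
    return result


sample = [
    "rt_cpld_control_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc ............MISSING baseline",
    "rt_cpld_mpi_gfsv17_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600 .....USING CMP......NOT IDENTICAL",
]
summary = classify(sample)
print(len(summary["missing"]), "missing,", len(summary["differs"]), "not identical")
```

An empty `differs` list corresponds to the situation reported here, where every failure is a missing baseline.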

Input data Changes:

  • None

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

DeniseWorthen and others added 30 commits July 27, 2024 15:14
at cc70186, the following files do not compare

rt_cpld_mpi_gfsv17_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600 .....USING CMP......NOT IDENTICAL
rt_cpld_mpi_pdlib_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600 .....USING CMP......NOT IDENTICAL
rt_cpld_restart_bmark_p8_intel.log: Comparing ufs.cpld.ww3.r.2013-04-01-21600 .....USING CMP......NOT IDENTICAL
rt_cpld_restart_c192_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-43200 .....USING CMP......NOT IDENTICAL
*add trho fix to w3iors, these ww3.r files do not compare
*tested against bl.trhofix

rt_cpld_mpi_gfsv17_intel.log:Test cpld_mpi_gfsv17_intel FAIL
rt_cpld_mpi_pdlib_p8_intel.log:Test cpld_mpi_pdlib_p8_intel FAIL
rt_cpld_restart_bmark_p8_intel.log:Test cpld_restart_bmark_p8_intel FAIL
rt_cpld_restart_c192_p8_intel.log:Test cpld_restart_c192_p8_intel FAIL
* no write/read of fpis. these ww3.r files do not compare. tested
against bl.trhofix.nofpis. all other files compare b4b

rt_cpld_mpi_gfsv17_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600 .....USING CMP......NOT IDENTICAL
rt_cpld_mpi_pdlib_p8_intel.log: Comparing ufs.cpld.ww3.r.2021-03-23-21600 .....USING CMP......NOT IDENTICAL
* fix typo in use_historync
* remove mediator_present flag (unneeded)
* following pass baseline
cpld_debug_noaero_p8
cpld_debug_pdlib_p8
hafs_regional_storm_following_1nest_atm_ocn_wav_mom6
* tested all wave-containing tests with modifications for restart
file naming to allow for the custom filenaming of binary restarts.
This feature is present in the current WW3 code, but will be removed
once we enable netcdf restarts. Temporary code was added to allow the
binary restart to have the existing format of casename+ww3.r+timestring.
With this modification, all baselines were B4B.
* ww3 hash 4674dae passes against a self-generated baseline except
for cpld_restart_gfsv17_intel
* compare cmeps restart files of this uwm-hash against current baseline
at develop-20240904. All are identical except for cpld_control_gfsv17_iau_intel
* ww3 0ad634c9 still fails slow restart, even though my
sandbox testing passed.
* additional restart fields for WW3/slow loop coupling are
requested via ww3 nml setting
@FernandoAndrade-NOAA added the "Ready for Commit Queue" and "jenkins-ort" labels and removed the "jenkins-ort" label on Nov 12, 2024
@FernandoAndrade-NOAA
Collaborator

Leaving a note that Orion is consistently failing hafs_regional_atm_ocn_wav_intel during baseline creation because it hits the time limit at runtime; I'm rerunning with an increased time limit. Gaea RTs showed changes in the intelllvm tests as well. Those baselines have been regenerated and the tests are rerunning. Jet is still running.

@DeniseWorthen
Collaborator Author

@FernandoAndrade-NOAA Can you confirm that the Gaea LLVM RTs failed because the WW3 netcdf restarts would have been missing from the baseline?

@FernandoAndrade-NOAA
Collaborator

@FernandoAndrade-NOAA Can you confirm that the Gaea LLVM RTs failed because the WW3 netcdf restarts would have been missing from the baseline?

Correct, the errors in the logs were due to missing baselines.

@jkbk2004
Collaborator

Baseline develop-20241112 was created successfully on Derecho: /glade/derecho/scratch/epicufsrt/ufs-weather-model/RT/NEMSfv3gfs/develop-20241112. Due to ecflow and rocoto issues on the machine, we will skip the Derecho RT log.

@jkbk2004
Collaborator

Anyway, regarding the Derecho ecflow issue:

Error: request( --begin=regtest_14784 ) failed!  Server reply: BeginCmd::doHandleRequest:  Begin failed as suite 'regtest_14784' is not loaded.

We are in contact with the sys admin. Rocoto issue on Derecho: the rocotorun and rocotostat iteration gets stuck with a long list of jobs.

@FernandoAndrade-NOAA
Collaborator

Ok, we should be all set then. I'll leave a note in the subcomponent PR.

@FernandoAndrade-NOAA FernandoAndrade-NOAA merged commit 6b0f516 into ufs-community:develop Nov 14, 2024
4 checks passed
tsga added a commit to tsga/ufs-weather-model that referenced this pull request Nov 16, 2024
…-model into feature/lnd_iau

* 'feature/lnd_iau' of https://github.com/tsga/ufs-weather-model:
  fix gitmodules to point to fv3 feature branch
  Update WW3 for PIO/netCDF restarts (ufs-community#2445)
Labels
Baseline Updates Current baselines will be updated. Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.
Development

Successfully merging this pull request may close these issues.

Add netcdf restart and history files using PIO (parallel netCDF) for dev/ufs-weather-model
5 participants