Allow use of downscaled warmstart files for cpld_control_sfs test #2375

Merged (31 commits) on Aug 21, 2024

DeniseWorthen
Collaborator

@DeniseWorthen DeniseWorthen commented Jul 19, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers. NA
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

Updates the MOM_input templates and RT scripts to allow use of downscaled MOM6 and CICE6 warmstarts.

Commit Message:

* UFSWM - Update the MOM_input templates and RT scripts to allow use of downscaled MOM6 and CICE6 warmstarts.

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

  • None

UFSWM Blocking Dependencies:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Updates/Changes Baselines.

The cpld_control_sfs test will require a baseline update.

Input data Changes:

  • New input data.

The downscaled warmstarts were taken from the current baseline for the ocnice_prep utility in UFS_UTILS (47705d5) and renamed.

An updated input-data directory has been staged at /scratch1/NCEPDEV/stmp4/Denise.Worthen/input-data-20240501

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@DeniseWorthen DeniseWorthen changed the title Allow use of downscaled warmstart files for cpld_control_ufs test Allow use of downscaled warmstart files for cpld_control_ufs test; Switch to 64 bit compile Jul 20, 2024
@DeniseWorthen DeniseWorthen marked this pull request as ready for review July 22, 2024 15:13
@DeniseWorthen DeniseWorthen added New Input Data Req'd This PR requires new data to be sync across platforms Baseline Updates Current baselines will be updated. labels Jul 22, 2024
@DeniseWorthen DeniseWorthen changed the title Allow use of downscaled warmstart files for cpld_control_ufs test; Switch to 64 bit compile Allow use of downscaled warmstart files for cpld_control_ufs test; Switch SFS test to 64 bit compile Jul 22, 2024
@DeniseWorthen DeniseWorthen changed the title Allow use of downscaled warmstart files for cpld_control_ufs test; Switch SFS test to 64 bit compile Allow use of downscaled warmstart files for cpld_control_ufs test Jul 23, 2024
@junwang-noaa junwang-noaa changed the title Allow use of downscaled warmstart files for cpld_control_ufs test Allow use of downscaled warmstart files for cpld_control_sfs test Jul 29, 2024
@jkbk2004
Collaborator

jkbk2004 commented Aug 2, 2024

@zach1221 @FernandoAndrade-NOAA can we rsync the *warmstart.nc files from /scratch1/NCEPDEV/stmp4/Denise.Worthen/input-data-20240501/MOM6_IC and CICE_IC?

@DeniseWorthen DeniseWorthen self-assigned this Aug 16, 2024
@DeniseWorthen
Collaborator Author

@jkbk2004 Please do not push to my repos.

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Aug 20, 2024

@jkbk2004 The baseline 0813 on Hera is apparently corrupted. The MOM restart for the sfs test in 0813 is identical to the MOM restart in 0819. It should not be. It appears you've overwritten the control_sfs baseline in 0813 w/ the new baseline.

@jiandewang
Collaborator

I ran UWM yesterday and all my jobs matched the 0813 BL, including this sfs job. But just now I ran
ls -l /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/develop-20240813/cpld_control_sfs_intel/RESTART/

-rw-rw-r-- 1 role.epic epic 917198614 Aug 19 19:10 20210323.060000.MOM.res.nc

This file was updated last night, which should not have happened.
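
The timestamp check above can be generalized. The following is a minimal Python sketch, not part of the UFS regression-test scripts, that lists every file under a baseline directory whose modification time is later than a cutoff. The function name `modified_after` and any cutoff value passed to it are illustrative assumptions.

```python
import os
from datetime import datetime


def modified_after(directory, cutoff):
    """Return sorted paths of files under `directory` modified after `cutoff`.

    `cutoff` is a naive datetime interpreted in local time, matching what
    `ls -l` displays. Hypothetical helper, not part of rt.sh.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = datetime.fromtimestamp(os.path.getmtime(path))
            if mtime > cutoff:
                hits.append(path)
    return sorted(hits)
```

For example, calling `modified_after("/scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/develop-20240813", datetime(2024, 8, 14))` would be expected to return an empty list for a baseline that was created on its nominal date and never touched afterwards.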

@jkbk2004
Collaborator

@jiandewang We symlinked 0813 to 0819, since this PR only changes that sfs case. Somehow the symlink caused the sfs case in 0813 to be updated yesterday. That's why we fixed 0813. I am confirming the 0813 baseline on Hera now.

@BrianCurtis-NOAA
Collaborator

I would recommend against symbolically linking baselines for this exact reason. You risk overwriting old baselines by accident. By the time the RT to create baselines is done, you can have a full baseline copied to a new bl_date on any system.
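
The hazard described here can be reproduced in isolation. The sketch below is an illustration in Python using throwaway temporary directories; the directory and file names are hypothetical, and the actual link layout used for the 0813 and 0819 baselines is not shown in this thread. It only demonstrates the general mechanism: a write made through a symlinked directory lands in the directory the link points to.

```python
import os
import tempfile


def demo_symlink_overwrite():
    """Write through a symlinked directory and return what the target holds."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for an old and a new baseline directory.
        old_dir = os.path.join(root, "develop-OLD")
        new_dir = os.path.join(root, "develop-NEW")
        os.mkdir(old_dir)

        old_file = os.path.join(old_dir, "restart.nc")
        with open(old_file, "w") as f:
            f.write("old baseline")

        # The "new" baseline is only a link to the old one.
        os.symlink(old_dir, new_dir, target_is_directory=True)

        # Creating the new baseline writes through the link...
        with open(os.path.join(new_dir, "restart.nc"), "w") as f:
            f.write("new baseline")

        # ...so the file in the old baseline has been replaced.
        with open(old_file) as f:
            return f.read()


print(demo_symlink_overwrite())  # prints "new baseline"
```

Copying the old baseline into the new directory instead of linking it keeps the two independent, at the cost of extra disk space.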

@DeniseWorthen
Collaborator Author

@jkbk2004 The problem exists on Gaea also. These two files compare as identical, and they should not:

 nccmp -d -S -q -f -g -B --Attribute=checksum --warn=format develop-20240813/cpld_control_sfs_intel/RESTART/20210323.060000.MOM.res.nc develop-20240819/cpld_control_sfs_intel/RESTART/20210323.060000.MOM.res.nc
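
As a coarse cross-check alongside nccmp, one could also test whether the two files are byte-for-byte identical. The following is a minimal Python sketch under that assumption; the helper names `sha256_of` and `byte_identical` are hypothetical and not part of any UFS tooling. Byte identity is a stricter condition than an nccmp data match, so a `True` result here means the data necessarily match, while `False` does not by itself mean the data differ.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(path_a, path_b):
    """True if the two files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Reading in chunks keeps memory use small for restart files of several hundred megabytes, such as the MOM restart listed earlier in this thread.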

@jkbk2004
Collaborator

I prefer to recreate the whole baseline whenever a PR contains a baseline change. It's a pain to selectively and manually handle baseline updates for a few cases.

@DusanJovic-NOAA
Collaborator

@jkbk2004 Please do not recreate the whole baseline whenever a PR contains a baseline change. We added the -b option specifically for this reason, to avoid the necessity of doing anything manually. Recreating all baselines unnecessarily uses more resources, takes more time, and, most importantly, we cannot be sure that only the tests that are expected to change baselines actually have their baselines updated. See the description in #1834.

Later on, we decided to commit the test_changes.list file with every commit to document which tests change the baselines.

@jkbk2004
Collaborator

The 0813 baselines are restored OK. Jet is down, but I will make sure of it there. We can start merging this PR.

jkbk2004
jkbk2004 previously approved these changes Aug 20, 2024
@DeniseWorthen
Collaborator Author

DeniseWorthen commented Aug 20, 2024

@jkbk2004 On Hercules, I cannot ls develop-20240819. I can ls the 0813 baseline. What are the permissions?

@DeniseWorthen
Collaborator Author

@jkbk2004 I have the same issue w/ the 0819 baseline on Orion. Please check the permissions.

@jkbk2004
Collaborator

Released permissions on Hercules/Orion.

@FernandoAndrade-NOAA
Collaborator

@BrianCurtis-NOAA are we skipping Acorn this PR?

@BrianCurtis-NOAA
Collaborator

I got Acorn back late, so I've almost finished comparisons. If they fail and it's not a quick re-run, I'll skip. For now, wait another 15-20; hopefully it goes quickly.

@jkbk2004 jkbk2004 merged commit b3cdd8e into ufs-community:develop Aug 21, 2024
3 checks passed
DavidHuber-NOAA added a commit to DavidHuber-NOAA/ufs-weather-model that referenced this pull request Sep 9, 2024
Labels
Baseline Updates Current baselines will be updated. New Input Data Req'd This PR requires new data to be sync across platforms Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.
Development

Successfully merging this pull request may close these issues.

Allow use of downscaled MOM6 and CICE6 warmstart files for SFS test
9 participants