Fix wave restart for cold start and add ic version file #3112

Open
wants to merge 8 commits into
base: develop

WalterKolczynski-NOAA
Contributor

@WalterKolczynski-NOAA WalterKolczynski-NOAA commented Nov 19, 2024

Description

The stage job was incorrectly putting wave restarts into the gfs directory. The forecast job looks for them in the gdas directory, so this is updated.

Additionally, the restarts were also not being copied from the staged directory to $DATA, so now they are. The process is identical to that of non-RERUN warm starts, so the code is refactored a bit to avoid duplication.

As part of updating the ICs with the new restart location, an IC version file is added to support different version numbers for different IC directories. Unlike other version files, this one uses an associative array rather than individual variables.
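
As a rough illustration of the associative-array approach (not the actual contents of the version file), the sketch below keys version stamps by IC directory name. The array name `ic_versions`, the helper `ic_dir_for`, and the `/tmp/ICSDIR` fallback root are assumptions made for this example; the directory names and date stamps are taken from paths quoted elsewhere in this conversation.

```bash
#!/usr/bin/env bash
# Hypothetical IC version table: one associative array keyed by IC directory
# (resolution) name, instead of one variable per directory. Requires bash 4+.
declare -A ic_versions=(
  [C48C48mx500]="20241120"   # version directory listed later in this thread
  [C96mx100]="20240610"      # version referenced by the C96 replay case
)

# Hypothetical helper: build the versioned IC path for a resolution.
# ICSDIR_ROOT falls back to a placeholder, not a real machine path.
ic_dir_for() {
  local res="$1"
  local root="${ICSDIR_ROOT:-/tmp/ICSDIR}"
  if [[ -z "${ic_versions[${res}]:-}" ]]; then
    echo "No IC version defined for ${res}" >&2
    return 1
  fi
  echo "${root}/${res}/${ic_versions[${res}]}"
}

ic_dir_for C48C48mx500
```

With this layout, bumping the version for one IC directory is a one-line change to its array entry, and directories that were not restaged keep their old stamp.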

Resolves #3109

Type of change

  • Bug fix (fixes something broken)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

  • C48_S2SW test on Hercules
  • C48_S2SWA_gefs test on Hercules

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

The stage job was incorrectly putting wave restarts into the gfs
directory. The forecast job looks for them in the gdas directory,
so this is updated.

Additionally, the restarts were also not being copied from the
staged directory to `$DATA`, so now they are. The process is
identical to that of non-RERUN warm starts, so the code is
refactored a bit to avoid duplication.

Resolves NOAA-EMC#3109
@WalterKolczynski-NOAA
Contributor Author

This won't work yet because I need to move all the wave restarts in ICSDIR.

@JessicaMeixner-NOAA
Contributor

> This won't work yet because I need to move all the wave restarts in ICSDIR.

I have a directory of just 1 IC test so I can get it moved and tested from there pretty easily. Quick question, I should be staging in the previous cycle gdas, correct?

@WalterKolczynski-NOAA
Contributor Author

> > This won't work yet because I need to move all the wave restarts in ICSDIR.
>
> I have a directory of just 1 IC test so I can get it moved and tested from there pretty easily. Quick question, I should be staging in the previous cycle gdas, correct?

Yes. Should be the same as we have now, except gfs ➡ gdas
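
To make the layout concrete, here is a minimal sketch of how the staged wave restart location could be derived for a cold start. It assumes a 6-hour cycle interval, GNU `date`, and a directory layout modeled on the ICSDIR paths quoted in this conversation. The function names and the `/tmp/ICSDIR` root are hypothetical, and this is not the workflow's actual staging code.

```bash
#!/usr/bin/env bash
# Sketch only: locate the staged wave restart for a cycle (assumes GNU date).

# Print the previous cycle as YYYYMMDDHH, assuming a 6-hour cycle interval.
previous_cycle() {
  local pdy="$1" cyc="$2" epoch
  epoch=$(date -u -d "${pdy:0:4}-${pdy:4:2}-${pdy:6:2} ${cyc}:00:00" +%s)
  date -u -d "@$((epoch - 6 * 3600))" +%Y%m%d%H
}

# Print where the restart valid at the current cycle would be staged:
# under the previous cycle's gdas directory (not gfs).
staged_wave_restart() {
  local ics_dir="$1" pdy="$2" cyc="$3" prev
  prev=$(previous_cycle "${pdy}" "${cyc}")
  echo "${ics_dir}/gdas.${prev:0:8}/${prev:8:2}/model/wave/restart/${pdy}.${cyc}0000.restart.ww3"
}

staged_wave_restart /tmp/ICSDIR/C48C48mx500/20241120 20210323 12
```

For a 2021032312 cycle this points at `gdas.20210323/06`, with the restart file itself stamped with the current cycle time.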

@JessicaMeixner-NOAA
Contributor

@WalterKolczynski-NOAA - I have a test running. It's not "clean" in the sense that I merged your changes into my branch, but the stage-IC job succeeded and the forecast is in the queue. I'll report in the morning, and thank you so much for a quick fix to this problem!

@JessicaMeixner-NOAA
Contributor

@WalterKolczynski-NOAA - My test from last night has a wave IC!!!! Thank you again for this quick update. Even just having this branch means we can move forward with getting some runs comparing different physics options for the wave model now.

Adds a new version file for IC directories. Unlike other version
files, this one uses an associative array instead of different
variables.

With the version file in place, the versions are updated on most
of the directories to switch to the relocated wave restarts.

Refs: NOAA-EMC#3109
@WalterKolczynski-NOAA WalterKolczynski-NOAA marked this pull request as ready for review November 20, 2024 18:31
@WalterKolczynski-NOAA
Contributor Author

New IC versions have been created for the relocated wave restarts. This required adding an IC versions file.

Note: the high-res cases (C768/C1152) likely will still not work. They use the wave grid name as the restart suffix, but for single-grid waves, a name ending in .ww3 is expected. Unlike the other directories, where I made this change previously by adding symlinks, I can't create links for the high-res wave restarts because they point to the prototype ICs owned by the climate group. (I guess I could redo it to just point at the file instead of the directory, but I haven't.)

@JessicaMeixner-NOAA
Contributor

> New IC versions have been created for the relocated wave restarts. This required adding an IC versions file.
>
> Note: the high-res cases (C768/C1152) likely will still not work. They use the wave grid name as the restart suffix, but for single-grid waves, a name ending in .ww3 is expected. Unlike the other directories, where I made this change previously by adding symlinks, I can't create links for the high-res wave restarts because they point to the prototype ICs owned by the climate group. (I guess I could redo it to just point at the file instead of the directory, but I haven't.)

I have access to add additional links to files owned by the climate group, or I can coordinate getting this done. Can you let me know which directories?

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS and removed CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS labels Nov 21, 2024
Member

@KateFriedman-NOAA KateFriedman-NOAA left a comment

Looks good, thanks for the IC updates/fixes @WalterKolczynski-NOAA ! Approve pending successful completion of CI testing.

Contributor

@JessicaMeixner-NOAA JessicaMeixner-NOAA left a comment

I've tested the wave parts of these changes and approve those changes. This PR however touches a lot more than just wave ICs with the IC version change, so maybe posting the output of the CI for more people to look at before giving the official approval might be a good idea?

@JessicaMeixner-NOAA
Contributor

@WalterKolczynski-NOAA - Can you remind me again which ICs I need to update for the wave model, and on which machine?

I want to say you said HR3?

I can also work with @jiandewang to get this updated on WCOSS2 and run an HR4 test there as part of the review as well.

@JessicaMeixner-NOAA
Contributor

> @WalterKolczynski-NOAA - Can you remind me again which ICs I need to update for the wave model, and on which machine?
>
> I want to say you said HR3?
>
> I can also work with @jiandewang to get this updated on WCOSS2 and run an HR4 test there as part of the review as well.

@WalterKolczynski-NOAA I just looked on WCOSS2, and this looks updated? So can I run an HR4-like test there now? In the meantime, I'm not sure what else you need changed, so I'll wait for your response on testing and on what else needs to change.

@WalterKolczynski-NOAA
Contributor Author

WalterKolczynski-NOAA commented Nov 21, 2024

> @WalterKolczynski-NOAA - Can you remind me again which ICs I need to update for the wave model, and on which machine?
>
> I want to say you said HR3?
>
> I can also work with @jiandewang to get this updated on WCOSS2 and run an HR4 test there as part of the review as well.

@JessicaMeixner-NOAA
HR3, on all machines. Just need to add a symlink in each of the wave restart directories that uses the .ww3 suffix instead of .uglo_m1g16. Running this code should do the trick:

for restart in $(find /scratch1/NCEPDEV/climate/role.ufscpara/IC/HR3marine -name '*restart.uglo_m1g16'); do
  dir=$(dirname "${restart}")
  target=$(basename "${restart}")
  # Swap the grid-name suffix for .ww3
  link="${target/%.uglo_m1g16/.ww3}"
  # Relative target, so the link resolves from inside ${dir} without a cd
  ln -s "${target}" "${dir}/${link}"
done

Needs to be repeated on all machines.

(These changes are related to a previous PR fixing wave restarts #3009.)

@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules label Nov 21, 2024
@JessicaMeixner-NOAA
Contributor

I've made the updates on hera and orion/hercules.

@jiandewang will have to make them on WCOSS2:

for restart in $(find /lfs/h2/emc/couple/noscrub/Jiande.Wang/IC/HR3marine -name '*restart.uglo_m1g16'); do
  dir=$(dirname "${restart}")
  target=$(basename "${restart}")
  # Swap the grid-name suffix for .ww3
  link="${target/%.uglo_m1g16/.ww3}"
  # Relative target, so the link resolves from inside ${dir} without a cd
  ln -s "${target}" "${dir}/${link}"
done

@emcbot emcbot added CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules and removed CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules labels Nov 21, 2024
@jiandewang
Contributor

I will do this on WCOSS2 shortly (in a meeting now).

@emcbot emcbot added CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress and removed CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules labels Nov 21, 2024
@jiandewang
Contributor

@JessicaMeixner-NOAA done on wcoss2 dev and prod machines.
@WalterKolczynski-NOAA your script works perfectly

@emcbot emcbot added CI-Hercules-Passed **Bot use only** CI testing on Hercules for this PR has completed successfully and removed CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress labels Nov 22, 2024
@emcbot

emcbot commented Nov 22, 2024

CI Passed on Hercules in Build# 1
Built and ran in directory /work2/noaa/global/CI/HERCULES/3112


Experiment C96_S2SWA_gefs_replay_ics_f57b5801 Completed 1 Cycles: *SUCCESS* at Thu Nov 21 18:41:51 CST 2024
Experiment C48_ATM_f57b5801 Completed 2 Cycles: *SUCCESS* at Thu Nov 21 18:59:54 CST 2024
Experiment C96_atm3DVar_f57b5801 Completed 3 Cycles: *SUCCESS* at Thu Nov 21 20:07:12 CST 2024
Experiment C96C48_hybatmDA_f57b5801 Completed 3 Cycles: *SUCCESS* at Thu Nov 21 20:13:19 CST 2024
Experiment C48_S2SW_f57b5801 Completed 2 Cycles: *SUCCESS* at Thu Nov 21 20:49:49 CST 2024
Experiment C48_S2SWA_gefs_f57b5801 Completed 1 Cycles: *SUCCESS* at Thu Nov 21 21:02:27 CST 2024

@WalterKolczynski-NOAA WalterKolczynski-NOAA changed the title Fix wave restart for cold start Fix wave restart for cold start and add ic version file Nov 22, 2024
@WalterKolczynski-NOAA
Contributor Author

Awaiting approval before running CI on the other machines.

@JessicaMeixner-NOAA
Contributor

@WalterKolczynski-NOAA - Can you provide an overview of the wave ICs that are used in CI tests and what changes were made in staged ICs for those ICs?

The changes we made for HR3marine are fine because there was just one wave grid there, but for other tests it's unclear whether we had multiple grids, and it seems like changes to wave grids were made that are now breaking other CI tests (see: #3115 (comment)).

@JessicaMeixner-NOAA
Contributor

@WalterKolczynski-NOAA My clean test run of a C1152 case ran successfully on wcoss2 and included a wave IC. However, I have some concerns about how ICs might have been staged as it seems there are failures potentially related to these changes.

@DavidHuber-NOAA
Contributor

@JessicaMeixner-NOAA @WalterKolczynski-NOAA It looks like the wave restarts are not staged properly on Hera, at least for the C48C48mx500 resolution. The file 20210323.120000.restart.ww3 is missing from /scratch1/NCEPDEV/global/glopara/data/ICSDIR/C48C48mx500/20240610/gefs.20210323/06/mem000/model/wave/restart but present on Hercules under /work/noaa/global/glopara/data/ICSDIR/C48C48mx500/20240610/gefs.20210323/06/mem000/model/wave/restart.
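
A check along these lines could be repeated on each machine. The sketch below lists every `model/wave/restart` directory under a version directory that holds no resolvable `*.restart.ww3` entry. The function name is hypothetical and the directory pattern is inferred from the paths above, so treat it as a starting point, not an existing workflow utility.

```bash
#!/usr/bin/env bash
# Sketch: report staged wave restart directories with no *.restart.ww3 entry.
# A symlink counts as present only if its target resolves.
report_missing_wave_restarts() {
  local version_dir="$1" dir f found
  while IFS= read -r -d '' dir; do
    found=0
    for f in "${dir}"/*.restart.ww3; do
      # An unmatched glob stays literal and fails the -e test
      if [[ -e "${f}" ]]; then found=1; break; fi
    done
    if (( found == 0 )); then echo "MISSING: ${dir}"; fi
  done < <(find "${version_dir}" -type d -path '*/model/wave/restart' -print0)
}
```

Calling `report_missing_wave_restarts` on the same version directory on two machines and comparing the output would surface differences like the one described here.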

@JessicaMeixner-NOAA
Contributor

Okay, here is a list of CI tests that should be using WW3 ICs:

https://github.com/NOAA-EMC/global-workflow/blob/develop/ci/cases/weekly/C384_S2SWA.yaml - gfs C384 2016070100
https://github.com/NOAA-EMC/global-workflow/blob/develop/ci/cases/hires/C1152_S2SW.yaml - gfs C1152 2019120300
https://github.com/NOAA-EMC/global-workflow/blob/develop/ci/cases/hires/C768_S2SW.yaml - gfs C768 2019120300
https://github.com/NOAA-EMC/global-workflow/blob/develop/ci/cases/pr/C48_S2SW.yaml - gfs C48 2021032312
https://github.com/NOAA-EMC/global-workflow/blob/develop/ci/cases/pr/C48_S2SWA_gefs.yaml - gefs C48 2021032312
https://github.com/NOAA-EMC/global-workflow/blob/develop/ci/cases/pr/C96_S2SWA_gefs_replay_ics.yaml - gefs C96 2020110100 {{ 'ICSDIR_ROOT' | getenv }}/C96mx100/20240610

We can then use this list to trace back, for each grid and date, where the wave IC should be before and after this update, and make sure things are as expected.

And agreed @DavidHuber-NOAA - there do seem to be issues.

@JessicaMeixner-NOAA
Contributor

I'm concerned we might have a conflict for ICs for
https://github.com/NOAA-EMC/global-workflow/blob/develop/ci/cases/pr/C48_S2SW.yaml gfs C48 2021032312
and
https://github.com/NOAA-EMC/global-workflow/blob/develop/ci/cases/pr/C48_S2SWA_gefs.yaml gefs C48 2021032312

As of right now, I think these would be using the same IC directory, but we have 2 different wave grids. I'm still working on the details but want to report the potential issue now.

@WalterKolczynski-NOAA
Contributor Author

WalterKolczynski-NOAA commented Nov 22, 2024

> @WalterKolczynski-NOAA - Can you provide an overview of the wave ICs that are used in CI tests and what changes were made in staged ICs for those ICs?
>
> The changes we made for HR3marine are fine because there was just one wave grid there, but for other tests it's unclear whether we had multiple grids, and it seems like changes to wave grids were made that are now breaking other CI tests (see: #3115 (comment)).

A few symlinks were inadvertently broken in places I didn't intend to modify. I've restored them.
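
For anyone wanting to confirm that no dangling links remain, a check like the following could be run against an IC directory. This is a generic sketch using standard `find` and `test`. The function name is hypothetical, and it is not necessarily how the links were actually verified.

```bash
#!/usr/bin/env bash
# Sketch: print every symlink under a directory whose target does not resolve.
find_broken_links() {
  local root="$1"
  # -type l selects symlinks; `test -e` follows the link and fails if dangling
  find "${root}" -type l ! -exec test -e {} \; -print
}
```

An empty result from `find_broken_links` means every symlink under the directory resolves to an existing file or directory.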

@WalterKolczynski-NOAA
Contributor Author

> I'm concerned we might have a conflict for ICs for https://github.com/NOAA-EMC/global-workflow/blob/develop/ci/cases/pr/C48_S2SW.yaml gfs C48 2021032312 and https://github.com/NOAA-EMC/global-workflow/blob/develop/ci/cases/pr/C48_S2SWA_gefs.yaml gefs C48 2021032312
>
> as of right now I think these would be using the same IC directory, but we have 2 different wave grids, still working on details - but want to report the potential issue now.

GFS and GEFS have different directories within that version because ${RUN}.${PDY} is part of the path.
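
A small sketch of why the two cases do not collide: the cycle directory name is built from both the run name and the date, so `gfs` and `gefs` land in separate subtrees of the same version directory. The function name is hypothetical, and the layout is simplified to the top two levels.

```bash
#!/usr/bin/env bash
# Sketch: RUN and PDY are both part of the staged path, so gfs and gefs
# cases for the same date resolve to different directories.
ic_cycle_dir() {
  local version_dir="$1" run="$2" pdy="$3" cyc="$4"
  echo "${version_dir}/${run}.${pdy}/${cyc}"
}

ic_cycle_dir C48C48mx500/20241120 gfs 20210323 12    # C48C48mx500/20241120/gfs.20210323/12
ic_cycle_dir C48C48mx500/20241120 gefs 20210323 06   # C48C48mx500/20241120/gefs.20210323/06
```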

@WalterKolczynski-NOAA
Contributor Author

[role.glopara@hfe05 ICSDIR]$ tree C48C48mx500/20241120 -L 2 
C48C48mx500/20241120
|-- enkfgdas.20210323
|   |-- 06
|   `-- 12
|-- enkfgdas.20210324
|   |-- 06
|   `-- 12
|-- gdas.20210323
|   |-- 06
|   `-- 12
|-- gdas.20210324
|   |-- 06
|   `-- 12
|-- gefs.20210323
|   |-- 06
|   `-- 12
`-- gfs.20210323
    `-- 12

@JessicaMeixner-NOAA
Contributor

Thanks for sharing that @WalterKolczynski-NOAA . I'm glad that the old tests should still work.

Labels
CI-Hercules-Passed **Bot use only** CI testing on Hercules for this PR has completed successfully
Development

Successfully merging this pull request may close these issues.

WW3 ICs are not read for HR like experiements.
6 participants