update restart tests for coupled model #316

DeniseWorthen · 2020-12-03T13:36:08Z

Description

After the merge of PR #304, we should update and/or modify existing restart tests for the coupled model.

Solution

This issue has several aspects, some of which are still undecided, so I consider it open for discussion.

I believe that, at a minimum, we should:

retire the c96mx025 restart test. This was carried over from ufs-s2s and should be replaced by a c96mx100 restart test if we want to retain a low-resolution restart test.
implement checkpoint restarts for the restart tests where possible, reducing by one the number of tests that need to be run.
implement a restart test for frac_grid.
implement an 'overlap' restart test, meaning the test will run across the end of one day. For example, such a test would restart from hour 12, run for 36h, and compare against a continuous 48h forecast (12h/36h/48h). This references existing Issue #293, "add coupled model restart test overlapping 24 hr time boundary". If this is implemented, is a 'non-overlap' restart test (12h/12h/24h) still required?
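The restart-test notation used in this thread (restart hour / restart run length / continuous run length, e.g. 12h/36h/48h) can be checked mechanically. Below is a minimal Python sketch, assuming a hypothetical `RestartSpec` type that is not part of the ufs-weather-model regression-test scripts. It checks that the restarted run ends exactly where the continuous run does and reports whether the restart segment crosses a 24h boundary, which is what distinguishes an 'overlap' test from a 'non-overlap' one.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 24


@dataclass(frozen=True)
class RestartSpec:
    """Hypothetical description of a restart test, e.g. 12h/36h/48h."""
    restart_hour: int   # forecast hour at which the restart files are written
    restart_run: int    # length of the run started from those restart files
    continuous: int     # length of the continuous (reference) forecast

    def is_consistent(self) -> bool:
        # The restarted run must end at the same forecast hour as the
        # continuous run, otherwise there is nothing to compare against.
        return self.restart_hour + self.restart_run == self.continuous

    def overlaps_day_boundary(self) -> bool:
        # True if a multiple of 24h lies strictly inside the restart segment
        # (restart_hour, restart_hour + restart_run).
        end = self.restart_hour + self.restart_run
        next_boundary = (self.restart_hour // HOURS_PER_DAY + 1) * HOURS_PER_DAY
        return next_boundary < end

    def label(self) -> str:
        return f"{self.restart_hour}h/{self.restart_run}h/{self.continuous}h"


for spec in (RestartSpec(12, 36, 48), RestartSpec(12, 12, 24), RestartSpec(3, 3, 6)):
    print(spec.label(), spec.is_consistent(), spec.overlaps_day_boundary())
```

Under this definition, 12h/36h/48h crosses the 24h boundary, while 12h/12h/24h only ends on it and so counts as 'non-overlap'.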

What is not clear yet to me is which resolutions should be tested for restart and how.

The benchmark+frac_grid configuration is the closest to what will be implemented; however, it is also our most resource-intensive test, and we cannot include waves in a restart test at this point. Eventually this would also need to be an L127 test.
If we have a benchmark+frac_grid restart test, are other restart tests (c96mx100) still required?

junwang-noaa · 2020-12-03T13:59:42Z

I'd suggest setting up a restart test with C96mx100 using the benchmark+frac_grid configuration (except for the resolution). With this setup, I would not expect the restart to fail in the high-resolution benchmark+frac_grid test.

JessicaMeixner-NOAA · 2020-12-03T14:19:50Z

For MOM6, the setup is very different at 1 deg versus 1/4 deg. If the restart test is only at 1 deg, many aspects of the code that would be used operationally would not be exercised, so the test would not really tell us about MOM6 restart reproducibility. I know there is a desire to make tests as small and short as possible, but this is likely not sufficient for testing MOM6 restarts. @jiandewang can provide more specific details if required.

junwang-noaa · 2020-12-03T14:34:35Z

@JessicaMeixner-NOAA @jiandewang can you provide information on which features are used in the benchmark but cannot be used at low resolution? Would those features affect the model interface of the coupled model? Could high-resolution standalone MOM6 tests cover these features, including restart reproducibility? I am asking because UFS currently supports 4 applications, so we want a fast RT turnaround to avoid delays.

DeniseWorthen · 2020-12-07T16:14:41Z

Currently, the plan is to:

1. Remove the c96mx025 12h/12h/1d restart test and replace it with a c96mx100 12h/36h/48h test. Using checkpoint restarts, this requires two tests: a 48h test that writes restarts at 12h intervals, and a restart test that starts from the first (12h) restart and integrates for 36h.
2. Add a restart test for the benchmark configuration using 3h/3h/6h. We don't currently test the benchmark configuration at 6h, so this will require new tests. The other option would be to change the current benchmark test from 1d to 6h, which would also reduce the time required to run the benchmark tests.
3. Implement a fractional grid restart test matching what is done for (2) above.
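The saving from checkpoint restarts comes from dropping the separate run whose only job was to produce restart files. The bookkeeping can be sketched as follows; `plan_runs` and the run names are hypothetical illustrations, not actual rt.conf entries, and the sketch assumes the non-checkpoint setup needed a dedicated short run to write the restart.

```python
def plan_runs(restart_hour: int, continuous: int, checkpoint: bool) -> list[tuple[str, int]]:
    """Return (hypothetical run name, forecast length in hours) pairs for one restart test."""
    runs = []
    if checkpoint:
        # The continuous run also writes intermediate restart files every
        # `restart_hour` hours, so it doubles as the source of the restart.
        runs.append(("control_with_checkpoints", continuous))
    else:
        # Without intermediate restarts, a separate short run is needed
        # just to produce restart files at `restart_hour`.
        runs.append(("control", continuous))
        runs.append(("write_restart", restart_hour))
    # In both cases, a run started from the restart files at `restart_hour`
    # integrates to the end of the continuous forecast; its output is then
    # compared against the continuous run.
    runs.append(("restart", continuous - restart_hour))
    return runs


for cp in (False, True):
    print(cp, plan_runs(12, 48, checkpoint=cp))
```

For the c96mx100 12h/36h/48h case this gives three runs without checkpointing and two with it, matching the "reducing by one" estimate.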

DeniseWorthen · 2020-12-11T15:44:47Z

I have a branch where I've implemented the above items and also added Shan's frac grid bmark wave tests from her PR #326 (including options to use L127 input).

For 2), I changed the default length of the cpld_bmark test to 6 hours (from 1d) and used it for the 3h/3h/6h restart test. I think we should also reduce the length to 6 hours for both the existing bmark_wave test and the new fractional grid bmark_wave test. These are very long tests, and I'm not sure we gain anything by testing 24 hours vs 6 hours.

I've also implemented a 12h/36h/48h restart test at c192mx050 for the frac grid. My reasoning was that, according to @jiandewang, the physics of the 1/2 deg MOM6 is the most similar to the 1/4 deg MOM6. This at least gives us a long frac grid restart test, although not at the bmark resolution.

I also added a debug test for frac grid (c96mx100).

The number of cpld tests we currently run is 14; the new count is 19. Setting all the bmark tests to 6 hours would help.

junwang-noaa · 2020-12-11T16:37:38Z

How long does the bmark_wave test take? In general, we hope all tests can finish within half an hour.


DeniseWorthen · 2020-12-11T18:57:49Z

On orion, I get:

cpld_control_c192 (1d): 3 min, 288 PE
cpld_control_c384 (1d): 21 min, 318 PE

cpld_bmark (1d): 13 min, 480 PE
cpld_bmark_wave (1d): 23 min, 520 PE
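To gauge the proposal to shorten the bmark tests from 1 day to 6 hours, you can roughly scale the Orion timings above linearly with forecast length and compare them to the half-hour target. This is only a sketch: it assumes wall time is proportional to forecast length and ignores initialization and I/O costs, so real 6h timings will be somewhat higher. The `scaled_minutes` helper is hypothetical.

```python
TARGET_MINUTES = 30.0  # "all tests can be finished within half an hour"

# Orion wall-clock times (minutes) for 1-day runs, as reported above.
one_day_minutes = {
    "cpld_control_c192": 3.0,
    "cpld_control_c384": 21.0,
    "cpld_bmark": 13.0,
    "cpld_bmark_wave": 23.0,
}


def scaled_minutes(minutes_1d: float, forecast_hours: float) -> float:
    # Assumes cost is linear in forecast length (no fixed start-up cost).
    return minutes_1d * forecast_hours / 24.0


for name in ("cpld_bmark", "cpld_bmark_wave"):
    est = scaled_minutes(one_day_minutes[name], 6)
    print(f"{name}: ~{est:.2f} min at 6h, within target: {est <= TARGET_MINUTES}")
```

Under this linear assumption, cpld_bmark_wave drops from 23 min to roughly 6 min, well inside the 30-minute target.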

DeniseWorthen added the enhancement New feature or request label Dec 3, 2020

DeniseWorthen self-assigned this Dec 3, 2020

DeniseWorthen mentioned this issue Dec 10, 2020

Adding a regression test "cpld_bmark_wave_frac" #322

Closed

DeniseWorthen mentioned this issue Dec 21, 2020

Add checkpoint restarts for ufs-cpld #342

Merged

DeniseWorthen mentioned this issue Jan 4, 2021

add frac grid input, update and add additional cpld tests #354

Merged

DeniseWorthen linked a pull request Jan 5, 2021 that will close this issue

add frac grid input, update and add additional cpld tests #354

Merged

DeniseWorthen closed this as completed in #354 Jan 6, 2021
