update CMEPS to allow bilinear ATM<->WAV mapping for global coupled application; utilize custom restart names for WW3 (was #1684) #1692
Conversation
* set configuration variable true to use non-default restart file names in WW3
* change name of WW3 used for restart tests
* change name of WW3 in hafs wav tests
Sure! @zach1221 @BrianCurtis-NOAA this PR replaces #1685. I am adding a new bl date.
@DeniseWorthen two cases fail on jet: hafs_regional_datm_cdeps and regional_noquilt. I think regional_noquilt is a time-out issue, but hafs_regional_datm_cdeps shows something on the hycom side in the out file: error in zaiopf - can't open unit 13. Can you check /lfs4/HFIP/h-nems/Jong.Kim/RT_RUNDIRS/Jong.Kim/FV3_RT/rt_52646/hafs_regional_datm_cdeps/out ? All ran ok on hera though.
Please see the jenkins-ci logs attached. ORTs passed.
@jkbk2004 I will look, but I suspect a jet system issue. This PR does not touch anything in hycom or cdeps.
There is a timeout message in the hafs_regional_datm_cdeps err log also. It seems to have hung. But I will run the test independently to check.
I agree; the time-out case usually ends up with that hycom error message. The same thing occasionally happens even with the develop branch. I think we need to turn those cases off on jet.
We can't keep turning off tests that don't work. The issues need to be addressed with the machine admins and/or fixes in the UFSWM.
@BrianCurtis-NOAA It does seem that jet is a flaky platform in general and it is difficult for either us or sys admins to debug intermittent issues. Generally we have the "2 tries" but even that doesn't seem to be enough for jet.
I don't think this causes the time-outs, but we should reduce the debug flags here. There is no reason for those to be anything other than 0 as a default. The higher settings simply report min/max values for import/export states for example. They do not actually turn on any sort of compiler-debug options.
The current jet log shows this test taking almost 24 minutes. On Cheyenne.intel, it takes 21 minutes. I would suggest we reduce the fhmax for this test to 12 or even 6. It is currently 24 but all the other HAFS tests are only 6 hours.
Sounds like a consistent 6 hours across the hafs tests makes sense. Same for regional_noquilt, right?
I think reducing fhmax may work sometimes, but I just ran the hafs_regional_datm_cdeps case on Jet w/ fhmax=12 and it didn't even finish 12 hours. It was a half-hour short when it ran out of wall clock. I think there are just system issues w/ Jet.
* regional_noquilt and hafs_regional_datm_cdeps are timing out
* need to resolve issue separately
On jet, these 5 cases keep hitting the time-out issue: regional_noquilt, hafs_regional_datm_cdeps, regional_wofs, regional_atmaq, regional_atmaq_faster. @DeniseWorthen @BrianCurtis-NOAA would turning off all regional aqm cases on jet be too much? Other than that, the Jet RT log is ready to push.
I already pushed a commit to remove regional_noquilt and hafs_regional_datm_cdeps for jet.intel. Now the other regional tests are timing out?
Yes, I keep seeing those 3 other cases hit the time limit. I vote to turn those off if possible. We will create an issue and revisit the time-outs under it.
I will push a commit to remove regional_wofs, regional_atmaq, regional_atmaq_faster from jet.intel.
* turn off regional_wofs, regional_atmaq and regional_atmaq_faster on jet.intel
I am writing an issue to address the jet time-out cases. We can start the merging process. @DeniseWorthen @BrianCurtis-NOAA Can you go ahead and merge the CMEPS PR?
Issue #1695 was created.
All set! @BrianCurtis-NOAA @SadeghTabas-NOAA please go ahead and approve the PR.
Description
Changes the mapping of state fields between ATM and WAV for the coupled model to use bilinear with nearest-source-to-destination filling.
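For readers unfamiliar with this kind of mapping, the idea can be illustrated with a small 1-D analogue. This is only a sketch under stated assumptions, not the CMEPS or ESMF implementation: the function name `linear_with_nstod_fill` and its arguments are hypothetical, linear interpolation stands in for bilinear, and a destination point is treated as unmapped whenever one of its bracketing source points is masked or it lies outside the source range. Unmapped points then take the value of the nearest valid source point.

```python
from bisect import bisect_right


def linear_with_nstod_fill(src_x, src_val, src_valid, dst_x):
    """Hypothetical 1-D analogue of bilinear mapping with nearest-source fill.

    src_x must be sorted ascending. src_valid marks unmasked source points.
    """
    valid_pts = [(x, v) for x, v, ok in zip(src_x, src_val, src_valid) if ok]
    if not valid_pts:
        raise ValueError("no valid source points")

    last = len(src_x) - 1
    out = []
    for xd in dst_x:
        i = bisect_right(src_x, xd) - 1  # index of the left bracketing point
        if 0 <= i < last and src_valid[i] and src_valid[i + 1]:
            # Both bracketing source points are valid: interpolate linearly.
            w = (xd - src_x[i]) / (src_x[i + 1] - src_x[i])
            out.append((1 - w) * src_val[i] + w * src_val[i + 1])
        elif i == last and xd == src_x[last] and src_valid[last]:
            # Destination coincides with the last source point.
            out.append(src_val[last])
        else:
            # Unmapped destination: fill from the nearest valid source point.
            _, v = min(valid_pts, key=lambda p: abs(p[0] - xd))
            out.append(v)
    return out


# Made-up example: the source point at x=2 is masked (e.g. land).
src_x = [0.0, 1.0, 2.0, 3.0]
src_val = [0.0, 10.0, 20.0, 30.0]
src_valid = [True, True, False, True]
result = linear_with_nstod_fill(src_x, src_val, src_valid, [0.5, 1.5, 2.6, -1.0])
print(result)
```

In this sketch only the first destination point is interpolated; the other three fall next to the masked point or outside the source range and are filled from the nearest valid source value.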
A test case was run using the cpld_control_p8 test for 24 hours with and w/o this change. The following figure shows the difference in U10M along ~62S, from 90W:40W imported by the WAV model with the current mapping (mapnstod_consf, black line) vs the change in this PR (mapbilnr_nstod, red line).
The impact on the Z0 imported by the ATM at ~0.5S,6E (on the coast of Brazil, where a large difference in Z0 is seen) for the current mapping (red) vs this PR (black) is shown below.
In the bmark test, the difference in Z0 imported by the ATM on tile 1 (scaled by 1.0e4) after 6 hours is shown below.
Top of commit queue on: TBD
Input data additions/changes
Anticipated changes to regression tests:
This will change baselines for all coupled tests using ATM-WAV coupling. HAFS wave tests do not change since their mapping is mapfillv_bilnr and does not change.
Full RTs on cheyenne show the following:
GNU:
INTEL:
RegressionTests_cheyenne.gnu.log
RegressionTests_cheyenne.intel.log
Subcomponents involved:
Combined with PR's (If Applicable):
Commit Queue Checklist:
Linked PR's and Issues:
Testing Day Checklist:
Testing Log (for CM's):