Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update MOM6 templates for coordination with global-workflow updates #1979

Merged
merged 26 commits into from
Dec 7, 2023

Conversation

JessicaMeixner-NOAA
Copy link
Collaborator

@JessicaMeixner-NOAA JessicaMeixner-NOAA commented Nov 6, 2023

PR Author Checklist:

  • I have linked PR's from all sub-components involved in section below.
  • I am confirming reviews are completed in ALL sub-component PR's.
  • I have run the full RT suite on either Hera/Cheyenne AND have attached the log to this PR below this line:
  • I have added the list of all failed regression tests to "Anticipated changes" section.
  • I have filled out all sections of the template.

Description

Updates to MOM6 templates for coordination with recent changes in global-workflow for NOAA-EMC/global-workflow#1944 and for the 5 deg template file to include wave-coupling variables, update the EPBL_LANGMUIR_SCHEME value and update the output to match workflow.

Linked Issues and Pull Requests

Associated UFSWM Issue to close

Subcomponent Pull Requests

None

Blocking Dependencies

None

Subcomponents involved:

  • AQM
  • CDEPS
  • CICE
  • CMEPS
  • CMakeModules
  • FV3
  • GOCART
  • HYCOM
  • MOM6
  • NOAHMP
  • WW3
  • stochastic_physics
  • none

Anticipated Changes

Input data

  • No changes are expected to input data.
  • Changes are expected to input data:
    • New input data.
    • Updated input data.

The file on hera /scratch1/NCEPDEV/global/glopara/fix/mom6/20220805/500/oceanda_zgrid_25L.nc needs to be added to @[INPUTDATA_ROOT]/MOM6_FIX/500/

Regression Tests:

  • No changes are expected to any regression test.
  • Changes are expected to the following tests:
    Due to the change in output and the change of EPBL_LANGMUIR_SCHEME in the 5deg MOM input, the cpld_control_c48_intel will change answers.

Libraries

  • Not Needed
  • Needed
    • Create separate issue in JCSDA/spack-stack asking for update to library. Include library name, library version.
    • Add issue link from JCSDA/spack-stack following this item
Code Managers Log
  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR.
  • Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.
    • N/A

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Cheyenne
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
    • Completed
  • opnReqTest
    • N/A
    • Log attached to comment

@BrianCurtis-NOAA BrianCurtis-NOAA added Baseline Updates Current baselines will be updated. Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked. New Input Data Req'd This PR requires new data to be sync across platforms labels Nov 9, 2023
@JessicaMeixner-NOAA
Copy link
Collaborator Author

Here's the new log file from hera, comparing against a newly created baseline:
RegressionTests_hera.log

@jkbk2004
Copy link
Collaborator

jkbk2004 commented Dec 4, 2023

@JessicaMeixner-NOAA Can we delay this pr one or two days more? RRFS operation application requires to merge #2021 ASAP. This week's schedule is not that busy. So, I think we can commit this pr tomorrow or 12/06 right after Dusan's PR.

@JessicaMeixner-NOAA
Copy link
Collaborator Author

Please prioritize the work needed for RRFS and thank you for communicating the reasons for the delay.

@jkbk2004
Copy link
Collaborator

jkbk2004 commented Dec 6, 2023

@JessicaMeixner-NOAA can you sync up branch? we can working on this pr.

@JessicaMeixner-NOAA
Copy link
Collaborator Author

@JessicaMeixner-NOAA can you sync up branch? we can working on this pr.

@jkbk2004 this branch has been sync'd.

@zach1221 zach1221 added the jenkins-ci Jenkins CI: ORT build/test on docker container label Dec 7, 2023
@epic-cicd-jenkins
Copy link
Collaborator

Jenkins-ci ORTs passed

@zach1221
Copy link
Collaborator

zach1221 commented Dec 7, 2023

Testing is complete, let's move to review.

jkbk2004
jkbk2004 previously approved these changes Dec 7, 2023
@zach1221
Copy link
Collaborator

zach1221 commented Dec 7, 2023

@JessicaMeixner-NOAA looks like there is one outdated conversation. Can you resolve it for us, please?

@DeniseWorthen
Copy link
Collaborator

DeniseWorthen commented Dec 7, 2023

I don't understand the hercules log. Why is control_wrtGauss_netcdf_parallel_intel showing up as test 235? It should be around test #30. It also shows up as a different rt_ directory. Is this a appended log? If so, the original log showing test failures should not be removed. If a second log is appended, it should clearly indicate that it is from a re-run.

Test 235 control_wrtGauss_netcdf_parallel_intel PASS

The test seems to have failed at the initial run, but does not show as a "failure"

baseline dir = /work/noaa/epic/hercules/UFS-WM_RT/NEMSfv3gfs/develop-20231206/control_wrtGauss_netcdf_parallel_intel
working dir = /work2/noaa/stmp/zshrader/stmp/zshrader/FV3_RT/rt_1844383/control_wrtGauss_netcdf_parallel_intel
Checking test 034 control_wrtGauss_netcdf_parallel_intel results ....
Comparing sfcf000.nc .........OK
Comparing sfcf024.nc .........OK
Comparing atmf000.nc .........OK
Comparing atmf024.nc ............ALT CHECK......ERROR

The RT seems to simply skip the reporting line? There should be a line like "Test 034 control_wrtGauss_netcdf_parallel_intel FAIL"

@JessicaMeixner-NOAA
Copy link
Collaborator Author

@JessicaMeixner-NOAA looks like there is one outdated conversation. Can you resolve it for us, please?

done

@zach1221
Copy link
Collaborator

zach1221 commented Dec 7, 2023

I don't understand the hercules log. Why is control_wrtGauss_netcdf_parallel_intel showing up as test 235? It should be around test #30. It also shows up as a different rt_ directory. Is this a appended log? If so, the original log showing test failures should not be removed. If a second log is appended, it should clearly indicate that it is from a re-run.

Test 235 control_wrtGauss_netcdf_parallel_intel PASS

The test seems to have failed at the initial run, but does not show as a "failure"

baseline dir = /work/noaa/epic/hercules/UFS-WM_RT/NEMSfv3gfs/develop-20231206/control_wrtGauss_netcdf_parallel_intel
working dir = /work2/noaa/stmp/zshrader/stmp/zshrader/FV3_RT/rt_1844383/control_wrtGauss_netcdf_parallel_intel
Checking test 034 control_wrtGauss_netcdf_parallel_intel results ....
Comparing sfcf000.nc .........OK
Comparing sfcf024.nc .........OK
Comparing atmf000.nc .........OK
Comparing atmf024.nc ............ALT CHECK......ERROR

The RT seems to simply skip the reporting line? There should be a line like "Test 034 control_wrtGauss_netcdf_parallel_intel FAIL"

Yes, it's the nccmp failure on Hercules. I have to run that case a second time to get it to pass. I did clean up the combined logs afterwards, so going forward I won't.

@DeniseWorthen
Copy link
Collaborator

@zach1221 OK, thanks. I'm also concerned though that there is an actual problem w/ the RT failure detection. I assume you didn't remove the line that reported the original test 034 failed "Test 034 control_wrtGauss_netcdf_parallel_intel FAIL"?

@zach1221
Copy link
Collaborator

zach1221 commented Dec 7, 2023

@zach1221 OK, thanks. I'm also concerned though that there is an actual problem w/ the RT failure detection. I assume you didn't remove the line that reported the original test 034 failed "Test 034 control_wrtGauss_netcdf_parallel_intel FAIL"?

No I didn't remove the actual failed 034 control_wrtGauss_netcdf_parallel_intel results piece, that should still in there.

@DeniseWorthen
Copy link
Collaborator

@junwang-noaa It seems we do have an issue w/ failure detection. It seems that when we get the "ALT CHECK ERROR" failure, the job is not always correctly reported.

@zach1221
Copy link
Collaborator

zach1221 commented Dec 7, 2023

Ok re-pushed the original logs without any edits.

@DeniseWorthen
Copy link
Collaborator

@zach1221 Thanks for posting the original logs. I see that at the end, the test 034 is noted as failed, but at the point where the files are being compared, it doesn't catch the fact that the comparison gives an error. It seems like after this line

Comparing atmf024.nc ............ALT CHECK......ERROR

We should get a report, something like

Test 034 control_wrtGauss_netcdf_parallel_intel FAIL

Copy link
Collaborator

@jkbk2004 jkbk2004 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve as new Hercules log has enough information for the failed case.

@zach1221
Copy link
Collaborator

zach1221 commented Dec 7, 2023

@zach1221 Thanks for posting the original logs. I see that at the end, the test 034 is noted as failed, but at the point where the files are being compared, it doesn't catch the fact that the comparison gives an error. It seems like after this line

Comparing atmf024.nc ............ALT CHECK......ERROR

We should get a report, something like

Test 034 control_wrtGauss_netcdf_parallel_intel FAIL

Ok, yah that's not happening it looks like. I can create a issue for it.

@zach1221 zach1221 merged commit 5be4f33 into ufs-community:develop Dec 7, 2023
@@ -106,6 +106,8 @@ export CHLCLIM=''
export FRUNOFF=''
export MOM6_RIVER_RUNOFF=False
export MOM6_RESTART_SETTING=r
export MOM6_DIAG_COORD_DEF_Z_FILE=oceanda_zgrid_25L.nc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JessicaMeixner-NOAA Does this MOM6_DIAG_COORD_DEF_Z_FILE need to be added to the cpld_warmstart_c48 and cpld_restart_c48? Those tests are currently commented out (I need to recreate the warmstart files that they use). Will we need this variable set for those two tests?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DeniseWorthen good catch, yes you will want to set that for the other C48 tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Baseline Updates Current baselines will be updated. jenkins-ci Jenkins CI: ORT build/test on docker container New Input Data Req'd This PR requires new data to be sync across platforms Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Updating MOM_input_template_500 with wave coupling configurations
8 participants