-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[develop] Update hash of ufs weather model and input.nml/diag_table files #663
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chan-hoo Thank you very much for taking care of this! I'd suggest that the UPP be updated to the version used with the weather model. I have noted the version associated with the updated weather model hash. Once the UPP hash has been updated, I'll give my approval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chan-hoo Thank you very much for updating the UPP hash as well! I will now approve these changes.
@chan-hoo The Cheyenne GNU test failed in the Jenkins tests. All other tests successfully passed. The failure was in the |
@MichaelLueken, a gnu test on hera failed too. I'll test with the latest hash of upp again. |
@MichaelLueken, the latest hash of upp failed too. It works well with intel though. I have no idea what happens with gnu. I think we need a help from others. Do you know who we can contact to resolve this gnu issue on run_post? |
@chan-hoo Unfortunately, I'm not sure who is in charge of the UPP. During my daily stand-up this morning, I'll see if Jong might know who to contact about this issue with post and GNU compilers. |
@chan-hoo No one in EPIC's CM team seems to know who to contact about the issue. Testing on Cheyenne shows that the same GNU-based double free or corruption error messages are being seen in hash e88dbea, which is the next hash following 2b2c84a, which is the hash currently used in the SRW App. Looking at PR #1529 in the weather model, the addition of the new winter weather diagnostics were placed in different areas of the various |
@MichaelLueken, based on ufs-community/ufs-weather-model@74ec1da, we might need to update the upp control file. I'll test this and then let you know. |
It didn't work. I got the same error (double free or corruption). |
I have no idea if you are seeing the same sort of issue, but I ran into a double free issue with the UFS-WM that was happening because it was linking both the shared and static ESMF libraries during the build. You might want to check the link command for whatever executable is failing to see if something similar is happening here. |
@mark-a-potts, thank you for your comments. I hope the upp developers would fix this issue soon. :) @MichaelLueken, I tested it with the original hash (2b2c84a). It worked well. I think we'd be better not to update the hash of UPP in this PR. What do you think? If you agree, could you run the fundamental we2e tests again? |
@chan-hoo I will go ahead and resubmit the Jenkins tests using the original hash. I'll continue to see what is happening with UPP and make them aware of the issue we are encountering. |
@mark-a-potts Thanks for the suggestion! Unfortunately, looking at the linking step for upp.x for both 2b2c84a (the version of UPP currently used in develop that works with GNU) and b37f8ab (the version of UPP associated with the weather model at the updated hash), the linking is identical and ESMF isn't used. I have a few more potential avenues to visit, but this is a bit of a head scratcher. |
@chan-hoo I was finally able to find a way to get the run_post tasks to run properly using the b37f8ab hash in UPP. As part of the ufs-weather-model's update in PR #1529, there were updates to In scripts/exregional_run_post.sh, when I replaced:
with:
in other words, replacing the I'll open an issue in the UPP repo to see if something can be done in UPP develop to address the need to point to a parameter file in a different component to allow the post to run using GNU. |
@MichaelLueken, great! Thank you for finding it out. I tried adding the new parameters in the above PR you mentioned to the existing upp control file 'postxconfig-NT-fv3lam.txt' but my run failed. Maybe I missed something. BTW the fundamental tests failed again. Can you let me know which tests failed? |
@chan-hoo The tests failed on Jet. The tests were launched before the machine went down for maintenance and timed out once Jet returned to production. Once Jet is recovered from the cooling issue over night, I will resubmit the Jenkins tests one more time. I'll also do a few more tests to make sure that the replacing of the |
@chan-hoo After backing out of all modifications except for pointing to the weather model's From this comment in the weather model's PR #1529, it looks like the issue is that a new UPP variable snow liquid ratio (SDEN) wasn't added to the diag_table files. So the UPP team needed to provide a temporary solution, adding control files to the weather model that removed SDEN. So, until SDEN is added to the diag_table files, we won't be able to point to the UPP parm files (as it so happens, SDEN was added to the control files in the PR that followed the current version that works). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good to me, approving now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
@chan-hoo Following discussions from last Friday's code management meeting, we should move forward with updating the UPP hash as well. There are two variables that can control what post control file is used - Setting I can open a PR in your fork to get the necessary changes in, if you would like for me to do so. |
@MichaelLueken, please do that. I didn't attend the code managers' meeting due to the conflict with the EMC internal meeting, so I don't know the detail. |
…el via ush/config_defaults.yaml
@chan-hoo I will open the PR in your fork once I complete the fundamental tests for Intel on Cheyenne (need to make sure that the update doesn't adversely affect Intel). |
Update UPP hash and point to updated post control file in weather model via ush/config_defaults.yaml
@RatkoVasic-NOAA and @BenjaminBlake-NOAA - A few additional changes were merged into @chan-hoo's work to update the UPP hash and update the Thank you very much for your time! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
DESCRIPTION OF CHANGES:
Type of change
TESTS CONDUCTED:
WE2E tests on Hera and WCOSS2:
community_ensemble_2mems_stoch
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson_mynn_lam3km
MET_verification
Sample script test for AQM on Hera
hera.intel
orion.intel
cheyenne.intel
cheyenne.gnu
gaea.intel
jet.intel
wcoss2.intel
NOAA Cloud (indicate which platform)
Jenkins
fundamental test suite
comprehensive tests (specify which if a subset was used)
ISSUE:
Fixes issue mentioned in #661
CHECKLIST