Upgrade to spack-stack version 1.6.0 #856

Merged: 7 commits, Jan 29, 2024

DavidHuber-NOAA
Collaborator

This upgrades the libraries used by the UPP to those built with the recent release of spack-stack version 1.6.0. The upgraded libraries include:

netCDF-Fortran v4.6.0 -> v4.6.1
CRTM v2.4.0 -> v2.4.0.1
sp v2.3.3 -> v2.5.0

The regression tests were run on Hera, but produced different results. Per a discussion with @WenMeng-NOAA, changes will need to be made to the job cards. Thus I am submitting this as a draft to allow that to happen.

@WenMeng-NOAA
Collaborator

@AlexanderRichert-NOAA Could you please review the changes to ci/spack.yaml?

@WenMeng-NOAA
Collaborator

@FernandoAndrade-NOAA Please review and test the modulefile changes for the R&D platforms.

@FernandoAndrade-NOAA
Collaborator

The changes in modulefiles look good to me on initial review. I can verify RTs on Hera and Orion. On Hercules, I can reattempt a full rebuild of the baselines with 1.6.0, which may resolve the issues encountered previously. On Jet, I can test the UPP build.

Thanks for working on this, I'll get started on updating and running job card scripts.

@FernandoAndrade-NOAA
Collaborator

@DavidHuber-NOAA please go ahead and resync your branch with develop. Could you provide a location for your RT results on Hera?

@WenMeng-NOAA
Collaborator

@FernandoAndrade-NOAA This PR will require job card updates in the UPP RTs on Hera and Orion. Could you start with one test (any model) to evaluate how much effort is needed?

@DavidHuber-NOAA
Collaborator Author

@FernandoAndrade-NOAA The branch has been updated.

I ran the regression tests here: /scratch1/NCEPDEV/stmp2/David.Huber/upp-HERA. I'm not very familiar with them, so I did not try to update the cards, but you are welcome to take a look if you like.

@DavidHuber-NOAA
Collaborator Author

@AlexanderRichert-NOAA Good catch, I will try reverting that. For context, I had had problems with the concretizer flagging sp 2.3.3 and 2.5.0 for installation and tried to do this to resolve it, but the real fix was upgrading ip to 4.3.0.
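
For readers unfamiliar with the symptom described here (two versions of the same package, sp 2.3.3 and sp 2.5.0, being flagged for installation), the following is a minimal sketch of how such duplicates could be spotted in a list of resolved specs. The `name@version` string format and the function name are assumptions for illustration; this is not Spack's API and not the check used in this PR.

```python
from collections import defaultdict


def find_duplicate_versions(specs):
    """Return {package: sorted versions} for packages listed with more than one version.

    `specs` is an iterable of "name@version" strings. This is a hypothetical,
    simplified stand-in for a concretized spec listing, not Spack's own API.
    """
    versions = defaultdict(set)
    for spec in specs:
        name, _, version = spec.partition("@")
        versions[name.strip()].add(version.strip())
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    # Version numbers come from the discussion above; the list itself is illustrative.
    concretized = ["sp@2.3.3", "sp@2.5.0", "ip@4.3.0", "crtm@2.4.0.1"]
    print(find_duplicate_versions(concretized))
```

A non-empty result would point at a package that some dependent still pins to an older version, which matches the situation described above where upgrading ip to 4.3.0 was the real fix.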

@FernandoAndrade-NOAA
Collaborator

Sure thing, I can start with the fv3hafs case if that's ok with you? I can compare between machines and then to David's results with my Hera run. Are you expecting any additional efforts beyond path updates resulting from the library updates?

@WenMeng-NOAA linked an issue Jan 23, 2024 that may be closed by this pull request
@FernandoAndrade-NOAA
Collaborator

After comparing my results with David's for the hafs test case, I believe I chose one of the very few tests that would not change with this update. I'll run rtma and fv3r instead and see if there are any hiccups in testing beyond the expected 1.6.0 changes. I did not see any issues arise during the hafs test case. @WenMeng-NOAA FYI

@WenMeng-NOAA
Collaborator

@FernandoAndrade-NOAA That's great. Can you share your job card updates?

@FernandoAndrade-NOAA
Collaborator

FernandoAndrade-NOAA commented Jan 23, 2024

Sure thing, they're available here:
/scratch2/NAGAPE/epic/Fernando.Andrade-maldonado/regression-tests/upp/spack-1.6.0/sample-test-suite/UPP/ci/work-upp-HERA

The primary updates were to the spack-stack path and the module versions loaded, since the job cards currently have their own explicit load commands. I will run the full RTs to compare all results and, if that looks good, move on to making the appropriate changes on Orion. I will also rerun the full RTs on Hercules, on the chance that this update resolves the previous testing issues on that machine.
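
As an illustration of the kind of job card edit described above, here is a minimal sketch that rewrites explicit `module load name/version` lines to new versions. The version numbers are the ones listed in this PR; the module names, the regular expression, and the sample job card are assumptions for illustration and are not taken from the actual UPP job cards.

```python
import re

# Versions named in this PR; the module names are assumed, not read from the job cards.
NEW_VERSIONS = {"crtm": "2.4.0.1", "sp": "2.5.0", "netcdf-fortran": "4.6.1"}

# Matches lines such as "module load crtm/2.4.0", keeping any leading indentation.
MODULE_LOAD = re.compile(r"^(\s*module\s+load\s+)([\w.+-]+)/(\S+)(.*)$")


def bump_module_versions(job_card_text, new_versions=NEW_VERSIONS):
    """Rewrite `module load name/version` lines for modules listed in new_versions."""
    out = []
    for line in job_card_text.splitlines():
        match = MODULE_LOAD.match(line)
        if match and match.group(2) in new_versions:
            prefix, name, _old_version, rest = match.groups()
            line = f"{prefix}{name}/{new_versions[name]}{rest}"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical job card fragment, for illustration only.
    sample = "module load crtm/2.4.0\nmodule load sp/2.3.3\nsrun ./upp.x"
    print(bump_module_versions(sample))
```

Lines that load modules not listed in `NEW_VERSIONS` pass through unchanged, so the same mapping could be applied to every job card on a platform. The spack-stack path update mentioned above is a separate change that this sketch does not cover.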

@FernandoAndrade-NOAA
Collaborator

@WenMeng-NOAA I did not observe any changes in the full RT run on Hera. Could you please verify that there is no issue with my run and the modifications to my job cards within the work directory at /scratch2/NAGAPE/epic/Fernando.Andrade-maldonado/regression-tests/upp/spack-1.6.0/sample-test-suite/UPP/ci? Are changes expected, or would this potentially be due to the discrepancy between the versions used in the build and those in the job cards themselves?

@DavidHuber-NOAA's initial RT run, without updates to the job cards, would have run the tests with crtm 2.4.0 and prod_util 1.2.2.

@WenMeng-NOAA
Collaborator

@FernandoAndrade-NOAA So from your testing, there is no baseline change, right? That's great.

@FernandoAndrade-NOAA
Collaborator

Right, and I'm not seeing any errors in the Hera run either. If everything looks good to you in that Hera rundir and work directory, I'll go ahead and move on to making and testing the appropriate changes on Orion and Hercules as well. Thanks!

@WenMeng-NOAA
Collaborator

@FernandoAndrade-NOAA Please work on testing on Orion. Thanks!

Collaborator

@FernandoAndrade-NOAA left a comment

A slight update is needed for prod_util, since it was renamed from prod-util.

Two review threads on ci/rt.sh (outdated, resolved)
@WenMeng-NOAA
Collaborator

@DavidHuber-NOAA Can you sync your branch with the UPP develop?

@DavidHuber-NOAA
Collaborator Author

@WenMeng-NOAA Done. A test build on Hercules was successful.

@WenMeng-NOAA added the enhancement and Ready for Review labels Jan 26, 2024
@WenMeng-NOAA
Collaborator

The UPP RTs completed on WCOSS2 with no changes in results.

@FernandoAndrade-NOAA
Collaborator

FernandoAndrade-NOAA commented Jan 26, 2024

@WenMeng-NOAA I've rerun Hera after the sync, with no changes in results. The build on Jet looks good with no errors. Hercules baselines will be fully recreated. I will also create an issue for transitioning Gaea's configuration to the new F5 file system, as the F2 file system will be retired soon and has already been disconnected from the C5 nodes. The ufs-wm side is currently working on getting that back up and running.

I'm seeing changes in the following test cases on Orion:
/work2/noaa/epic/nandoam/regression-testing/upp/orion/spack-1.6.0/UPP-update/ci/rundir/upp-ORION

fv3r NATLEV:

1549:1287717451:SBTA167:top of atmosphere:rpn_corr=0.999892:rpn_rms=0.199723
1550:1289453136:SBTA168:top of atmosphere:rpn_corr=0.999907:rpn_rms=0.132669
1551:1291087736:SBTA169:top of atmosphere:rpn_corr=0.999937:rpn_rms=0.116991
1552:1292798746:SBTA1610:top of atmosphere:rpn_corr=0.999802:rpn_rms=0.279467
1553:1294559367:SBTA1611:top of atmosphere:rpn_corr=0.999924:rpn_rms=0.193692
1554:1296068641:SBTA1612:top of atmosphere:rpn_corr=0.999893:rpn_rms=0.203959
1555:1298169744:SBTA1613:top of atmosphere:rpn_corr=0.999993:rpn_rms=0.0955402
1556:1299719172:SBTA1614:top of atmosphere:rpn_corr=0.999991:rpn_rms=0.0813804
1557:1301274228:SBTA1615:top of atmosphere:rpn_corr=0.999981:rpn_rms=0.145978
1558:1302790732:SBTA1616:top of atmosphere:rpn_corr=0.999662:rpn_rms=1.76369
1559:1304859744:var discipline=3 center=7 local_table=1 parmcat=192 parm=77:top of atmosphere:rpn_corr=0.999941:rpn_rms=0.140322
1560:1306554133:var discipline=3 center=7 local_table=1 parmcat=192 parm=78:top of atmosphere:rpn_corr=0.999927:rpn_rms=0.150794
1561:1308331826:var discipline=3 center=7 local_table=1 parmcat=192 parm=79:top of atmosphere:rpn_corr=0.999751:rpn_rms=0.373174
1562:1310156922:var discipline=3 center=7 local_table=1 parmcat=192 parm=80:top of atmosphere:rpn_corr=0.999964:rpn_rms=0.134193
1563:1311746308:var discipline=3 center=7 local_table=1 parmcat=192 parm=81:top of atmosphere:rpn_corr=0.999888:rpn_rms=0.237051
1564:1313806734:var discipline=3 center=7 local_table=1 parmcat=192 parm=82:top of atmosphere:rpn_corr=0.999974:rpn_rms=0.181468
1565:1315433161:var discipline=3 center=7 local_table=1 parmcat=192 parm=83:top of atmosphere:rpn_corr=0.99998:rpn_rms=0.143003
1566:1317060626:var discipline=3 center=7 local_table=1 parmcat=192 parm=84:top of atmosphere:rpn_corr=0.999964:rpn_rms=0.21097
1567:1318644554:var discipline=3 center=7 local_table=1 parmcat=192 parm=85:top of atmosphere:rpn_corr=0.999174:rpn_rms=1.93921

rtma NATLEV similarly:

1549:1240207509:SBTA167:top of atmosphere:rpn_corr=0.999947:rpn_rms=0.21822
1550:1241941835:SBTA168:top of atmosphere:rpn_corr=0.999819:rpn_rms=0.14603
1551:1243395108:SBTA169:top of atmosphere:rpn_corr=0.999919:rpn_rms=0.10217
1552:1244957262:SBTA1610:top of atmosphere:rpn_corr=0.99933:rpn_rms=0.40839
1553:1246655219:SBTA1611:top of atmosphere:rpn_corr=0.999954:rpn_rms=0.180672
1554:1248964985:SBTA1612:top of atmosphere:rpn_corr=0.999951:rpn_rms=0.177197
1555:1251058886:SBTA1613:top of atmosphere:rpn_corr=0.999996:rpn_rms=0.0861723
1556:1253404236:SBTA1614:top of atmosphere:rpn_corr=0.999997:rpn_rms=0.0534457
1557:1255761304:SBTA1615:top of atmosphere:rpn_corr=0.999991:rpn_rms=0.110948
1558:1258095786:SBTA1616:top of atmosphere:rpn_corr=0.999548:rpn_rms=1.84363
1559:1260202436:var discipline=3 center=7 local_table=1 parmcat=192 parm=77:top of atmosphere:rpn_corr=0.99992:rpn_rms=0.101186
1560:1261757941:var discipline=3 center=7 local_table=1 parmcat=192 parm=78:top of atmosphere:rpn_corr=0.999927:rpn_rms=0.119447
1561:1263417497:var discipline=3 center=7 local_table=1 parmcat=192 parm=79:top of atmosphere:rpn_corr=0.999561:rpn_rms=0.473525
1562:1265220652:var discipline=3 center=7 local_table=1 parmcat=192 parm=80:top of atmosphere:rpn_corr=0.999965:rpn_rms=0.149798
1563:1267605122:var discipline=3 center=7 local_table=1 parmcat=192 parm=81:top of atmosphere:rpn_corr=0.999922:rpn_rms=0.219417
1564:1269770080:var discipline=3 center=7 local_table=1 parmcat=192 parm=82:top of atmosphere:rpn_corr=0.999976:rpn_rms=0.171498
1565:1272187370:var discipline=3 center=7 local_table=1 parmcat=192 parm=83:top of atmosphere:rpn_corr=0.999974:rpn_rms=0.149857
1566:1274610935:var discipline=3 center=7 local_table=1 parmcat=192 parm=84:top of atmosphere:rpn_corr=0.999947:rpn_rms=0.231436
1567:1277005374:var discipline=3 center=7 local_table=1 parmcat=192 parm=85:top of atmosphere:rpn_corr=0.998841:rpn_rms=2.10333

gfs t00z.special.grb2f006:

1:0:SBTAGR8:top of atmosphere:rpn_corr=0.999985:rpn_rms=0.0999146
2:4028510:SBTAGR9:top of atmosphere:rpn_corr=0.999989:rpn_rms=0.0926061
3:8285304:SBTAGR10:top of atmosphere:rpn_corr=0.999886:rpn_rms=0.239585
4:12861832:SBTAGR13:top of atmosphere:rpn_corr=0.999997:rpn_rms=0.0554007
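
To make listings like the ones above easier to scan, here is a minimal sketch that parses the colon-delimited comparison lines and flags the records with the largest RMS differences. The field layout (record number, byte offset, name, level, then `rpn_corr=` and `rpn_rms=`) is inferred from the lines posted in this comment, and the 1.0 threshold is an arbitrary assumption, not a UPP acceptance criterion.

```python
def parse_comparison_line(line):
    """Split one `record:offset:name:level:rpn_corr=..:rpn_rms=..` line into a dict."""
    record, offset, name, level, *stats = line.strip().split(":")
    values = dict(item.split("=", 1) for item in stats)
    return {
        "record": int(record),
        "offset": int(offset),
        "name": name,
        "level": level,
        "corr": float(values["rpn_corr"]),
        "rms": float(values["rpn_rms"]),
    }


def flag_records(lines, max_rms=1.0):
    """Return parsed records whose RMS difference exceeds max_rms (assumed threshold)."""
    parsed = [parse_comparison_line(line) for line in lines if line.strip()]
    return [rec for rec in parsed if rec["rms"] > max_rms]


if __name__ == "__main__":
    # Three of the fv3r NATLEV lines quoted above.
    sample = [
        "1549:1287717451:SBTA167:top of atmosphere:rpn_corr=0.999892:rpn_rms=0.199723",
        "1558:1302790732:SBTA1616:top of atmosphere:rpn_corr=0.999662:rpn_rms=1.76369",
        "1567:1318644554:var discipline=3 center=7 local_table=1 parmcat=192 parm=85:top of atmosphere:rpn_corr=0.999174:rpn_rms=1.93921",
    ]
    for rec in flag_records(sample):
        print(rec["record"], rec["name"], rec["rms"])
```

Splitting on `:` works here because none of the field names in the posted lines contain a colon; a different inventory format would need a different parser.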

@WenMeng-NOAA
Collaborator

@FernandoAndrade-NOAA These changed results in the simulated satellite products might come from fix file updates in crtm/2.4.0.1 on Orion. I will run the tests on my end.

@WenMeng-NOAA
Collaborator

@FernandoAndrade-NOAA My test results on Orion are consistent with yours. All changes in the simulated satellite products come from the crtm upgrade on Orion. I think this PR is ready for merging.

@WenMeng-NOAA marked this pull request as ready for review January 28, 2024 03:20
@WenMeng-NOAA self-requested a review as a code owner January 28, 2024 03:20
@WenMeng-NOAA added the Baseline Change label Jan 28, 2024
@FernandoAndrade-NOAA
Collaborator

Thank you for confirming! I will update Orion baselines for the affected tests tomorrow after the merge. I will also replace the baselines for Hercules to potentially resolve the Hercules fatal errors. Would you happen to know why the upgrade in crtm versions did not affect results on WCOSS2 and Hera while changing values on Orion? Is the crtm configuration considerably different on that system?

@FernandoAndrade-NOAA
Collaborator

I will also create an issue for the appropriate Gaea C5/F5 updates once everything is verified over on the weather model side.

@WenMeng-NOAA
Collaborator

@FernandoAndrade-NOAA The changed results on Orion come from the crtm fix files. We encountered a similar situation when switching from hpc-stack to spack-stack.
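
One way to check an explanation like this is to compare the fix file trees of two installations by checksum. The sketch below is a generic directory comparison under that assumption; the function names are hypothetical, and the locations of the crtm fix directories on each machine are not given in this thread, so they would have to be supplied by the reader.

```python
import hashlib
from pathlib import Path


def checksum_tree(root):
    """Map each file's path relative to root to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads each file fully into memory; fine for a sketch, costly for huge files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = digest
    return digests


def diff_trees(old_root, new_root):
    """Return sorted relative paths that were added, removed, or changed between two trees."""
    old, new = checksum_tree(old_root), checksum_tree(new_root)
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

Calling `diff_trees(old_fix_dir, new_fix_dir)` with the two fix directories (paths to be filled in by the reader) returns an empty list when the trees are byte-identical, which would help separate fix file differences from code differences between library versions.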

@WenMeng-NOAA
Collaborator

This PR is ready for merging. The new baseline is needed on Orion.

@WenMeng-NOAA merged commit 6331f0b into NOAA-EMC:develop Jan 29, 2024
4 checks passed
Labels: Baseline Change, enhancement, Ready for Review
Linked issue: Upgrade to spack-stack 1.6.0
4 participants