
update fix submodule for GFS v16.3.12 and soil analysis changes #695

Merged


RussTreadon-NOAA
Contributor

RussTreadon-NOAA commented Feb 12, 2024

Description

GFS v16.3.12 updated ASCII GSI fix files. The GFS v17 soil analysis adds two ASCII GSI fix files. This PR changes the fix submodule hash to bring in the updated ASCII GSI fix files.

Fixes #640

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?
Ran ctests on WCOSS2, Hera, Orion, and Hercules and obtained the expected results.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code

RussTreadon-NOAA self-assigned this Feb 12, 2024
RussTreadon-NOAA changed the title from "update fix submodule to bring in GFS v16.3.12 and soil analysis chang…" to "update fix submodule for GFS v16.3.12 and soil analysis changes" Feb 12, 2024
@RussTreadon-NOAA
Contributor Author

WCOSS2 ctests
Installed RussTreadon-NOAA:feature/gsi_fix16.3.12 at 3d942f7 and develop at bae0342 on Dogwood. Ran ctests with the following results:

russ.treadon@dlogin05:/lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/gsi_fix/build> tail -f stdout_ctest.txt 
    Start 4: netcdf_fv3_regional
    Start 1: global_4denvar
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 2: rtma
    Start 7: global_enkf
    Start 3: rrfs_3denvar_glbens
1/7 Test #4: netcdf_fv3_regional ..............   Passed  483.11 sec
2/7 Test #3: rrfs_3denvar_glbens ..............   Passed  484.65 sec
3/7 Test #7: global_enkf ......................   Passed  610.58 sec
4/7 Test #2: rtma .............................   Passed  970.13 sec
5/7 Test #6: hafs_3denvar_hybens ..............   Passed  1211.62 sec
6/7 Test #5: hafs_4denvar_glbens ..............   Passed  1274.65 sec
7/7 Test #1: global_4denvar ...................   Passed  1382.86 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) = 1382.86 sec

Confirmed that the contrl and updat global_4denvar runs used different anavinfo, convinfo, and satinfo files. The observation types for which these fix files differ are not assimilated in the global_4denvar case, hence the identical analysis results. The other ctests do not use the updated global fix files.
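A check of this kind can be scripted. The sketch below is a minimal, standard-library-only illustration, assuming each run directory holds the fix files under the names anavinfo, convinfo, and satinfo; the directory layout and the helper name `differing_fix_files` are hypothetical and not part of the GSI regression suite.

```python
import filecmp
from pathlib import Path

# Assumption: each contrl/updat run directory holds the fix files under these names.
FIX_FILES = ("anavinfo", "convinfo", "satinfo")


def differing_fix_files(contrl_dir, updat_dir, names=FIX_FILES):
    """Return the fix files whose contents differ between two run directories."""
    contrl, updat = Path(contrl_dir), Path(updat_dir)
    # shallow=False compares file contents even when size and mtime match.
    return [name for name in names
            if not filecmp.cmp(contrl / name, updat / name, shallow=False)]
```

Called on the global_4denvar contrl and updat run directories, it would be expected to list all three files, matching the finding above.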

@RussTreadon-NOAA
Contributor Author

Hera ctests
Installed RussTreadon-NOAA:feature/gsi_fix16.3.12 at 3d942f7 and develop at bae0342 on Hera. Ran ctests with the following results:

Test project /scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/gsi_fix/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_glbens
    Start 4: netcdf_fv3_regional
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 7: global_enkf
1/7 Test #3: rrfs_3denvar_glbens ..............   Passed  511.67 sec
2/7 Test #4: netcdf_fv3_regional ..............***Failed  546.86 sec
3/7 Test #7: global_enkf ......................***Failed  1020.20 sec
4/7 Test #2: rtma .............................***Failed  1153.50 sec
5/7 Test #6: hafs_3denvar_hybens ..............   Passed  1233.94 sec
6/7 Test #5: hafs_4denvar_glbens ..............   Passed  1464.57 sec
7/7 Test #1: global_4denvar ...................   Passed  1673.14 sec

57% tests passed, 3 tests failed out of 7

Total Test time (real) = 1673.14 sec

The following tests FAILED:
          2 - rtma (Failed)
          4 - netcdf_fv3_regional (Failed)
          7 - global_enkf (Failed)
Errors while running CTest

rtma failed with

The runtime for rtma_loproc_updat is 240.049616 seconds.  This has exceeded maximum allowable threshold time of 234.666092 seconds, resulting in Failure time-thresh of the regression test.

A check of the updat and contrl wall times shows timings varying within expected ranges:

tmpreg_rtma/rtma_hiproc_contrl/stdout:The total amount of wall time                        = 200.207948
tmpreg_rtma/rtma_hiproc_updat/stdout:The total amount of wall time                        = 191.858486
tmpreg_rtma/rtma_loproc_contrl/stdout:The total amount of wall time                        = 213.332811
tmpreg_rtma/rtma_loproc_updat/stdout:The total amount of wall time                        = 240.049616

This is not a fatal fail.

The netcdf_fv3_regional failure is due to

The runtime for netcdf_fv3_regional_hiproc_updat is 96.569096 seconds.  This has exceeded maximum allowable threshold time of 73.415131 seconds, resulting in Failure of timethresh2 the regression test.

A check of updat and contrl wall times shows variability within normally observed ranges

netcdf_fv3_regional_hiproc_contrl/stdout:The total amount of wall time                        = 58.732105
netcdf_fv3_regional_hiproc_updat/stdout:The total amount of wall time                        = 96.569096
netcdf_fv3_regional_loproc_contrl/stdout:The total amount of wall time                        = 74.399252
netcdf_fv3_regional_loproc_updat/stdout:The total amount of wall time                        = 73.844677

This is not a fatal fail.

global_enkf failed due to

The runtime for global_enkf_hiproc_updat is 68.637144 seconds.  This has exceeded maximum allowable threshold time of 65.351489 seconds,
resulting in Failure timethresh2 of the regression test.

A check of updat and contrl wall times

global_enkf_hiproc_contrl/stdout:The total amount of wall time                        = 59.410445
global_enkf_hiproc_updat/stdout:The total amount of wall time                        = 68.637144
global_enkf_loproc_contrl/stdout:The total amount of wall time                        = 80.456422
global_enkf_loproc_updat/stdout:The total amount of wall time                        = 80.469991

does not reveal anomalous behavior. This is not a fatal fail.
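The wall time listings above have the form produced by grepping each run's stdout for "The total amount of wall time". A hypothetical helper that turns such a listing into per-processor-count updat/contrl ratios is sketched below; only the line format comes from the logs, and the function names are illustrative.

```python
import re

# Matches listing lines such as
#   global_enkf_hiproc_contrl/stdout:The total amount of wall time    = 59.410445
WALL_TIME_RE = re.compile(
    r"_(?P<procs>loproc|hiproc)_(?P<run>contrl|updat)/stdout:"
    r"The total amount of wall time\s*=\s*(?P<secs>\d+(?:\.\d+)?)"
)


def parse_wall_times(listing: str) -> dict:
    """Map (procs, run) -> wall time in seconds for each matching line."""
    times = {}
    for line in listing.splitlines():
        match = WALL_TIME_RE.search(line)
        if match:
            times[(match["procs"], match["run"])] = float(match["secs"])
    return times


def updat_contrl_ratios(times: dict) -> dict:
    """Map procs -> updat/contrl wall time ratio where both runs are present."""
    ratios = {}
    for procs in ("loproc", "hiproc"):
        contrl = times.get((procs, "contrl"))
        updat = times.get((procs, "updat"))
        if contrl is not None and updat is not None:
            ratios[procs] = updat / contrl
    return ratios


if __name__ == "__main__":
    listing = """\
global_enkf_hiproc_contrl/stdout:The total amount of wall time                        = 59.410445
global_enkf_hiproc_updat/stdout:The total amount of wall time                        = 68.637144
global_enkf_loproc_contrl/stdout:The total amount of wall time                        = 80.456422
global_enkf_loproc_updat/stdout:The total amount of wall time                        = 80.469991
"""
    for procs, ratio in updat_contrl_ratios(parse_wall_times(listing)).items():
        print(f"{procs}: updat/contrl = {ratio:.3f}")
```

For the global_enkf listing this gives a loproc ratio of about 1.000 and a hiproc ratio of about 1.155, isolating the single slow hiproc updat run.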

@RussTreadon-NOAA
Contributor Author

RussTreadon-NOAA commented Feb 12, 2024

Hercules ctests
Installed RussTreadon-NOAA:feature/gsi_fix16.3.12 at 3d942f7 and develop at bae0342 on Hercules. Ran ctests with the following results:


Test project /work2/noaa/da/rtreadon/git/gsi/gsi_fix/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_glbens
    Start 4: netcdf_fv3_regional
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 7: global_enkf
1/7 Test #4: netcdf_fv3_regional ..............   Passed  488.39 sec
2/7 Test #7: global_enkf ......................   Passed  499.63 sec
3/7 Test #3: rrfs_3denvar_glbens ..............***Failed  545.12 sec
4/7 Test #2: rtma .............................   Passed  967.94 sec
5/7 Test #6: hafs_3denvar_hybens ..............***Failed  1094.68 sec
6/7 Test #5: hafs_4denvar_glbens ..............   Passed  1300.73 sec
7/7 Test #1: global_4denvar ...................   Passed  1383.73 sec

71% tests passed, 2 tests failed out of 7

Total Test time (real) = 1383.74 sec

The following tests FAILED:
          3 - rrfs_3denvar_glbens (Failed)
          6 - hafs_3denvar_hybens (Failed)

The rrfs_3denvar_glbens failure is due to

The fv3_sfcdata are reproducible
The fv3_tracer are reproducible
The results between the two runs (rrfs_3denvar_glbens_loproc_updat and rrfs_3denvar_glbens_hiproc_updat) are not reproducible.  Thus, the case has Failed siganl of the regression tests.

This message is cryptic. The test compares the output files fv3_sfcdata, fv3_tracer, and fv3_dynvars. The loproc and hiproc fv3_sfcdata and fv3_tracer files are identical; the fv3_dynvars files differ. A comparison of the netCDF records in the loproc and hiproc fv3_dynvars files shows the following:

xaxis_1 min/max 1=1.0,396.0 min/max 2=1.0,396.0 max abs diff=0.0000000000
xaxis_2 min/max 1=1.0,397.0 min/max 2=1.0,397.0 max abs diff=0.0000000000
yaxis_1 min/max 1=1.0,233.0 min/max 2=1.0,233.0 max abs diff=0.0000000000
yaxis_2 min/max 1=1.0,232.0 min/max 2=1.0,232.0 max abs diff=0.0000000000
zaxis_1 min/max 1=1.0,65.0 min/max 2=1.0,65.0 max abs diff=0.0000000000
Time min/max 1=1.0,1.0 min/max 2=1.0,1.0 max abs diff=0.0000000000
u min/max 1=-38.006363,59.550613 min/max 2=-38.006363,59.550613 max abs diff=0.0031976700
v min/max 1=-26.81582,31.66718 min/max 2=-26.81582,31.66718 max abs diff=6.9626979828
W min/max 1=-2.563452,6.3780456 min/max 2=-2.563452,6.3780456 max abs diff=0.0000000000
DZ min/max 1=-5746.5317,-17.513391 min/max 2=-5746.5317,-17.513391 max abs diff=0.0000000000
T min/max 1=194.30504,313.18134 min/max 2=194.30504,313.18134 max abs diff=0.0000000000
delp min/max 1=140.36311,3325.6877 min/max 2=140.36311,3325.6877 max abs diff=0.0000000000
phis min/max 1=-676.6589,36239.055 min/max 2=-676.6589,36239.055 max abs diff=0.0000000000

Differences are limited to the u and v wind components.
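The table above lists, for each fv3_dynvars variable, the min/max in each file and the maximum absolute difference between them. The PR does not name the tool that produced it, so the following is a hypothetical, standard-library-only sketch of the same per-variable statistic on flat lists of values; a real check would first read the variables from the two netCDF files with a netCDF library.

```python
from typing import Mapping, Sequence


def compare_field(name: str, a: Sequence[float], b: Sequence[float]) -> str:
    """Format one line in the style of the fv3_dynvars comparison above."""
    if len(a) != len(b):
        raise ValueError(f"{name}: size mismatch ({len(a)} vs {len(b)})")
    max_abs_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return (f"{name} min/max 1={min(a)},{max(a)} "
            f"min/max 2={min(b)},{max(b)} max abs diff={max_abs_diff:.10f}")


def differing_fields(file1: Mapping[str, Sequence[float]],
                     file2: Mapping[str, Sequence[float]]) -> list:
    """Names of variables present in both files whose values are not identical."""
    return [name for name in file1
            if name in file2 and list(file1[name]) != list(file2[name])]


if __name__ == "__main__":
    # Same extremes in both "files", but the values at two points are swapped.
    print(compare_field("v", [1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
```

The example shows why matching min/max values are not sufficient: moving values between grid points leaves both extremes unchanged while the max abs diff is nonzero, which is the pattern seen for u and v above.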

The hafs_3denvar_hybens failure is for the same reason

The fv3_sfcdata are reproducible
The fv3_tracer are reproducible
The results between the two runs (hafs_3denvar_hybens_loproc_updat and hafs_3denvar_hybens_hiproc_updat) are not reproducible
Thus, the case has Failed siganl of the regression tests.

However, this time it is the delp field that differs:

xaxis_1 min/max 1=1.0,720.0 min/max 2=1.0,720.0 max abs diff=0.0000000000
xaxis_2 min/max 1=1.0,721.0 min/max 2=1.0,721.0 max abs diff=0.0000000000
yaxis_1 min/max 1=1.0,541.0 min/max 2=1.0,541.0 max abs diff=0.0000000000
yaxis_2 min/max 1=1.0,540.0 min/max 2=1.0,540.0 max abs diff=0.0000000000
zaxis_1 min/max 1=1.0,65.0 min/max 2=1.0,65.0 max abs diff=0.0000000000
Time min/max 1=1.0,1.0 min/max 2=1.0,1.0 max abs diff=0.0000000000
u min/max 1=-49.416553,49.226555 min/max 2=-49.416553,49.226555 max abs diff=0.0000000000
v min/max 1=-49.340206,39.08488 min/max 2=-49.340206,39.08488 max abs diff=0.0000000000
W min/max 1=-2.842067,9.540336 min/max 2=-2.842067,9.540336 max abs diff=0.0000000000
DZ min/max 1=-5518.389,-17.457632 min/max 2=-5518.389,-17.457632 max abs diff=0.0000000000
T min/max 1=181.89748,309.6158 min/max 2=181.89748,309.6158 max abs diff=0.0000000000
delp min/max 1=140.48375,3839.4077 min/max 2=140.48375,3839.4077 max abs diff=1528.7261962891
phis min/max 1=-6.5023405e-06,36102.867 min/max 2=-6.5023405e-06,36102.867 max abs diff=0.0000000000
ua min/max 1=-49.661697,46.939003 min/max 2=-49.661697,46.939003 max abs diff=0.0000000000
va min/max 1=-37.868477,49.013893 min/max 2=-37.868477,49.013893 max abs diff=0.0000000000

A rerun of rrfs_3denvar_glbens passed:

Test project /work2/noaa/da/rtreadon/git/gsi/gsi_fix/build
    Start 3: rrfs_3denvar_glbens
1/1 Test #3: rrfs_3denvar_glbens ..............   Passed  549.44 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) = 549.98 sec

A rerun of hafs_3denvar_hybens failed as before due to differences between the loproc and hiproc fv3_dynvars files. However, in this run the differences show up in the u and v wind components:

xaxis_1 min/max 1=1.0,720.0 min/max 2=1.0,720.0 max abs diff=0.0000000000
xaxis_2 min/max 1=1.0,721.0 min/max 2=1.0,721.0 max abs diff=0.0000000000
yaxis_1 min/max 1=1.0,541.0 min/max 2=1.0,541.0 max abs diff=0.0000000000
yaxis_2 min/max 1=1.0,540.0 min/max 2=1.0,540.0 max abs diff=0.0000000000
zaxis_1 min/max 1=1.0,65.0 min/max 2=1.0,65.0 max abs diff=0.0000000000
Time min/max 1=1.0,1.0 min/max 2=1.0,1.0 max abs diff=0.0000000000
u min/max 1=-49.416553,49.226555 min/max 2=-49.416553,49.226555 max abs diff=0.0005331039
v min/max 1=-49.340206,39.08488 min/max 2=-49.340206,39.08488 max abs diff=2.9050793648
W min/max 1=-2.842067,9.540336 min/max 2=-2.842067,9.540336 max abs diff=0.0000000000
DZ min/max 1=-5518.389,-17.457632 min/max 2=-5518.389,-17.457632 max abs diff=0.0000000000
T min/max 1=181.89748,309.6158 min/max 2=181.89748,309.6158 max abs diff=0.0000000000
delp min/max 1=140.48375,3329.2869 min/max 2=140.48375,3329.2869 max abs diff=0.0000000000
phis min/max 1=-6.5023405e-06,36102.867 min/max 2=-6.5023405e-06,36102.867 max abs diff=0.0000000000
ua min/max 1=-49.661697,46.939003 min/max 2=-49.661697,46.939003 max abs diff=0.0000000000
va min/max 1=-37.868477,49.013893 min/max 2=-37.868477,49.013893 max abs diff=0.0000000000

This behavior is odd. Since the updat and contrl gsi.x are identical and the modified fix files in feature/gsi_fix16.3.12 are not used in hafs_3denvar_hybens, the above failures may be related to Hercules system issues.

The hafs_3denvar_hybens test was run a third time. This time the test Passed.

Test project /work2/noaa/da/rtreadon/git/gsi/gsi_fix/build
    Start 6: hafs_3denvar_hybens
1/1 Test #6: hafs_3denvar_hybens ..............   Passed  1761.72 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) = 1762.20 sec

@RussTreadon-NOAA
Contributor Author

Orion ctests
Installed RussTreadon-NOAA:feature/gsi_fix16.3.12 at 3d942f7 and develop at bae0342 on Orion. Ran ctests with the following results:

Test project /work/noaa/da/rtreadon/git/gsi/gsi_fix/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_glbens
    Start 4: netcdf_fv3_regional
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 7: global_enkf
1/7 Test #7: global_enkf ......................   Passed  552.19 sec
2/7 Test #4: netcdf_fv3_regional ..............***Failed  844.67 sec
3/7 Test #2: rtma .............................   Passed  1149.27 sec
4/7 Test #1: global_4denvar ...................   Passed  1745.30 sec
5/7 Test #3: rrfs_3denvar_glbens ..............   Passed  1986.89 sec

hafs_3denvar_hybens and hafs_4denvar_glbens did not complete within the specified job wall clock limit. The Orion work/ fileset is known to have latency issues. The Orion ctests ran in work/noaa/stmp/rtreadon/gsi_fix.

The netcdf_fv3_regional failure is due to

The runtime for netcdf_fv3_regional_loproc_updat is 265.907088 seconds.  This has exceeded maximum allowable threshold time of 102.773893 seconds,
resulting in Failure time-thresh of the regression test.

A check of the gsi.x wall time for each test shows considerable variability:

netcdf_fv3_regional_hiproc_contrl/stdout:The total amount of wall time                        = 181.530507
netcdf_fv3_regional_hiproc_updat/stdout:The total amount of wall time                        = 165.244079
netcdf_fv3_regional_loproc_contrl/stdout:The total amount of wall time                        = 82.219115
netcdf_fv3_regional_loproc_updat/stdout:The total amount of wall time                        = 265.907088

This points to Orion system issues. This is not a fatal fail.

The wall clock limit for the hafs_3denvar_hybens test was increased from 15 minutes to 45 minutes. The test was resubmitted. The hafs_3denvar_hybens_loproc_updat job did not complete within the 45 minute wall clock limit. Orion tests will be rerun at a later date.

@RussTreadon-NOAA
Contributor Author

@ClaraDraper-NOAA, this PR updates the fix submodule hash to 298bdc0. This hash includes the two fix files you added via GSI-fix PR #14.

GSI PRs need two peer reviews. I added you as a peer reviewer to this PR since you created the soil analysis fix files. Once this PR has two peer approvals, we can get the updated fix submodule hash into develop.

@RussTreadon-NOAA
Contributor Author

Orion ctests rerun
Reran Orion ctests with the following results:

Test project /work/noaa/da/rtreadon/git/gsi/gsi_fix/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_glbens
    Start 4: netcdf_fv3_regional
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 7: global_enkf
1/7 Test #7: global_enkf ......................   Passed  489.04 sec
2/7 Test #4: netcdf_fv3_regional ..............   Passed  544.16 sec
3/7 Test #3: rrfs_3denvar_glbens ..............   Passed  606.51 sec
4/7 Test #2: rtma .............................   Passed  1030.50 sec
5/7 Test #6: hafs_3denvar_hybens ..............   Passed  1635.18 sec
6/7 Test #1: global_4denvar ...................***Failed  1683.46 sec
7/7 Test #5: hafs_4denvar_glbens ..............   Passed  1695.75 sec

86% tests passed, 1 tests failed out of 7

Total Test time (real) = 1695.89 sec

The following tests FAILED:
          1 - global_4denvar (Failed)

The global_4denvar test failed due to

The runtime for global_4denvar_loproc_updat is 431.349257 seconds.  This has exceeded maximum allowable threshold time of 429.251064 seconds, resulting in Failure time-thresh of the regression test.

A check of the gsi.x wall times shows

global_4denvar_hiproc_contrl/stdout:The total amount of wall time                        = 303.900952
global_4denvar_hiproc_updat/stdout:The total amount of wall time                        = 312.005466
global_4denvar_loproc_contrl/stdout:The total amount of wall time                        = 390.228240
global_4denvar_loproc_updat/stdout:The total amount of wall time                        = 431.349257

Given run time variability on Orion, this is not a fatal fail.
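Each timing failure above compares an updat wall time with a threshold, and the logged thresholds are consistent with a simple rule: the matching contrl wall time multiplied by a per-test factor, 1.1 for rtma, global_enkf, and global_4denvar and 1.25 for netcdf_fv3_regional (e.g. 213.332811 s × 1.1 = 234.666092 s for rtma_loproc on Hera). The factors are inferred from the logged numbers, not read from the GSI regression scripts; the sketch below only checks that inference against the five timing failures reported in this PR.

```python
"""Check an inferred ctest timing rule against the failures in this PR.

Assumption: threshold = contrl wall time * per-test factor. The factors
(1.10, 1.25) are inferred from the logged numbers, not from the GSI scripts.
"""

# (run, contrl wall time [s], updat wall time [s], factor, logged threshold [s])
CASES = [
    ("Hera rtma_loproc", 213.332811, 240.049616, 1.10, 234.666092),
    ("Hera netcdf_fv3_regional_hiproc", 58.732105, 96.569096, 1.25, 73.415131),
    ("Hera global_enkf_hiproc", 59.410445, 68.637144, 1.10, 65.351489),
    ("Orion netcdf_fv3_regional_loproc", 82.219115, 265.907088, 1.25, 102.773893),
    ("Orion global_4denvar_loproc", 390.228240, 431.349257, 1.10, 429.251064),
]


def time_threshold(contrl_seconds: float, factor: float) -> float:
    """Maximum updat wall time allowed under the inferred rule."""
    return contrl_seconds * factor


def exceeds_threshold(updat_seconds: float, contrl_seconds: float, factor: float) -> bool:
    """True if the updat run is slower than the inferred threshold."""
    return updat_seconds > time_threshold(contrl_seconds, factor)


if __name__ == "__main__":
    for run, contrl, updat, factor, logged in CASES:
        computed = time_threshold(contrl, factor)
        # The logs print six decimals, so agreement within 1e-6 s is all we can check.
        agrees = abs(computed - logged) <= 1e-6
        print(f"{run}: threshold {computed:.6f} s (logged {logged:.6f}, agrees={agrees}), "
              f"updat {updat:.6f} s, exceeds={exceeds_threshold(updat, contrl, factor)}")
```

Under this rule an updat run about 12.5% slower than its contrl partner (rtma_loproc on Hera) is enough to fail the check, so ordinary run-to-run wall time scatter on a busy system can trip it without any code change.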

@RussTreadon-NOAA
Contributor Author

@ClaraDraper-NOAA, all ctests yield acceptable results. This PR can be scheduled for merging pending your review and approval.

Contributor

ClaraDraper-NOAA left a comment

This includes both @ADCollard's and my changes/additions to the fix files. Looks good to me.

@RussTreadon-NOAA
Contributor Author

@CoryMartin-NOAA, would you mind reviewing and approving this PR? I cannot approve it since I created the PR.

This PR updates the fix submodule hash to bring in operational GFS v16.3.12 fix file updates. Also included in the updated fix hash are two fix files for the GFS v17 soil analysis.

Contributor

CoryMartin-NOAA left a comment

Looks good, thanks

@CoryMartin-NOAA
Contributor

@RussTreadon-NOAA should I handle the merge or would you like to?

@RussTreadon-NOAA
Contributor Author

@CoryMartin-NOAA, I'm fine with shepherding this PR through the last gate. You already did something much more important. You opened the gate with your approval. Thanks!

RussTreadon-NOAA merged commit 86ad20e into NOAA-EMC:develop Feb 13, 2024
4 checks passed
RussTreadon-NOAA deleted the feature/gsi_fix16.3.12 branch February 13, 2024 20:29
Development

Successfully merging this pull request may close these issues.

Add GFS v16.3.12 changes to develop
4 participants