Cads for andrew #616

Merged 80 commits on Dec 4, 2023

Conversation

wx20jjung
Contributor

@wx20jjung wx20jjung commented Aug 31, 2023

Description

Fixes #428

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

DUE DATE for this PR is 10/12/2023. If this PR is not merged into develop by this date, the PR will be closed and returned to the developer.

…d sensor flags (iasi, hirs, airs, etc.) to the qc_irsnd subroutine.
… subroutine.

If a channel used by the CO2_slicing routine has a missing value, reject the profile.
… subroutine.

If a channel used by the CO2_slicing routine is bad/missing, reject the profile.
…ult. These flags include airs_co2, cris_co2, iasi_co2, hirs_co2 and goessndr_co2. When the flag is true the subroutine co2_cloud_detect will be used to determine cloud layer.
…IIRS cloud information within the CrIS bufr. Added namelist flags to determine which cloud detection routine to use. If cris_co2, airs_co2, iasi_co2, hirs_co2 and/or goessnder_co2 are true, use the co2_cloud_detect subroutine. The original cloud detection subroutine (statistical_cloud_detect) is the default if any flags are missing or set to false.
…bset. Also added logic: if CO2_cloud_detect is used, specific channels must be available and pass minimum quality control.
…SI subset. CO2 required channels must also pass minimal quality control or profile will be rejected.
…ubset. Also added logic: if CO2_cloud_detect is used, specific channels must be available and pass minimum quality control. AIRS is no longer an operational data set, so these changes were never properly tested.
…e available and pass minimum quality control. These are channels 3 - 7 which are the basic CO2 sounding channels of this instrument.
…ch channel pair only tested a specific layer. All channel pairs should test from the tropopause to their predetermined level. Starting from the tropopause with each channel pair finds considerably more cirrus. A CrIS channel was changed in the 3rd pair, and cloud thresholds were adjusted lower. Other cosmetic changes include the radiative transfer integration, renaming the subroutine of the emc_legacy cloud test, etc.
…loud_and_aerosol_detection software. Specifically you will see the variable chan_level. Other variables were added (radiance_overcast, radiance_ratio) to compute chan_level.
… software. These include cris_cads, iasi_cads, and airs_cads. These variables need to be added to the script exglobal_atmos_analysis.sh to call this routine.
…routine) requires chan_level to be added in this routine. chan_level is NOT used in this routine and should NOT change the value of any variable going out of this subroutine.
…outine qc_irsnd. This variable is used in qc_irsnd when determining the clear/cloudy channels for the IR sensors AIRS, IASI, and CrIS.
…tware. This module contains the code (subroutines) developed by ECMWF and available on the NWP SAF.
…contains the setup and call routines for the cloud_and_aerosol_detection software. There are several code additions, deletions, and reorganizations in this push.
…_aerosol_detection software. This code is available from the NWP SAF and is specifically version 3. The only code changes made to these subroutines are to be compatible with the GSI. Logic changes were kept to a minimum.
…F90. Added an 11 - 12 micron test to qcmod to remove potential low level clouds.
…S) for use in CADS. These are NOT complete yet.
…in the fix_gsi directory.

In this case the IASI_CLDDET.NL was modified to NOT use the AVHRR cluster information as it is not ready yet.
AIRS_CLDDET.NL
CRIS_CLDDET.NL
IASI_CLDDET.NL
IASING_CLDDET.NL
IRS_CLDDET.NL
Fixed conflict in gsimod.f90 and removed exglobal_atmos_analysis.sh

Conflicts:
	scripts/exglobal_atmos_analysis.sh
	src/gsi/gsimod.F90
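
For readers skimming the commit list, the flag-driven routine selection and the "reject the profile if a CO2 channel is missing" rule can be sketched roughly as below. This is an illustration only, not the GSI Fortran: the flag names and the two routine names are taken from the commit messages above (using the goessndr_co2 spelling), while the helper names, the dict standing in for the namelist, and the NaN/None convention for a missing channel are assumptions. The "minimum quality control" checks are not modelled.

```python
import math

# Flag and routine names come from the commit messages; everything else
# (helper names, dict-based "namelist", NaN/None meaning "missing") is
# a hypothetical stand-in for illustration.
CO2_FLAGS = {
    "airs": "airs_co2",
    "cris": "cris_co2",
    "iasi": "iasi_co2",
    "hirs": "hirs_co2",
    "goessndr": "goessndr_co2",
}


def select_cloud_detection(sensor, namelist):
    """Return the cloud detection routine name for a sensor.

    The CO2 routine is chosen only when the sensor's flag is present and
    true; a missing or false flag falls back to the original routine.
    """
    flag = CO2_FLAGS.get(sensor)
    if flag is not None and namelist.get(flag) is True:
        return "co2_cloud_detect"
    return "statistical_cloud_detect"


def profile_usable(required_channels, brightness_temps):
    """Return False if any channel needed by the CO2 routine is missing."""
    for chan in required_channels:
        value = brightness_temps.get(chan)
        if value is None or math.isnan(value):
            return False
    return True


print(select_cloud_detection("cris", {"cris_co2": True}))
print(select_cloud_detection("iasi", {}))
```

Keeping the fallback as the final return means an unknown sensor or an absent flag can never select the CO2 routine by accident, which matches the "default if any flags are missing or set to false" wording.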
@RussTreadon-NOAA
Contributor

wx20jjung:CADS_for_Andrew at af80355 installed on WCOSS2 (Cactus), Hera, and Orion. ctests run with the following results

WCOSS2 (Cactus)

Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr616/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_glbens
    Start 4: netcdf_fv3_regional
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 7: global_enkf
1/7 Test #4: netcdf_fv3_regional ..............   Passed  482.87 sec
2/7 Test #3: rrfs_3denvar_glbens ..............   Passed  484.54 sec
3/7 Test #7: global_enkf ......................   Passed  608.61 sec
4/7 Test #2: rtma .............................   Passed  969.17 sec
5/7 Test #6: hafs_3denvar_hybens ..............   Passed  1149.69 sec
6/7 Test #5: hafs_4denvar_glbens ..............   Passed  1209.48 sec
7/7 Test #1: global_4denvar ...................   Passed  1321.96 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) = 1321.97 sec

Hera

Test project /scratch1/NCEPDEV/da/Russ.Treadon/git/gsi/pr616/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_glbens
    Start 4: netcdf_fv3_regional
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 7: global_enkf
1/7 Test #4: netcdf_fv3_regional ..............   Passed  1026.77 sec
2/7 Test #3: rrfs_3denvar_glbens ..............   Passed  1029.59 sec
3/7 Test #7: global_enkf ......................   Passed  1478.18 sec
4/7 Test #2: rtma .............................   Passed  2473.79 sec
5/7 Test #6: hafs_3denvar_hybens ..............   Passed  2482.62 sec
6/7 Test #5: hafs_4denvar_glbens ..............   Passed  2545.05 sec
7/7 Test #1: global_4denvar ...................   Passed  2573.92 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) = 2573.93 sec

Orion

Test project /work2/noaa/da/rtreadon/git/gsi/pr616/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_glbens
    Start 4: netcdf_fv3_regional
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 7: global_enkf
1/7 Test #4: netcdf_fv3_regional ..............   Passed  486.48 sec
2/7 Test #7: global_enkf ......................   Passed  621.16 sec
3/7 Test #3: rrfs_3denvar_glbens ..............***Failed  668.74 sec
4/7 Test #2: rtma .............................   Passed  969.89 sec
5/7 Test #1: global_4denvar ...................   Passed  1622.81 sec
6/7 Test #6: hafs_3denvar_hybens ..............   Passed  2293.37 sec
7/7 Test #5: hafs_4denvar_glbens ..............***Failed  2474.81 sec

71% tests passed, 2 tests failed out of 7

Total Test time (real) = 2474.82 sec

The following tests FAILED:
          3 - rrfs_3denvar_glbens (Failed)
          5 - hafs_4denvar_glbens (Failed)

The rrfs_3denvar_glbens failure is due to the run time check

The runtime for rrfs_3denvar_glbens_loproc_updat is 173.359698 seconds.  This has exceeded maximum allowable threshold time of 166.973230 seconds,
resulting in Failure time-thresh of the regression test.

This is not a fatal failure.

The hafs_4denvar_glbens failure is due to the run time check

The runtime for hafs_4denvar_glbens_loproc_updat is 666.779203 seconds.  This has exceeded maximum allowable threshold time of 504.336197 seconds,
resulting in Failure time-thresh of the regression test.

This is not a fatal failure.
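
To put the two Orion time-threshold failures in perspective, the overshoot can be computed directly from the log messages quoted above. A minimal sketch: the regular expression and the helper name are illustrative and are not part of the GSI regression scripts.

```python
import re

# Matches the first line of the runtime-threshold message quoted above.
PATTERN = re.compile(
    r"The runtime for (\S+) is ([\d.]+) seconds\.\s+This has exceeded maximum "
    r"allowable threshold time of ([\d.]+) seconds"
)


def runtime_excess(message):
    """Return (test name, percent by which the runtime exceeds the threshold)."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a runtime-threshold message")
    name = match.group(1)
    runtime = float(match.group(2))
    limit = float(match.group(3))
    return name, 100.0 * (runtime - limit) / limit


messages = [
    "The runtime for rrfs_3denvar_glbens_loproc_updat is 173.359698 seconds.  "
    "This has exceeded maximum allowable threshold time of 166.973230 seconds,",
    "The runtime for hafs_4denvar_glbens_loproc_updat is 666.779203 seconds.  "
    "This has exceeded maximum allowable threshold time of 504.336197 seconds,",
]

for msg in messages:
    test_name, percent = runtime_excess(msg)
    print(f"{test_name}: {percent:.1f}% over threshold")
```

With the values from the log, the rrfs case is only a few percent over its limit, while the hafs case is roughly a third over.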

Contributor

@RussTreadon-NOAA RussTreadon-NOAA left a comment

Approve.

Collaborator

@DavidHuber-NOAA DavidHuber-NOAA left a comment

Looks good, thanks @wx20jjung!

Approve.

@RussTreadon-NOAA
Contributor

@wx20jjung , GSI PR #624 has been merged into develop. This PR updates the GSI build to spack-stack on non-production machines. We should bring this update into wx20jjung:CADS_for_Andrew.

I can rerun ctests on WCOSS2 after you update your branch. PR #624 should not impact this PR, but it's best to confirm by rerunning the ctests. Would you please run ctests on Hera after you update your branch? Do you ever run on Orion?

@wx20jjung
Contributor Author

wx20jjung commented Nov 30, 2023 via email

@RussTreadon-NOAA
Contributor

The S4 failure is unexpected. @DavidHuber-NOAA , PR #624 was tested on S4, right?

I merged the current head of develop into my working copy of CADS_for_Andrew on Cactus. WCOSS2 ctests results below

russ.treadon@clogin03:/lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr616_update/build> tail -f stdout_ctest.txt 
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_glbens
    Start 4: netcdf_fv3_regional
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 7: global_enkf
1/7 Test #4: netcdf_fv3_regional ..............***Failed  483.39 sec
2/7 Test #3: rrfs_3denvar_glbens ..............   Passed  485.31 sec
3/7 Test #7: global_enkf ......................   Passed  607.82 sec
4/7 Test #2: rtma .............................   Passed  967.37 sec
5/7 Test #6: hafs_3denvar_hybens ..............   Passed  1209.49 sec
6/7 Test #5: hafs_4denvar_glbens ..............   Passed  1209.60 sec
7/7 Test #1: global_4denvar ...................   Passed  1322.06 sec

86% tests passed, 1 tests failed out of 7

Total Test time (real) = 1322.06 sec

The following tests FAILED:
          4 - netcdf_fv3_regional (Failed)
Errors while running CTest

The netcdf_fv3_regional failure is due to

The memory for netcdf_fv3_regional_loproc_updat is 289288 KBs.  This has exceeded maximum allowable memory of 191276 KBs,
resulting in Failure memthresh of the regression test.

The loproc_updat task 0 memory usage is indeed noticeably higher than the contrl

netcdf_fv3_regional_hiproc_contrl/stdout:The maximum resident set size (KB)                   = 358996
netcdf_fv3_regional_hiproc_updat/stdout:The maximum resident set size (KB)                   = 362660
netcdf_fv3_regional_loproc_contrl/stdout:The maximum resident set size (KB)                   = 173888
netcdf_fv3_regional_loproc_updat/stdout:The maximum resident set size (KB)                   = 289288
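
The size of the jump is easier to see as an updat/contrl ratio per processor configuration. A small sketch that parses the four lines above; the regular expression and function name are illustrative assumptions, not part of the regression suite.

```python
import re

# Parses grep-style lines of the form
#   <test>_<hiproc|loproc>_<contrl|updat>/stdout:<label> = <value>
LINE = re.compile(
    r"^(\S+)_(hiproc|loproc)_(contrl|updat)/stdout:.*=\s*([\d.]+)\s*$"
)


def updat_over_contrl(lines):
    """Return {"hiproc": ratio, "loproc": ratio} of updat to contrl values."""
    values = {}
    for line in lines:
        match = LINE.match(line)
        if match:
            _test, procs, role, value = match.groups()
            values[(procs, role)] = float(value)
    return {
        procs: values[(procs, "updat")] / values[(procs, "contrl")]
        for procs in ("hiproc", "loproc")
        if (procs, "updat") in values and (procs, "contrl") in values
    }


rss_lines = [
    "netcdf_fv3_regional_hiproc_contrl/stdout:The maximum resident set size (KB)                   = 358996",
    "netcdf_fv3_regional_hiproc_updat/stdout:The maximum resident set size (KB)                   = 362660",
    "netcdf_fv3_regional_loproc_contrl/stdout:The maximum resident set size (KB)                   = 173888",
    "netcdf_fv3_regional_loproc_updat/stdout:The maximum resident set size (KB)                   = 289288",
]

for procs, ratio in updat_over_contrl(rss_lines).items():
    print(f"{procs}: updat/contrl = {ratio:.2f}")
```

On these numbers the hiproc pair differs by about 1%, while the loproc updat run uses roughly two thirds more memory than its contrl.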

@DavidHuber-NOAA
Collaborator

@RussTreadon-NOAA Yes, #624 was tested in the global workflow on S4 at C96/C48 deterministic/ensemble resolutions.

@RussTreadon-NOAA
Contributor

Thank you @DavidHuber-NOAA for the confirmation. Jim's failure is puzzling.

@DavidHuber-NOAA
Collaborator

@wx20jjung @RussTreadon-NOAA The issue on S4 is that the wrong modules are being loaded at runtime. Since the job is running within the global-workflow, it is still loading hpc-stack modules (e.g. hdf5/1.10.6), which is causing the crash. You are welcome to try and merge in git@github.com:DavidHuber-NOAA/global-workflow -b feature/spack-stack OR just copy over the module_base.s4.lua and versions/run.s4.ver files.

@wx20jjung
Contributor Author

wx20jjung commented Nov 30, 2023 via email

@DavidHuber-NOAA
Collaborator

@wx20jjung Yes, though I forgot that there are a few version files you will need to copy. You may be better off just copying over the contents of /data/users/dhuber/gw_ss/versions/{build.,run.}* and /data/users/dhuber/gw_ss/modulefiles/*. Also, you may need to copy over the job script gw_ss/jobs/rocoto/anal.sh, which removes a module kludge.

@wx20jjung
Contributor Author

I am not able to run my internal CADS cycle tests as my global-workflow is not compatible (yet) with these changes. I did run the ctests on hera. Here are the results.

Start 1: [=[global_4denvar]=]
Start 5: [=[hafs_4denvar_glbens]=]
Start 6: [=[hafs_3denvar_hybens]=]
Start 2: [=[rtma]=]
Start 7: [=[global_enkf]=]
Start 3: [=[rrfs_3denvar_glbens]=]
Start 4: [=[netcdf_fv3_regional]=]

1/7 Test #4: [=[netcdf_fv3_regional]=] ........ Passed 726.39 sec
2/7 Test #3: [=[rrfs_3denvar_glbens]=] ........ Passed 729.70 sec
3/7 Test #7: [=[global_enkf]=] ................ Passed 1185.37 sec
4/7 Test #2: [=[rtma]=] ....................... Passed 1573.52 sec
5/7 Test #6: [=[hafs_3denvar_hybens]=] ........ Passed 1582.41 sec
6/7 Test #5: [=[hafs_4denvar_glbens]=] ........ Passed 1702.54 sec
7/7 Test #1: [=[global_4denvar]=] ............. Passed 1852.18 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) = 1852.19 sec

@RussTreadon-NOAA
Contributor

Orion ctests
Manually merged the head of develop into wx20jjung:CADS_for_Andrew. Built the updated working copy on Orion and ran ctests, with the following results

Test project /work2/noaa/da/rtreadon/git/gsi/pr616_update/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_glbens
    Start 4: netcdf_fv3_regional
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 7: global_enkf
1/7 Test #4: netcdf_fv3_regional ..............   Passed  484.67 sec
2/7 Test #7: global_enkf ......................   Passed  489.37 sec
3/7 Test #3: rrfs_3denvar_glbens ..............   Passed  607.09 sec
4/7 Test #2: rtma .............................   Passed  970.90 sec
5/7 Test #6: hafs_3denvar_hybens ..............   Passed  1396.86 sec
6/7 Test #1: global_4denvar ...................***Failed  1502.96 sec
7/7 Test #5: hafs_4denvar_glbens ..............***Failed  1637.40 sec

71% tests passed, 2 tests failed out of 7

Total Test time (real) = 1637.42 sec

The following tests FAILED:
          1 - global_4denvar (Failed)
          5 - hafs_4denvar_glbens (Failed)

The global_4denvar test failed due to the runtime check

The runtime for global_4denvar_hiproc_updat is 307.642959 seconds.  This has exceeded maximum allowable threshold time of 304.149277 seconds,
resulting in Failure of timethresh2 the regression test.

The wall times do not look anomalous given high run to run variability on Orion, especially when running in the /work fileset.

global_4denvar_hiproc_contrl/stdout:The total amount of wall time                        = 276.499343
global_4denvar_hiproc_updat/stdout:The total amount of wall time                        = 307.642959
global_4denvar_loproc_contrl/stdout:The total amount of wall time                        = 373.393028
global_4denvar_loproc_updat/stdout:The total amount of wall time                        = 390.099795

The hafs_4denvar_glbens test failed for the same reason

The runtime for hafs_4denvar_glbens_hiproc_updat is 300.890503 seconds.  This has exceeded maximum allowable threshold time of 293.352983 seconds,
resulting in Failure of timethresh2 the regression test.

Again, the wall times do not look anomalous given high run to run variability on Orion

hafs_4denvar_glbens_hiproc_contrl/stdout:The total amount of wall time                        = 266.684530
hafs_4denvar_glbens_hiproc_updat/stdout:The total amount of wall time                        = 300.890503
hafs_4denvar_glbens_loproc_contrl/stdout:The total amount of wall time                        = 429.190717
hafs_4denvar_glbens_loproc_updat/stdout:The total amount of wall time                        = 433.160648
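
The thresholds in both timethresh2 messages are numerically consistent with 1.1 times the hiproc contrl wall time (276.499343 × 1.1 ≈ 304.149277 and 266.684530 × 1.1 ≈ 293.352983). That factor is inferred from the logged values here, not read from the regression scripts, so treat the sketch below as an assumption about how the check behaves.

```python
def exceeds_time_threshold(updat_seconds, contrl_seconds, tolerance=1.1):
    """Return (failed, limit) for a wall-time check against the contrl run.

    `tolerance` is a hypothetical parameter; 1.1 is the factor that
    reproduces the thresholds quoted in the log messages above.
    """
    limit = contrl_seconds * tolerance
    return updat_seconds > limit, limit


cases = {
    "global_4denvar_hiproc": (307.642959, 276.499343),
    "hafs_4denvar_glbens_hiproc": (300.890503, 266.684530),
}

for name, (updat, contrl) in cases.items():
    failed, limit = exceeds_time_threshold(updat, contrl)
    print(f"{name}: limit {limit:.6f} s, failed={failed}")
```

With a 10% margin, run to run variability of the size seen on Orion is enough to trip the check even when nothing in the code has slowed down.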

@wx20jjung , would you please update wx20jjung:CADS_for_Andrew with the current head of develop?

@wx20jjung
Contributor Author

wx20jjung commented Dec 1, 2023 via email

@RussTreadon-NOAA
Contributor

@wx20jjung , not sure what the problem is. Try the following

  1. Create a new directory: mkdir update
  2. cd into the new directory: cd update
  3. git clone --recursive https://github.com/wx20jjung/GSI.git .
  4. git checkout CADS_for_Andrew
  5. git submodule sync
  6. git submodule update
  7. git remote add upstream https://github.com/NOAA-EMC/GSI
  8. git remote -v (make sure you see something like what's on the GSI GS User Information wiki under "Updating your Fork when the Official Repository is Updated")
  9. git remote update (if you see a fix submodule error, ignore it)
  10. git merge upstream/develop (this merges the authoritative develop into the working copy of your branch). The merge command opens an editor. Accept the provided commit log message. Exit the editor. You should see something like the below
(gdasapp) Orion-login-3:/work/noaa/da/rtreadon/git/gsi/update$ git merge upstream/develop
hint: Waiting for your editor to close the file... PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused
Display localhost:27.0 unavailable, simulating -nw
Merge made by the 'recursive' strategy.
 .github/workflows/gcc.yml          |  31 +++++++++++----------
 .github/workflows/intel.yml        |  47 +++++++++++++++++++-------------
 ci/spack.yaml                      |   4 +--
 modulefiles/gsi_cheyenne.gnu.lua   |  36 ++++++++++++------------
 modulefiles/gsi_cheyenne.intel.lua |  32 +++++++++++++---------
 modulefiles/gsi_common.lua         |  16 ++++++-----
 modulefiles/gsi_gaea.lua           |  24 ++++++++++------
 modulefiles/gsi_hera.gnu.lua       |  22 +++++++--------
 modulefiles/gsi_hera.intel.lua     |  21 ++++++--------
 modulefiles/gsi_hercules.lua       |  26 ++++++++++++++++++
 modulefiles/gsi_jet.lua            |  22 ++++++---------
 modulefiles/gsi_orion.lua          |  21 ++++++--------
 modulefiles/gsi_s4.lua             |  23 +++++++---------
 modulefiles/gsi_wcoss2.lua         |  28 ++++++++++++++++++-
 regression/regression_param.sh     | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------
 regression/regression_var.sh       |  19 +++++++++----
 ush/detect_machine.sh              |   2 ++
 ush/module-setup.sh                |   7 +++++
 ush/sub_hercules                   | 170 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 ush/sub_orion                      |   2 ++
 20 files changed, 485 insertions(+), 205 deletions(-)
 create mode 100644 modulefiles/gsi_hercules.lua
 create mode 100755 ush/sub_hercules
(gdasapp) Orion-login-3:/work/noaa/da/rtreadon/git/gsi/update$

Your local copy of CADS_for_Andrew is now up to date with develop. Push your updated working copy back to your repo.

  1. git push origin CADS_for_Andrew

@wx20jjung
Contributor Author

wx20jjung commented Dec 1, 2023 via email

@DavidHuber-NOAA
Collaborator

@wx20jjung It looks like you are having trouble with your authentication. You can try changing to ssh authentication:

git remote set-url origin git@github.com:wx20jjung/GSI.git
git push origin CADS_for_Andrew

This assumes you have an SSH key in your GitHub profile. If this fails, you can follow this guide to get set up.
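
For reference, the HTTPS and SSH remote addresses point at the same repository and differ only in form. A minimal sketch of the mapping for github.com URLs; the helper name is hypothetical and the function only handles the plain https://github.com/<owner>/<repo> layout.

```python
from urllib.parse import urlparse


def https_to_ssh(url):
    """Convert https://github.com/<owner>/<repo>[.git] to git@github.com:<owner>/<repo>.git."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != "github.com":
        raise ValueError("expected an https://github.com/ URL")
    path = parsed.path.strip("/")
    if not path.endswith(".git"):
        path += ".git"
    return f"git@{parsed.netloc}:{path}"


print(https_to_ssh("https://github.com/wx20jjung/GSI.git"))
```

The result is the value to pass to the remote URL change for origin; the fork's branch names and history are unaffected by the switch.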

@RussTreadon-NOAA
Contributor

  1. Very strange. How did you update your branch with develop before?
  2. I didn't think I could push to your branch, but I tried git push origin CADS_for_Andrew and it worked
Orion-login-3:/work/noaa/da/rtreadon/git/gsi/update$ git push origin CADS_for_Andrew
Enumerating objects: 80, done.
Counting objects: 100% (56/56), done.
Delta compression using up to 80 threads
Compressing objects: 100% (18/18), done.
Writing objects: 100% (30/30), 8.04 KiB | 316.00 KiB/s, done.
Total 30 (delta 22), reused 18 (delta 11), pack-reused 0
remote: Resolving deltas: 100% (22/22), completed with 11 local objects.
To https://github.com/wx20jjung/GSI.git
   af803552c..88bf2ec52  CADS_for_Andrew -> CADS_for_Andrew

Your branch is now up to date with develop.

@RussTreadon-NOAA
Contributor

Please update the working copy of your branch and ensure that everything looks correct.

@RussTreadon-NOAA
Contributor

@wx20jjung , we can schedule this PR for merger into develop upon completion of the following two items

  1. your confirmation that wx20jjung:CADS_for_Andrew at 88bf2ec is acceptable
  2. peer review and approval from either Andrew or Erin. Would you please reach out to them?

@RussTreadon-NOAA RussTreadon-NOAA merged commit ea667d9 into NOAA-EMC:develop Dec 4, 2023
4 checks passed
@RussTreadon-NOAA RussTreadon-NOAA mentioned this pull request Mar 1, 2024
7 tasks
Development

Successfully merging this pull request may close these issues.

Develop and test new infrared cloud detection routines
5 participants