Roll-out spack-stack-1.5.0 #765

Closed
climbfuji opened this issue Sep 6, 2023 · 8 comments · Fixed by #787
@climbfuji
Collaborator

climbfuji commented Sep 6, 2023

Description

This PR captures the roll-out of spack-stack-1.5.0.

While we are at it, we should check and update thirdparty, install, and source-cache locations on preconfigured sites for spack-stack-1.5.0.

For example, gaea-c5 and hercules are following the new directory structure for sites that share a filesystem with another site, but their peers gaea-c4 and orion don't.

Important: check external packages for sites and remove any older, external bison to avoid duplicate packages
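
As a rough aid for the bison check, the sketch below scans the text of a site's packages.yaml for external bison entries. It assumes the usual Spack layout, in which externals are listed as `- spec: name@version` lines, and it does a plain line scan rather than a full YAML parse. The function name, the example file content, and the versions shown are hypothetical and are not taken from any site config.

```python
import re

# Matches an external bison entry in a Spack packages.yaml, for example:
#   "- spec: bison@3.0.4"
BISON_SPEC = re.compile(r"^\s*-?\s*spec:\s*(bison@\S+)")


def external_bison_specs(packages_yaml_text):
    """Return the bison specs declared in the text of a packages.yaml file."""
    specs = []
    for line in packages_yaml_text.splitlines():
        match = BISON_SPEC.match(line)
        if match:
            specs.append(match.group(1))
    return specs


# Hypothetical file content, for illustration only.
EXAMPLE = """\
packages:
  bison:
    externals:
    - spec: bison@3.0.4
      prefix: /usr
  gmake:
    externals:
    - spec: gmake@4.2.1
      prefix: /usr
"""

print(external_bison_specs(EXAMPLE))
```

Any spec this reports is a candidate for removal from the site config. Whether it is actually an "older" bison still needs a human look.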

Latest release documentation: https://spack-stack.readthedocs.io/en/release-1.5.0/

Sites

Basic tests are: build jedi-bundle and run ctests AND/OR build ufs-weather-model and run regression tests AND/OR build ufs-srw-app and run tests (please denote in the list below). If someone wants to test an installation from a different installer with another set of tests, that is more than welcome (just note the GitHub username when you add the test).

  • Orion (@climbfuji)
    • external bison removed
    • top-level directory structure
    • 3rdparty libraries and modules
    • source cache location
    • Installed in: /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.0/envs/unified-env
    • basic tests performed for all supported compilers: jedi-bundle with GNU, only two ctests failed out of 1900
    • site/config and documentation updated: release/1.5.0: site updates part 1 (Orion, Hercules, Narwhal) #770
    • (re-)installed with ip@4.3.0
The following tests FAILED:
	405 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
	1627 - test_soca_sqrtvertloc (Failed)
  • Hercules (@climbfuji)
    • external bison removed
    • top-level directory structure
    • 3rdparty libraries and modules
    • source cache location
    • Installed in: /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0/envs/unified-env
    • basic tests performed for all supported compilers (jedi-bundle)
    • site/config and documentation updated: release/1.5.0: site updates part 1 (Orion, Hercules, Narwhal) #770
    • (re-)installed with ip@4.3.0
The following tests FAILED:
	405 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
	1532 - fv3jedi_test_tier1_dirac_gfs_gsi_global (Failed)
	1564 - fv3jedi_test_tier1_diffstates_gfs (Failed)
	1627 - test_soca_sqrtvertloc (Failed)
	1633 - test_soca_dirac_soca_mask (Failed)
	1642 - test_soca_3dvar_godas (Failed)
The following tests FAILED:
        408 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
        1541 - fv3jedi_test_tier1_dirac_geos_gsi_global (Failed)
        1542 - fv3jedi_test_tier1_3dvar_geos_sondes (Failed)
        1543 - fv3jedi_test_tier1_3dvar_geos_ozone (Failed)
        1544 - fv3jedi_test_tier1_dirac_gfs_gsi_global (Failed)
        1545 - fv3jedi_test_tier1_3dvar_gfs_sondes (Failed)
        1627 - test_soca_setcorscales (Failed)
        1629 - test_soca_parameters_bump_cor_nicas_scales (Failed)
        1632 - test_soca_enspert (Failed)
        1640 - test_soca_sqrtvertloc (Failed)
        1642 - test_soca_dirac_soca_cor_nicas_scales (Failed)
        1644 - test_soca_dirac_socahyb_cov (Failed)
        1646 - test_soca_dirac_soca_mask (Failed)
        1647 - test_soca_dirac_soca_nomask (Failed)
        1655 - test_soca_3dvar_godas (Failed)
  • Casper (@climbfuji)
  • Cheyenne (@climbfuji) - not making any changes unless necessary, since machine will be decommissioned at the end of this year
    • external bison removed
    • top-level directory structure
    • 3rdparty libraries and modules
    • source cache location
    • Installed in /glade/work/epicufsrt/contrib/spack-stack/cheyenne/spack-stack-1.5.0/envs/unified-env (gcc only) and /glade/work/epicufsrt/contrib/spack-stack/cheyenne/spack-stack-1.5.0/envs/ufs-env (intel only)
    • basic tests performed for all supported compilers:
      • jedi-bundle with gnu: all ctests pass
      • ufs-weather-model with intel: I compiled as follows after loading ufs-weather-model-env/1.0.0, and that worked (no more testing done)
git clone -b feature/spack_stack_150 --recurse-submodules https://github.com/climbfuji/ufs-weather-model
mkdir build; cd build
cmake -DAPP=S2SWA -DCCPP_SUITES=FV3_GFS_v17_coupled_p8 ../ufs-weather-model 2>&1 | tee log.cmake
    • site/config and documentation updated: https://github.com/JCSDA/spack-stack/pull/787
    • (re-)installed with ip@4.3.0
The following tests FAILED:
        412 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
        1403 - ufo_test_tier1_test_ufo_conventionalprofileprocessing_average_relativehumidity_obsfilter (Failed)
        1404 - ufo_test_tier1_test_ufo_conventionalprofileprocessing_average_relativehumidity_OPScomparison (Failed)
        1554 - fv3jedi_test_tier1_dirac_gfs_gsi_global (Failed)
        1576 - fv3jedi_test_tier1_4denvar_seq (Failed)
        1587 - fv3jedi_test_tier1_diffstates_gfs (Failed)
        1650 - test_soca_sqrtvertloc (Failed)

ctest.noget.timeout-0600.log

Many fv3-jedi and soca ctests fail when run consecutively, but when run individually they pass. Very weird.
  • Gaea C5 (@ulmononian, @climbfuji)
  • Hera (@ulmononian)
    • external bison removed
    • top-level directory structure
    • 3rdparty libraries and modules
    • source cache location
    • Install of unified-env for supported compilers (combined or separately) in /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.0/envs/unified-env
    • basic tests performed for all supported compilers (ufs-weather-model)
    • site/config and documentation updated: Update Hera/Jet/Cheyenne site configs #779
    • (re-)installed with ip@4.3.0
  • Jet (@ulmononian)
    • external bison removed
    • top-level directory structure
    • 3rdparty libraries and modules
    • source cache location
    • Install of unified-env for supported compilers (combined or separately) in /mnt/lfs4/HFIP/hfv3gfs/role.epic/spack-stack/spack-stack-1.5.0/envs/unified-env
    • basic tests performed for all supported compilers (ufs-weather-model)
    • site/config and documentation updated: Update Hera/Jet/Cheyenne site configs #779
    • (re-)installed with ip@4.3.0
  • Narwhal (@climbfuji)
    • external bison removed
    • top-level directory structure
    • 3rdparty libraries and modules
    • source cache location
    • Installed in: /p/app/projects/NEPTUNE/spack-stack/spack-stack-1.5.0/envs/unified-env-intel-2021.4.0 and /p/app/projects/NEPTUNE/spack-stack/spack-stack-1.5.0/envs/unified-env-gcc-10.3.0
    • basic tests performed for all supported compilers (jedi-bundle with Intel; basically all soca tests fail, the rest looks good)
    • site/config and documentation updated: release/1.5.0: site updates part 1 (Orion, Hercules, Narwhal) #770
    • (re-)installed with ip@4.3.0
         64 - test_l95_pseudomodel_state4d (Failed)
        403 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
        1021 - ufo_test_tier1_instrument_sonde_gfs_HofX (Failed)
        1022 - ufo_test_tier1_instrument_sonde_gfs_qc (Failed)
        1508 - fv3jedi_test_tier1_linearmodel_htlm_physics (Failed)
        1577 - fv3jedi_test_tier1_eda_3dvar (Failed)
        1581 - test_soca_gridgen (Failed)
        1582 - test_soca_geometry (Failed)
        1583 - test_soca_geometry_iterator_2d (Failed)
        1584 - test_soca_geometry_iterator_3d (Failed)
        1585 - test_soca_state (Failed)
        1586 - test_soca_increment (Failed)
        1587 - test_soca_model (Failed)
        1588 - test_soca_modelaux (Failed)
        1589 - test_soca_getvalues (Failed)
        1590 - test_soca_errorcovariance (Failed)
        1591 - test_soca_linearmodel (Failed)
        1592 - test_soca_varchange_ana2model (Failed)
        1593 - test_soca_varchange_balance (Failed)
        1594 - test_soca_varchange_balance_TSSSH (Failed)
        1597 - test_soca_varchange_bkgerrsoca (Failed)
        1598 - test_soca_varchange_bkgerrsoca_stddev (Failed)
        1599 - test_soca_varchange_bkgerrgodas (Failed)
        1600 - test_soca_varchange_vertconv (Failed)
        1601 - test_soca_obslocalization (Failed)
        1605 - test_soca_forecast_mom6 (Failed)
        1607 - test_soca_forecast_mom6_ens1 (Failed)
        1608 - test_soca_forecast_mom6_ens2 (Failed)
        1610 - test_soca_static_socaerror_init (Failed)
        1612 - test_soca_setcorscales (Failed)
        1614 - test_soca_parameters_bump_cor_nicas_scales (Failed)
        1615 - test_soca_parameters_bump_loc (Failed)
        1617 - test_soca_enspert (Failed)
        1618 - test_soca_convertstate (Failed)
        1619 - test_soca_convertstate_changevar (Failed)
        1620 - test_soca_ensvariance (Failed)
        1621 - test_soca_ensmeanandvariance (Failed)
        1622 - test_soca_parametric_stddev (Failed)
        1623 - test_soca_ensrecenter (Failed)
        1624 - test_soca_hybridgain (Failed)
        1625 - test_soca_sqrtvertloc (Failed)
        1626 - test_soca_diffstates (Failed)
        1627 - test_soca_dirac_soca_cor_nicas_scales (Failed)
        1628 - test_soca_dirac_soca_cov (Failed)
        1629 - test_soca_dirac_socahyb_cov (Failed)
        1630 - test_soca_dirac_horizfilt (Failed)
        1631 - test_soca_dirac_soca_mask (Failed)
        1633 - test_soca_makeobs (Failed)
        1634 - test_soca_hofx_3d (Failed)
        1635 - test_soca_hofx_4d (Failed)
        1636 - test_soca_hofx_4d_pseudo (Failed)
        1637 - test_soca_enshofx (Failed)
        1638 - test_soca_3dvar_soca (Failed)
        1640 - test_soca_3dvar_godas (Failed)
        1641 - test_soca_3dvarlowres_soca (Failed)
        1642 - test_soca_3dvarfgat (Failed)
        1643 - test_soca_3dvarfgat_pseudo (Failed)
        1644 - test_soca_3dhyb (Failed)
        1645 - test_soca_3dhybfgat (Failed)
        1646 - test_soca_4denvar (Failed)
The following tests FAILED:
        407 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
        1639 - test_soca_sqrtvertloc (Failed)
Errors while running CTest
The following tests FAILED:
        408 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
        1462 - test_femps_csgrid (Failed)
        1541 - fv3jedi_test_tier1_dirac_geos_gsi_global (Failed)
        1542 - fv3jedi_test_tier1_3dvar_geos_sondes (Failed)
        1543 - fv3jedi_test_tier1_3dvar_geos_ozone (Failed)
        1544 - fv3jedi_test_tier1_dirac_gfs_gsi_global (Failed)
        1545 - fv3jedi_test_tier1_3dvar_gfs_sondes (Failed)
        1627 - test_soca_setcorscales (Failed)
        1629 - test_soca_parameters_bump_cor_nicas_scales (Failed)
        1632 - test_soca_enspert (Failed)
        1640 - test_soca_sqrtvertloc (Failed)
        1642 - test_soca_dirac_soca_cor_nicas_scales (Failed)
        1644 - test_soca_dirac_socahyb_cov (Failed)
        1646 - test_soca_dirac_soca_mask (Failed)
        1647 - test_soca_dirac_soca_nomask (Failed)
        1655 - test_soca_3dvar_godas (Failed)
The following tests FAILED with Intel (two weeks ago, 20230910):
	406 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
	1636 - test_soca_sqrtvertloc (Failed)
Errors while running CTest

The following tests FAILED with GNU (today, 20230922):
	408 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
	1493 - fv3jedi_test_tier1_convertstate_gfs (Timeout)
	1494 - fv3jedi_test_tier1_convertstate_geos (Timeout)
	1562 - fv3jedi_test_tier1_hyb-fgat_fv3lm (Timeout)
	1641 - test_soca_sqrtvertloc (Failed)
The following tests FAILED:
        407 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
        1140 - ufo_test_tier1_test_ufo_gmi_qc_filters (Failed)
        1141 - ufo_test_tier1_test_ufo_gmi_qc_filters_geos (Failed)
        1356 - ufo_test_tier1_test_ufo_opr_gmi_crtm (Failed)
        1541 - fv3jedi_test_tier1_dirac_geos_gsi_global (Failed)
        1542 - fv3jedi_test_tier1_3dvar_geos_sondes (Failed)
        1543 - fv3jedi_test_tier1_3dvar_geos_ozone (Failed)
        1544 - fv3jedi_test_tier1_dirac_gfs_gsi_global (Failed)
        1545 - fv3jedi_test_tier1_3dvar_gfs_sondes (Failed)
        1551 - fv3jedi_test_tier1_hyb-3dvar (Failed)
        1576 - fv3jedi_test_tier1_diffstates_gfs (Failed)
        1579 - fv3jedi_test_tier1_addincrement_gfs (Failed)
        1626 - test_soca_setcorscales (Failed)
        1628 - test_soca_parameters_bump_cor_nicas_scales (Failed)
        1631 - test_soca_enspert (Failed)
        1639 - test_soca_sqrtvertloc (Failed)
        1641 - test_soca_dirac_soca_cor_nicas_scales (Failed)
        1643 - test_soca_dirac_socahyb_cov (Failed)
        1645 - test_soca_dirac_soca_mask (Failed)
        1646 - test_soca_dirac_soca_nomask (Failed)
        1654 - test_soca_3dvar_godas (Failed)
The following tests FAILED:
        408 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
        808 - test_ioda_obsspace (Failed)
        809 - test_ioda_obsspace_reader_pool (Failed)
        811 - test_ioda_obsspace_out_odc (Failed)
        812 - test_ioda_obsspace_out_odc_mpi_2 (Failed)
        813 - test_ioda_obsspace_out_odc_compare_aircraft (Failed)
        814 - test_ioda_obsspace_out_odc_compare_atms (Failed)
        815 - test_ioda_obsspace_out_odc_compare_gmi (Failed)
        816 - test_ioda_obsspace_out_odc_compare_amsua (Failed)
        817 - test_ioda_obsspace_out_odc_compare_sonde (Failed)
        818 - test_ioda_obsspace_out_odc_compare_aod (Failed)
        819 - test_ioda_obsspace_out_odc_compare_iasi (Failed)
        820 - test_ioda_obsspace_out_odc_compare_aod_testdatsets_0000 (Failed)
        821 - test_ioda_obsspace_out_odc_compare_aod_testdatsets_0001 (Failed)
        1019 - ufo_test_tier1_instrument_aircraft_gfs_HofX (Failed)
        1325 - ufo_test_tier1_test_ufo_opr_sfcpcorrected (Failed)
        1541 - fv3jedi_test_tier1_dirac_geos_gsi_global (Failed)
        1542 - fv3jedi_test_tier1_3dvar_geos_sondes (Failed)
        1543 - fv3jedi_test_tier1_3dvar_geos_ozone (Failed)
        1544 - fv3jedi_test_tier1_dirac_gfs_gsi_global (Failed)
        1545 - fv3jedi_test_tier1_3dvar_gfs_sondes (Failed)
        1551 - fv3jedi_test_tier1_hyb-3dvar (Failed)
        1566 - fv3jedi_test_tier1_4denvar_seq (Failed)
        1577 - fv3jedi_test_tier1_diffstates_gfs (Failed)
        1580 - fv3jedi_test_tier1_addincrement_gfs (Failed)
        1627 - test_soca_setcorscales (Failed)
        1629 - test_soca_parameters_bump_cor_nicas_scales (Failed)
        1632 - test_soca_enspert (Failed)
        1640 - test_soca_sqrtvertloc (Failed)
        1642 - test_soca_dirac_soca_cor_nicas_scales (Failed)
        1644 - test_soca_dirac_socahyb_cov (Failed)
        1646 - test_soca_dirac_soca_mask (Failed)
        1647 - test_soca_dirac_soca_nomask (Failed)
        1655 - test_soca_3dvar_godas (Failed)
The following tests FAILED:
        407 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
        1541 - fv3jedi_test_tier1_dirac_geos_gsi_global (Failed)
        1542 - fv3jedi_test_tier1_3dvar_geos_sondes (Failed)
        1543 - fv3jedi_test_tier1_3dvar_geos_ozone (Failed)
        1544 - fv3jedi_test_tier1_dirac_gfs_gsi_global (Failed)
        1545 - fv3jedi_test_tier1_3dvar_gfs_sondes (Failed)
        1551 - fv3jedi_test_tier1_hyb-3dvar (Failed)
        1576 - fv3jedi_test_tier1_diffstates_gfs (Failed)
        1579 - fv3jedi_test_tier1_addincrement_gfs (Failed)
        1626 - test_soca_setcorscales (Failed)
        1628 - test_soca_parameters_bump_cor_nicas_scales (Failed)
        1631 - test_soca_enspert (Failed)
        1639 - test_soca_sqrtvertloc (Failed)
        1641 - test_soca_dirac_soca_cor_nicas_scales (Failed)
        1643 - test_soca_dirac_socahyb_cov (Failed)
        1645 - test_soca_dirac_soca_mask (Failed)
        1646 - test_soca_dirac_soca_nomask (Failed)
        1654 - test_soca_3dvar_godas (Failed)
  • JCSDA CI container gnu/openmpi, clang/mpich, intel/impi
    • external bison removed
    • top-level directory structure (n/a)
    • 3rdparty libraries and modules (n/a)
    • source cache location (n/a)
    • (re-)built with ip@4.3.0
    • basic tests performed (JCSDA CI)
    • Uploaded to JCSDA NOAA AWS ECR as "latest"
    • Uploaded to dockerhub as "latest" (not Intel due to licensing restrictions)
    • Converted to Singularity and signed
    • Singularity containers uploaded to JCSDA S3 and to sylabs.io (the latter not for Intel due to licensing restrictions)
    • site/config and documentation updated: release/1.5.0: site config and doc updates round 2 (S4, aws-pcluster, Discover, Nautilus, JCSDA CI containers) #775
The following tests FAILED:
        407 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
        520 - get_crtm_coeffs (Failed)
        1541 - fv3jedi_test_tier1_dirac_geos_gsi_global (Failed)
        1542 - fv3jedi_test_tier1_3dvar_geos_sondes (Failed)
        1543 - fv3jedi_test_tier1_3dvar_geos_ozone (Failed)
        1544 - fv3jedi_test_tier1_dirac_gfs_gsi_global (Failed)
        1545 - fv3jedi_test_tier1_3dvar_gfs_sondes (Failed)
        1626 - test_soca_setcorscales (Failed)
        1628 - test_soca_parameters_bump_cor_nicas_scales (Failed)
        1631 - test_soca_enspert (Failed)
        1639 - test_soca_sqrtvertloc (Failed)
        1641 - test_soca_dirac_soca_cor_nicas_scales (Failed)
        1643 - test_soca_dirac_socahyb_cov (Failed)
        1645 - test_soca_dirac_soca_mask (Failed)
        1646 - test_soca_dirac_soca_nomask (Failed)
        1654 - test_soca_3dvar_godas (Failed)
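
The failure summaries above are easier to compare mechanically than by eye. The sketch below is one possible way to do it, not something used in this roll-out: it extracts test names from CTest failure summaries, reports the tests that failed in every run, and builds a `ctest -R` command for re-running a single test on its own. The function names and the dictionary labels are made up for this example. The two summaries are copied from the Orion and Hercules entries.

```python
import re
from collections import Counter

# One entry of a CTest failure summary, for example:
#   "1627 - test_soca_sqrtvertloc (Failed)"
FAILED_LINE = re.compile(r"^\s*\d+\s+-\s+(\S+)\s+\((?:Failed|Timeout)\)\s*$")


def failed_tests(summary):
    """Return the set of test names listed in one CTest failure summary."""
    names = set()
    for line in summary.splitlines():
        match = FAILED_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


def common_failures(runs):
    """Return, sorted, the tests that failed in every run of `runs`."""
    counts = Counter()
    for summary in runs.values():
        counts.update(failed_tests(summary))
    return sorted(name for name, n in counts.items() if n == len(runs))


def rerun_command(test_name):
    """Build a ctest command line that runs a single test by exact name.

    Assumes the name has no regex metacharacters, which holds for the
    names in this issue (letters, digits, "_" and "-").
    """
    return ["ctest", "-R", f"^{test_name}$", "--output-on-failure"]


# Summaries copied from the Orion and Hercules entries; the keys are
# labels chosen for this sketch.
RUNS = {
    "orion": """\
The following tests FAILED:
    405 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
    1627 - test_soca_sqrtvertloc (Failed)
""",
    "hercules": """\
The following tests FAILED:
    405 - saber_test_dirac_gsi_gfs_global_1-1 (Failed)
    1532 - fv3jedi_test_tier1_dirac_gfs_gsi_global (Failed)
    1564 - fv3jedi_test_tier1_diffstates_gfs (Failed)
    1627 - test_soca_sqrtvertloc (Failed)
    1633 - test_soca_dirac_soca_mask (Failed)
    1642 - test_soca_3dvar_godas (Failed)
""",
}

print(common_failures(RUNS))
print(" ".join(rerun_command("test_soca_3dvar_godas")))
```

For these two runs the common failures are saber_test_dirac_gsi_gfs_global_1-1 and test_soca_sqrtvertloc, which also appear in most of the other lists. Running a test individually with the generated command is a quick way to check the observation that some fv3-jedi and soca tests only fail when run consecutively.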
@climbfuji climbfuji added the INFRA JEDI Infrastructure label Sep 6, 2023
@climbfuji climbfuji moved this to In Progress in spack-stack-1.5.0 (2023 Q3) Sep 6, 2023
@climbfuji climbfuji moved this from In Progress to Todo in spack-stack-1.5.0 (2023 Q3) Sep 6, 2023
@climbfuji climbfuji changed the title Check and update thirdparty, install, and source-cache locations on preconfigured sites for spack-stack-1.5.0 Roll-out spack-stack-1.5.0 Sep 9, 2023
@climbfuji climbfuji moved this from Todo to In Progress in spack-stack-1.5.0 (2023 Q3) Sep 9, 2023
@ulmononian
Collaborator

on hera using spack-stack/1.5.0, the cmake step for the s2swa config of the ufs-wm fails:

-- Configuring done
CMake Error at /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.0/envs/unified-env/install/intel/2021.5.0/mapl-2.35.2-rwaq27r/lib64/cmake/MAPL/MAPL-targets.cmake:158 (set_target_properties):
  The link interface of target "MAPL.gridcomps" contains:

    FARGPARSE::fargparse

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.0/envs/unified-env/install/intel/2021.5.0/mapl-2.35.2-rwaq27r/lib64/cmake/MAPL/mapl-config.cmake:74 (include)
  GOCART/CMakeLists.txt:64 (find_package)


CMake Error at /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.0/envs/unified-env/install/intel/2021.5.0/mapl-2.35.2-rwaq27r/lib64/cmake/MAPL/MAPL-targets.cmake:166 (set_target_properties):
  The link interface of target "MAPL.cap" contains:

    FARGPARSE::fargparse

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.0/envs/unified-env/install/intel/2021.5.0/mapl-2.35.2-rwaq27r/lib64/cmake/MAPL/mapl-config.cmake:74 (include)
  GOCART/CMakeLists.txt:64 (find_package)


-- Generating done
CMake Generate step failed.  Build files cannot be regenerated correctly.

@AlexanderRichert-NOAA
Collaborator

Blergh. Is there a reason we didn't go back to keeping fargparse disabled for MAPL? In 1.4.1 the package default was for it to be off.

@climbfuji
Collaborator Author

See the UFS weather model esmf 8.5.0 testing issue. I have branches that fix those problems.

@ulmononian
Collaborator

this was w/ esmf 8.4.2, though...am i missing something?

@climbfuji
Collaborator Author

What you're missing is probably that the variants fargparse and pflogger are turned on by default in the new build recipes. That affects older MAPL versions, too.

@ulmononian
Collaborator

so without the fixes from the branches you alluded to, we should not expect the ufs wm to run with spack-stack/1.5.0?

@climbfuji
Collaborator Author

I think that is the case, yes. It's a tiny bug fix in GOCART, not more.

@climbfuji
Collaborator Author

Closed via #787

@github-project-automation github-project-automation bot moved this from In Progress to Done in spack-stack-1.5.0 (2023 Q3) Sep 28, 2023