Spack-stacks on Hera with gnu/13.3.x and openmpi/4.1.6 built for Rocky8 #1090

Closed
natalie-perlin opened this issue Apr 22, 2024 · 17 comments
Labels
NOAA-EMC OAR-EPIC NOAA Oceanic and Atmospheric Research and Earth Prediction Innovation Center

@natalie-perlin
Collaborator

natalie-perlin commented Apr 22, 2024

Issues have been reported with Weather Model runs on Hera with the GNU compiler after the transition to Rocky 8.

ufs-community/ufs-weather-model#2200

The compiler in use at the time was still gnu/9.2.0.

**Offering a solution**
Build an updated GNU compiler and OpenMPI on the Rocky 8 system.

A new installation of the gnu/13.2.0 compiler suite (gcc, g++, gfortran) has been built on Hera (Rocky 8) in a separate role.epic location. openmpi/4.1.6 has been built with this new gnu/13.2.0 compiler as well.

Installation locations for gnu and openmpi:
/scratch2/NCEPDEV/stmp1/role.epic/installs/gnu/13.2.0
/scratch2/NCEPDEV/stmp1/role.epic/installs/openmpi/4.1.6
Load the modules as follows:

module use /scratch2/NCEPDEV/stmp1/role.epic/installs/gnu/modulefiles
module load gnu/13.2.0
module use /scratch2/NCEPDEV/stmp1/role.epic/installs/openmpi/modulefiles
module load openmpi/4.1.6

Building the spack-stack-1.5.1 environment in
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env-rocky8.

The following modules need to be loaded before building the spack-stack environment:

module use /scratch1/NCEPDEV/nems/role.epic/modulefiles
module load qt mysql ecflow
module use /scratch2/NCEPDEV/stmp1/role.epic/installs/gnu/modulefiles
module load gnu/13.2.0
module use /scratch2/NCEPDEV/stmp1/role.epic/installs/openmpi/modulefiles
module load openmpi/4.1.6

(NB: the order of the module paths matters here, as /scratch1/NCEPDEV/nems/role.epic/modulefiles has another openmpi/4.1.6 modulefile from an earlier installation and compiler)
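
Because two openmpi/4.1.6 modulefiles are visible once both module paths are in use, it can be worth confirming which installation actually ended up on PATH before starting a long build. The following is only a minimal sketch for a POSIX shell: `check_prefix` is a hypothetical helper (not part of Lmod or spack-stack), and it assumes the compiler and MPI wrappers live somewhere under the install prefixes listed above.

```shell
# Hypothetical helper: succeed only if the tool found on PATH lives under
# the expected installation prefix.
check_prefix() {
    tool=$1
    prefix=$2
    tool_path=$(command -v "$tool") || { echo "$tool: not found on PATH"; return 1; }
    case "$tool_path" in
        "$prefix"/*) echo "$tool: OK ($tool_path)" ;;
        *)           echo "$tool: unexpected location ($tool_path)"; return 1 ;;
    esac
}

# Example calls after loading the modules (prefixes taken from the list above):
# check_prefix gcc   /scratch2/NCEPDEV/stmp1/role.epic/installs/gnu/13.2.0
# check_prefix mpicc /scratch2/NCEPDEV/stmp1/role.epic/installs/openmpi/4.1.6
```

A non-zero return from either call would suggest that the earlier openmpi/4.1.6 modulefile took precedence.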

94 packages were built successfully in this unified-env-rocky8 environment, but others fail with errors, e.g., proj-8.1.0. These errors are likely related to different C++ standards, as shown below. According to the logs, the option -std=c++11 is currently used, and I wonder if -std=c++98 could help.

A question: is it a good idea to try setting -std=c++98 for the entire stack build, or try specifying it only for certain packages that report errors? Any other ideas?

Most recent log file: /scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env-rocky8/log.install.unified-env.003
(the cache has since been cleaned up, however, so that other options can be tested)

Errors are as follows:

In file included from /scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-proj-8.1.0-ycbrsv4ltfamsxni6n5prycyk5nnpukh/spack-src/src/proj_json_streaming_writer.cpp:34:
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-proj-8.1.0-ycbrsv4ltfamsxni6n5prycyk5nnpukh/spack-src/src/proj_json_streaming_writer.hpp:42:14: error: 'int64_t' in namespace 'std' does not name a type
   42 | typedef std::int64_t GIntBig;
      |              ^~~~~~~
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-proj-8.1.0-ycbrsv4ltfamsxni6n5prycyk5nnpukh/spack-src/src/proj_json_streaming_writer.hpp:43:14: error: 'uint64_t' in namespace 'std' does not name a type; did you mean 'wint_t'?
   43 | typedef std::uint64_t GUInt64;
      |              ^~~~~~~~
      |              wint_t
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-proj-8.1.0-ycbrsv4ltfamsxni6n5prycyk5nnpukh/spack-src/src/proj_json_streaming_writer.hpp:93:14: error: 'GIntBig' has not been declared
   93 |     void Add(GIntBig nVal);
      |              ^~~~~~~
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-proj-8.1.0-ycbrsv4ltfamsxni6n5prycyk5nnpukh/spack-src/src/proj_json_streaming_writer.hpp:93:10: error: 'void osgeo::proj::CPLJSonStreamingWriter::Add(int)' cannot be overloaded with 'void osgeo::proj::CPLJSonStreamingWriter::Add(int)'
   93 |     void Add(GIntBig nVal);
      |          ^~~
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-proj-8.1.0-ycbrsv4ltfamsxni6n5prycyk5nnpukh/spack-src/src/proj_json_streaming_writer.hpp:91:10: note: previous declaration 'void osgeo::proj::CPLJSonStreamingWriter::Add(int)'
   91 |     void Add(int nVal) { Add(static_cast<GIntBig>(nVal)); }
      |          ^~~
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-proj-8.1.0-ycbrsv4ltfamsxni6n5prycyk5nnpukh/spack-src/src/proj_json_streaming_writer.hpp:94:14: error: 'GUInt64' has not been declared
   94 |     void Add(GUInt64 nVal);

... (more similar errors)...

make[2]: *** [src/CMakeFiles/proj.dir/build.make:695: src/CMakeFiles/proj.dir/proj_json_streaming_writer.cpp.o] Error 1
make[2]: Leaving directory '/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-proj-8.1.0-ycbrsv4ltfamsxni6n5prycyk5nnpukh/spack-build-ycbrsv4'
make[1]: *** [CMakeFiles/Makefile2:282: src/CMakeFiles/proj.dir/all] Error 2
make[1]: Leaving directory '/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-proj-8.1.0-ycbrsv4ltfamsxni6n5prycyk5nnpukh/spack-build-ycbrsv4'
make: *** [Makefile:159: all] Error 2
==> Error: ProcessError: Command exited with status 2:
  '/usr/bin/make' '-j1'

Attached is the most recent spack.lock and the installation build log.

spack.lock.hera.rocky8.txt
log.install.unified-env.003.txt

######### Summaries on building gnu/13.2.0, openmpi/4.1.6:

Hera.Rocky8.gcc-13.2.0.txt

Hera.Rocky8.openmpi.4.1.6.txt

@natalie-perlin
Collaborator Author

natalie-perlin commented Apr 23, 2024

Found issues reported with spack building proj-8.1.0:
https://github.com/spack/spack/issues/42775

"Installation issue: proj #42775"

UPDATE: Version proj-9.1.1 installs successfully with no additional specs changes.

@natalie-perlin
Collaborator Author

Another package build fails: odc-1.4.6

spack-build-out.txt

Some of the errors:

/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-odc-1.4.6-mxdwbhg42464z62avdq3rn2np6qfbzbo/spack-src/src/odc/api/StridedData.h: In member function 'void odc::api::StridedDataT<value_type>::fill(int, int)':
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-odc-1.4.6-mxdwbhg42464z62avdq3rn2np6qfbzbo/spack-src/src/odc/api/StridedData.h:190:40: error: 'uint64_t' does not name a type
 190 |             std::fill(reinterpret_cast<uint64_t*>(get(sourceRow+1)),
     |                                        ^~~~~~~~
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-odc-1.4.6-mxdwbhg42464z62avdq3rn2np6qfbzbo/spack-src/src/odc/api/StridedData.h:23:1: note: 'uint64_t' is defined in header '<cstdint>'; did you forget to '#include <cstdint>'?
  22 | #include "eckit/exception/Exceptions.h"
 +++ |+#include <cstdint>
  23 |
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/cache/build_stage/spack-stage-odc-1.4.6-mxdwbhg42464z62avdq3rn2np6qfbzbo/spack-src/src/odc/api/StridedData.h:190:48: error: expected '>' before '*' token
 190 |             std::fill(reinterpret_cast<uint64_t*>(get(sourceRow+1)),
     |                                                ^
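
The compiler note above points at the likely cause: with GCC 13, several libstdc++ headers stopped pulling in `<cstdint>` indirectly, so sources that relied on that (as the proj-8.1.0 and odc-1.4.6 errors in this thread suggest) need an explicit `#include <cstdint>`. Changing `-std` does not add the include. Below is a rough sketch for spotting candidate files in a staged source tree, assuming GNU grep. `find_missing_cstdint` is a hypothetical helper, and the check is only a heuristic, since a file may legitimately get the header through another project header.

```shell
# Hypothetical helper: list C/C++ sources under directory $1 that use
# fixed-width integer types but contain no direct include of <cstdint>
# or <stdint.h>.
find_missing_cstdint() {
    grep -rlE --include='*.h' --include='*.hpp' --include='*.cc' --include='*.cpp' \
        '\bu?int(8|16|32|64)_t\b' "$1" |
    while IFS= read -r src; do
        grep -qE '#[[:space:]]*include[[:space:]]*<(cstdint|stdint\.h)>' "$src" || echo "$src"
    done
}

# Example (placeholder path; point it at the spack-src directory of the stage):
# find_missing_cstdint path/to/spack-src
```

Files reported by the helper are candidates for a one-line patch, or a hint that a newer upstream release may already carry the fix.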

@climbfuji climbfuji added INFRA JEDI Infrastructure NOAA-EMC OAR-EPIC NOAA Oceanic and Atmospheric Research and Earth Prediction Innovation Center labels Apr 23, 2024
@climbfuji climbfuji removed the INFRA JEDI Infrastructure label May 13, 2024
@natalie-perlin
Collaborator Author

natalie-perlin commented May 22, 2024

Updates:

  1. Spack-stack v1.6.0 builds successfully with gnu/13.2.0 and openmpi/4.1.6, using either the system-wide installs or the EPIC installs (modules accessible from /scratch2/NCEPDEV/stmp1/role.epic/installs/[gnu/openmpi]/modulefiles). Only ufs-wm-env and ufs-srw-env were built, not the entire unified-env. The updated ESMF/8.1.6 and MAPL/2.46.0 were included and tested with gnu/13.2.0.

Building ufs-weather-model, however, produces an internal compiler error (ICE) when either of these two stacks is used.
A search of the bug list at https://gcc.gnu.org/bugzilla found a similar bug that appears to be resolved in gnu-14.1.0.

gnu/14.1.0 has been built under
/scratch2/NCEPDEV/stmp1/role.epic/installs/gnu/14.1.0, along with openmpi/4.1.6.
However, building spack-stack with this combination produced another error.

Opened a bug report in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115107

  2. Reinstalled gnu/14.1.0 (a previous installation got corrupted) and openmpi/4.1.6, and revised the spack-stack configuration. The same error with [-Wimplicit-function-declaration] still occurs; it came up only as a warning in gnu/13.2.0. Details are in the compiler bug report listed above.
module use /scratch2/NCEPDEV/stmp1/role.epic/installs/gnu/modulefiles
module load gnu/14.1.0
module use /scratch2/NCEPDEV/stmp1/role.epic/installs/openmpi/modulefiles
module load openmpi/4.1.6_gnu14

Found a workaround for the gnu/14.1.0 gcc warnings/errors: using cflags="-Wno-implicit-function-declaration ..." avoided the errors during the build.

The following cflags were needed to successfully build a package (wgrib2, in particular) that had trouble compiling:

cflags="-Wno-deprecated-declarations -Wno-implicit-function-declaration -Wno-incompatible-pointer-types -Wno-implicit-int"
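
For context, GCC 14 turned several long-standing C warnings (among them implicit function declarations, implicit int, and incompatible pointer types) into errors by default, which matches the warning-in-13.2.0, error-in-14.1.0 behavior described here. As a sketch only, the flags can be kept in one variable and attached to the package spec on the command line. The variable name `WGRIB2_CFLAGS` is hypothetical, and the command is printed rather than executed, so nothing is installed by running the snippet.

```shell
# Flags from the comment above, each written with its leading dash.
WGRIB2_CFLAGS="-Wno-deprecated-declarations -Wno-implicit-function-declaration -Wno-incompatible-pointer-types -Wno-implicit-int"

# Spack accepts compiler flags as part of a spec. The command is only echoed,
# so the sketch is safe to paste on a node without spack set up.
echo "spack install wgrib2 cflags=\"${WGRIB2_CFLAGS}\""
```

Scoping the flags to the one package that needs them keeps the diagnostics active for the rest of the stack.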

  3. Building gcc and intel packages in the same spack environment does not appear to work. The additional link-stage flags pull in gcc libraries, and linking then fails for intel.
    I'm currently building separate environments for gcc/14.1.0 and intel on Hera.

The stack environments are built under /scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu14/

@natalie-perlin
Collaborator Author

natalie-perlin commented May 23, 2024

GNU compiler version 13.3.0 has been released as well; it reportedly fixes the internal compiler error that appeared in gnu/13.2.0.
openmpi/4.1.6 has been installed with it as well.

module use /scratch2/NCEPDEV/stmp1/role.epic/installs/gnu/modulefiles
module load gnu/13.3.0
module use /scratch2/NCEPDEV/stmp1/role.epic/installs/openmpi/modulefiles
module load openmpi/4.1.6_gnu13.3

@natalie-perlin
Collaborator Author

natalie-perlin commented May 28, 2024

Update for gnu/13.2.0 bug, fixed in 13.3.0: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115107#c15

Locations of the spack-stacks (NB: packages for UFS-WM and UFS-SRW only!)

/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13.3/envs/ufs-wm-srw-rocky8
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/envs/ufs-wm-srw-rocky8/

Added a comment on ufs-weather-model repository on the tests with spack-stack/1.5.1 and spack-stack-1.6.0 (for ufs-weather-model and ufs-srweather-app packages only):

ufs-community/ufs-weather-model#2263 (comment)

@natalie-perlin
Collaborator Author

Update for ufs-weather-model, spack-stack-1.6.0 with gnu/13.3.0 and openmpi/4.1.6:
Hera regression tests passed successfully for GNU:
https://github.com/ufs-community/ufs-weather-model/pull/2093#issuecomment-2143694396

@climbfuji climbfuji changed the title Spack-stacks on Hera with gnu/13.2.0 and openmpi/4.1.6 built for Rocky8 Spack-stacks on Hera with gnu/13.3.x and openmpi/4.1.6 built for Rocky8 Jun 11, 2024
@climbfuji
Collaborator

Is this all done, can the issue be closed?

@natalie-perlin
Collaborator Author

Is this all done, can the issue be closed?

This was built only for the UFS WM and SRW packages. Still needs solutions for a couple of other packages for the ue-dev, getting back to the issue!

@climbfuji
Collaborator

Well, I built the unified environment with gcc@13 and committed the spack submodule pointer updates. The only thing missing is to bump odc from 1.4.6 to 1.5.2, which I am doing in #759.

@natalie-perlin
Collaborator Author

Well, I built the unified environment with gcc@13 and committed the spack submodule pointer updates. The only thing missing is to bump odc from 1.4.6 to 1.5.2, which I am doing in #759.

Oh, that's awesome! On which platforms is the unified env. installed with gcc@13, and where could I check the logs to see how the installation issues were resolved? I wonder if the issue with gcc@13 could be considered solved with the gcc.gnu.org/bugzilla...

@climbfuji
Collaborator

I did that on my laptop (repeatedly)

@natalie-perlin
Collaborator Author

I did that on my laptop (repeatedly)

Oh, was it for CentOS or Rocky, and with what MPI version? openmpi/4.1.6?

@climbfuji
Collaborator

Rocky Linux 9
gcc@13.3.0
openmpi@5.0.3

@natalie-perlin
Collaborator Author

Finding the same error as found earlier with odc/1.4.6 when built with the gcc@13.3:

.6-cg3l6ixmpiwjwpq25xainaluso43s5ef/spack-src/src/odc/api/StridedData.h:190:40: error: 'uint64_t' does not name a type
  190 |             std::fill(reinterpret_cast<uint64_t*>(get(sourceRow+1)),
      |                                        ^~~~~~~~
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13/cache/build_stage/spack-stage-odc-1.4.6-cg3l6ixmpiwjwpq25xainaluso43s5ef/spack-src/src/odc/api/StridedData.h:23:1: note: 'uint64_t' is defined in header '<cstdint>'; did you forget to '#include <cstdint>'?
   22 | #include "eckit/exception/Exceptions.h"
  +++ |+#include <cstdint>
   23 |
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13/cache/build_stage/spack-stage-odc-1.4.6-cg3l6ixmpiwjwpq25xainaluso43s5ef/spack-src/src/odc/api/StridedData.h:190:48: error: expected '>' before '*' token
  190 |             std::fill(reinterpret_cast<uint64_t*>(get(sourceRow+1)),
      |                                                ^
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13/cache/build_stage/spack-stage-odc-1.4.6-cg3l6ixmpiwjwpq25xainaluso43s5ef/spack-src/src/odc/api/StridedData.h:190:48: error: expected '(' before '*' token
  190 |             std::fill(reinterpret_cast<uint64_t*>(get(sourceRow+1)),
      |                                                ^
      |                                                (
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13/cache/build_stage/spack-stage-odc-1.4.6-cg3l6ixmpiwjwpq25xainaluso43s5ef/spack-src/src/odc/api/StridedData.h:190:49: error: expected primary-expression before '>' token
  190 |             std::fill(reinterpret_cast<uint64_t*>(get(sourceRow+1)),
      |                                                 ^
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13/cache/build_stage/spack-stage-odc-1.4.6-cg3l6ixmpiwjwpq25xainaluso43s5ef/spack-src/src/odc/api/StridedData.h:191:40: error: 'uint64_t' does not name a type
  191 |                       reinterpret_cast<uint64_t*>(get(finalRow+1)),
      |                                        ^~~~~~~~
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13/cache/build_stage/spack-stage-odc-1.4.6-cg3l6ixmpiwjwpq25xainaluso43s5ef/spack-src/src/odc/api/StridedData.h:191:40: note: 'uint64_t' is defined in header '<cstdint>'; did you forget to '#include <cstdint>'?
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13/cache/build_stage/spack-stage-odc-1.4.6-cg3l6ixmpiwjwpq25xainaluso43s5ef/spack-src/src/odc/api/StridedData.h:191:48: error: expected '>' before '*' token
  191 |                       reinterpret_cast<uint64_t*>(get(finalRow+1)),
      |                                                ^
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13/cache/build_stage/spack-stage-odc-1.4.6-cg3l6ixmpiwjwpq25xainaluso43s5ef/spack-src/src/odc/api/StridedData.h:191:48: error: expected '(' before '*' token
  191 |                       reinterpret_cast<uint64_t*>(get(finalRow+1)),
  ....
  make[2]: *** [src/odc/CMakeFiles/odccore.dir/build.make:471: src/odc/CMakeFiles/odccore.dir/core/DecodeTarget.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13/cache/build_stage/spack-stage-odc-1.4.6-cg3l6ixmpiwjwpq25xainaluso43s5ef/spack-build-cg3l6ix'
...

The odc build uses the -std=gnu++11 option.
I attempted to set a higher standard, i.e., -std=c++14, by adding cxxflags in ./common/packages.yaml as follows:

odc:
  version: ['1.4.6']
  variants: ~fortran
  require:
  - any_of: ['cxxflags="-std=c++14"']

It did record the flag during concretization:

 -   cg3l6ix          ^odc@1.4.6%gcc@13.3.0 cxxflags="-std=c++14" ~fortran~ipo build_system=cmake build_type=Release generator=make arch=linux-rocky8-haswell

However, odc was still being built with the "-std=gnu++11" during the install phase.
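
One way to confirm which standard the compiler actually received is to count the `-std=` options that appear in the build log. GCC honors the last `-std=` on a command line, so a flag injected early through cxxflags can be overridden by one that the package's own build system appends later. Whether that is what happens for odc is an assumption here, not something the log excerpt shows. `std_flags_used` is a hypothetical helper, and the log name in the example is the spack-build-out.txt attached earlier in this thread.

```shell
# Hypothetical helper: count each distinct -std= option found in a build log,
# most frequent first.
std_flags_used() {
    grep -oE -- '-std=[A-Za-z0-9+]+' "$1" | sort | uniq -c | sort -rn
}

# Example: std_flags_used spack-build-out.txt
```

If both standards appear on the same compile lines, the ordering on those lines decides which one takes effect.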

Any suggestions will be helpful!

@climbfuji
Collaborator

I mentioned this a few comments above! You need to bump odc to 1.5.2 - the spack-stack PR that does the update still hasn't been merged.

@natalie-perlin
Collaborator Author

@climbfuji - thank you, all is great now! I hadn't realized that bumping odc to 1.5.2 was part of the solution. The unified environment for spack-stack-1.6.0 has built successfully to completion on Hera Rocky 8 (gnu@13.3.0, openmpi@4.1.6). The env. is located at: /scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13/envs/ue-gcc13-openmpi146

I've tested this environment for building SRW and launched a single test; it is successfully running.

This issue could be closed.

@climbfuji
Collaborator

More than happy to do that! Thanks for testing in spack-stack-1.6.0.
