Updating hpc-stack modules and miniconda locations for Hera, Gaea, Cheyenne, Orion, Jet #1465
Comments
@natalie-perlin Can you make sure all compiler and library versions are confirmed against https://github.com/ufs-community/ufs-weather-model/tree/develop/modulefiles?
@ulmononian can we coordinate about intel/gnu/openmpi on Hera in this issue?
@jkbk2004 The PRs have not been made yet to address the changes in modulefiles for the ufs-weather-model, only for the ufs-srweather-app.
The modulefiles for Hera and Jet have been built to use the intel/2022.1.2 version rather than the latest 2022.2.0. Updating the info in the top comment of this issue.
Can somebody please build the gnu hpc-stack on Hera and Cheyenne using openmpi? Thanks.
@DusanJovic-NOAA @jkbk2004 here is a build I did in the past w/ gnu-9.2.0 & openmpi-3.1.4 on Hera: module use /scratch1/NCEPDEV/stmp2/Cameron.Book/hpcs_work/libs/gnu/stack_noaa/modulefiles/stack
Thanks @ulmononian. I also have the gnu/openmpi stack built in my own space. What I was asking for is an installation in the officially supported location so that we can update the modulefiles in the develop branch.
@ulmononian would you please also create an hpc-stack issue on the UPP repo (https://github.com/noaa-emc/upp)? Other workflows (global workflow, HAFS workflow) may also be impacted by this change. @WenMeng-NOAA @aerorahul @WalterKolczynski-NOAA @KateFriedman-NOAA @BinLiu-NOAA FYI.
@junwang-noaa @ulmononian @WenMeng-NOAA @aerorahul @WalterKolczynski-NOAA @KateFriedman-NOAA @BinLiu-NOAA @natalie-perlin I noticed that Kyle's old stack installations are still used by other applications and on some machines. I started coordinating on the EPIC side. It may take a week or two to finish the full transition. I want to combine this issue with the other ongoing library update follow-ups: netcdf/esmf, etc.
@jkbk2004 Can you install g2tmpl/1.10.2 for the UPP? Thanks!
@WenMeng-NOAA g2tmpl/1.10.2 is available (current ufs-wm modulefiles), but a backward compatibility issue was captured in issue #1441.
@DusanJovic-NOAA - hpc-stack with gnu/9.2.0+mpich/3.3.2 and gnu/10.2.0+mpich/3.3.2 has been installed on Hera under the role.epic account (EPIC-managed space). I am testing them with the ufs-weather-model RTs and plan to include these Hera gnu stacks in the module updates. The stack installation locations are listed in the top comment of this issue. Exact modifications to the modulefiles (the paths needed for finding all the modules) will be listed in subsequent PR(s).
@natalie-perlin Is anyone going to provide a gnu/openmpi stack?
@ulmononian can you install gnu/openmpi parallel to the location above?
@jkbk2004 - do we need all four possible combinations of compilers (gnu/9.2.0, gnu/10.2.0) with mpich/3.3.2 and openmpi/4.1.2?
@natalie-perlin I think @ulmononian has installed gnu10.1/openmpi. That should be good enough as a starting point for the openmpi option. But it makes sense to make the openmpi installation available under the role account path as well.
@jkbk2004, @ulmononian - gnu/9.2.0 + mpich/3.3.2 + netcdf/4.7.4. The updates to the stack locations are made in the top comment of this Issue-1465.
Added a stack build with the intel compiler and netcdf/4.9 on Hera (see the list of locations in the top comment).
@DusanJovic-NOAA @jkbk2004 @natalie-perlin I will install the stack w/ gnu-9.2 and openmpi-3.1.4 here /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs shortly, as well as w/ gnu-10.1 & openmpi-3.1.4 in the official location.
@DusanJovic-NOAA @jkbk2004 @natalie-perlin hpc-stack built w/ gnu-9.2 and openmpi-3.1.4 was installed successfully here: /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_openmpi-3.1.4.
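Using this new install would presumably follow the same pattern as the other stacks in the top comment; a sketch only, since the modulefiles/stack subdirectory and the module names below are assumptions, not confirmed for this particular build:
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_openmpi-3.1.4/modulefiles/stack   # assumed layout
module load hpc/1.2.0           # assumed hpc meta-module version
module load hpc-gnu/9.2         # assumed, mirroring the gnu-9.2 mpich stack
module load hpc-openmpi/3.1.4   # assumed, mirroring the openmpi naming used elsewhere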
I tried running the regression test using the gnu-9.2_openmpi-3.1.4 stack, but it failed because the debug version of the esmf library is missing:
I also tried the 'gnu-10.2_openmpi' stack, but it looks like when I load it, it does not actually load the gnu 10.2 module; I see:
Note, there is no gnu/10.2 module loaded. When I run gcc I see the compiler is version 4.8.5:
I think this is because two lines are missing in gnu-10.2_openmpi/modulefiles/core/hpc-gnu/10.2.lua that are present in the corresponding modulefile of another stack:
There is also an unnecessary inconsistency in the naming of the hpc-gnu module between the two versions:
Why '10.2' and not '10.2.0'? Also, the 9.2 stack directory name includes the openmpi version, while the directory for the 10.2 stack does not.
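The check being described can be reproduced with something like the following sketch, using the stack path from the top comment:
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2_openmpi/modulefiles/stack
module load hpc/1.2.0 hpc-gnu/10.2
module list        # no gnu/10.2 entry appears if the hpc-gnu modulefile does not load the compiler
gcc --version      # falls back to the system gcc (4.8.5) when gnu/10.2 is not loaded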
my apologies, @DusanJovic-NOAA. I will install esmf/8.3.0b09-debug in /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_openmpi-3.1.4 now and update you when it is finished. We will also address the inconsistency in the naming convention and look into the gnu-10.2 modulefile. Thank you for testing w/ these stacks.
@DusanJovic-NOAA the stack at /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_openmpi-3.1.4 has been updated to include esmf/8.3.0b09-debug. I was able to load ufs_common_debug.lua, so hopefully it works for you now!
@DusanJovic-NOAA, @ulmononian - please note that GNU 10.2.0 is not installed system-wide on Hera; it is only installed locally in the EPIC space. It could be built under the current hpc-stack for a particular compiler-gnu-netcdf installation location, but because the compiler is shared among several such combinations, it has been moved to a common location outside any given hpc-stack installation. Please note that the directions to load the compilers and the stack given in the first comment address the way the compiler is loaded! For example,
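presumably along the lines of the Hera gnu/10.2 listing in the top comment, where the compiler is loaded from the shared EPIC location before any stack module (reproduced here as a sketch):
module use /scratch1/NCEPDEV/nems/role.epic/gnu/modulefiles    # shared compiler location, outside any single hpc-stack install
module load gnu/10.2.0                                         # the compiler itself
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2/modulefiles/stack
module load hpc/1.2.0
module load hpc-gnu/10.2                                       # stack meta-modules after the compiler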
@ulmononian Thanks for adding the debug build of esmf. I ran the control and control_debug regression tests, and both finished successfully. The control test outputs are not bit-identical to the baseline; control_debug outputs are identical. I guess this is expected due to the different MPI library.
@natalie-perlin I tried to run the control and control_debug tests after loading the gnu module from the location above (thanks for explaining this, I missed that in the description). The control test compiled successfully, but failed at run time:
The debug version of esmf is missing in the gnu-10.2_openmpi stack:
@natalie-perlin The SRW App was tested on Hera. It would be interesting to see the differences between the two stacks and why one version works while the other doesn't.
I think that this is related to this issue: NCAR/ccpp-physics#980
Thanks, @grantfirl! Yes, I was seeing the same issue as described in NCAR/ccpp-physics#980. It is nice to see that this won't be an issue once the stack on Hera is transitioned to @natalie-perlin's new stack.
Hi @MichaelLueken, I've tested Natalie's instructions above for loading conda/python and the hpc modules on Hera, Gaea, Cheyenne, Orion, and Jet. I did not have any issues.
Thanks, @zach1221! That's great news! Once the hpc-stack locations for Gaea, Cheyenne, Orion, and Jet are updated in the weather model, @natalie-perlin will be able to update the locations in the SRW App.
@natalie-perlin can crtm and gftl-shared be updated to crtm/2.4.0 and gftl-shared/v1.5.0 on Jet? Currently it seems your new module stack location has only crtm/2.3.0 and gftl-shared/1.3.3.
@MichaelLueken @zach1221 -
Hi, @natalie-perlin
@zach1221 @MichaelLueken - Yes, I ran the SRW tests with the new stack on Jet.
The four experiments:
@natalie-perlin Thanks! I was able to successfully build and run the SRW App's fundamental WE2E tests on Jet using the new HPC-stack location (the run_fcst job even ran using vjet, which would have led to the job failing previously).
@jkbk2004 - Gaea modules were not updated
Description
Update the locations of the hpc-stack modules and miniconda3 used for compiling and running the UFS weather model on NOAA HPC systems such as Hera, Gaea, Cheyenne, Orion, and Jet. The modules are installed under the role.epic account and placed in a common EPIC-managed space on each system. Gaea also uses an Lmod installed locally in the same common location (ufs-srweather-app/PR-352, ufs-weather-app/PR-353), and a script needs to be run to initialize Lmod before loading the modulefile ufs_gaea.intel.lua. While the ufs-weather-model uses Python to a lesser extent, the UFS SRW App relies heavily on a conda environment.
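On Gaea this amounts to something like the following sketch (the clone path of ufs-weather-model is a placeholder; the actual initialization script path appears in the Gaea listing below):
source /lustre/f2/dev/role.epic/contrib/Lmod_init.sh    # initialize the locally installed Lmod
module use /path/to/ufs-weather-model/modulefiles       # placeholder for a local clone
module load ufs_gaea.intel                              # pulls in the EPIC-managed stack modules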
For ease of maintenance of the libraries on the NOAA HPC systems, a transition to the new location of the modules built for both the ufs-weather-model and the ufs-srweather-app is needed.
Solution
The ufs-weather-model repo is to be updated with the new version of miniconda and the hpc libraries.
Updated installation locations have been used to load the modules listed in
/ufs-weather-model/modulefiles/ufs_common
and to build the UFS model binaries. Hera GNU compilers include 9.2.0 and 10.2.0.
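As a rough sketch, a Hera Intel build against the new locations would look like this (the ufs_hera.intel modulefile name and the top-level build.sh driver are assumptions about the ufs-weather-model layout, not something confirmed in this issue):
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2/modulefiles/stack   # new stack location from the listing below
cd ufs-weather-model
module use modulefiles
module load ufs_hera.intel     # assumed platform modulefile; loads ufs_common and the stack libraries
./build.sh                     # assumed CMake wrapper at the repo top level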
UPD. 10/20/2022: Modules for Hera and Jet have been built for the already-tested compiler intel/2022.1.2. Modules for the intel/2022.2.0 compiler/impi also remain and can be used when an upgrade is needed.
UPD. 10/24/2022: Modules for the Hera gnu compilers (9.2.0, 10.2.0) with different mpich/openmpi combinations, and with updated netcdf/4.9.0, have been prepared.
UPD. 12/07/2022: Added a gnu/10.1.0-based hpc-stack on Cheyenne, by request.
UPD. 12/07/2022: Added a gnu/10.1.0-based hpc-stack on Cheyenne with mpt/2.22, by request.
Cheyenne Lmod has been upgraded to v8.7.13 system-wide after the system maintenance on 10/21/2022.
Alternatives
Alternative solutions could be to have the hpc libraries and modules built in separate locations for the ufs-weather-model and the ufs-srweather-app. The request from EPIC management, however, was to use a common location for all the libraries.
Related to
PR-419 in the ufs-srweather-app already exists, and a new PR will be made to the current repo.
top priority
Updated locations of the conda/python and hpc-stack modules, and how to load them on each system:
Hera python/miniconda :
module use /scratch1/NCEPDEV/nems/role.epic/miniconda3/modulefiles
module load miniconda3/4.12.0
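As a usage sketch, once miniconda3 is loaded the available environments can be inspected and activated (the environment name is a placeholder; this issue does not list the provided environments):
conda env list                   # show the environments installed under this miniconda3
conda activate <environment>     # placeholder name, e.g. the SRW App workflow environment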
Hera intel/2022.1.2 + impi/2022.1.2 :
module load intel/2022.1.2
module load impi/2022.1.2
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
module load hpc/1.2.0
module load hpc-intel/2022.1.2
module load hpc-impi/2022.1.2
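A quick sanity check after loading this stack (a sketch; the exact module inventory will vary):
module list            # hpc/1.2.0, hpc-intel/2022.1.2, and hpc-impi/2022.1.2 should be listed
module avail netcdf    # the stack's netcdf build should now be visible (similarly for esmf, fms, etc.)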
Hera intel/2022.1.2 + impi/2022.1.2 + netcdf-c 4.9.0:
module load intel/2022.1.2
module load impi/2022.1.2
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2_ncdf49/modulefiles/stack
module load hpc/1.2.0
module load hpc-intel/2022.1.2
module load hpc-impi/2022.1.2
Hera gnu/9.2 + mpich/3.3.2 :
module load gnu/9.2
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2/modulefiles/stack
module load hpc/1.2.0
module load hpc-gnu/9.2
module load mpich/3.3.2
module load hpc-mpich/3.3.2
Hera gnu/10.2 + mpich/3.3.2 :
module use /scratch1/NCEPDEV/nems/role.epic/gnu/modulefiles
module load gnu/10.2.0
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2/modulefiles/stack
module load hpc/1.2.0
module load hpc-gnu/10.2
module load mpich/3.3.2
module load hpc-mpich/3.3.2
Hera gnu/10.2 + openmpi/4.1.2 :
module use /scratch1/NCEPDEV/nems/role.epic/gnu/modulefiles
module load gnu/10.2.0
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2_openmpi/modulefiles/stack
module load hpc/1.2.0
module load hpc-gnu/10.2
module load openmpi/4.1.2
module load hpc-openmpi/4.1.2
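Following the earlier discussion about hpc-gnu/10.2 not pulling in the compiler, a quick check that the right gcc is active (a sketch):
gcc --version    # should report 10.2.0 once gnu/10.2.0 is loaded, not the system default 4.8.5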
Hera gnu/9.2 + mpich/3.3.2 + netcdf-c 4.9.0:
module load gnu/9.2
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_ncdf49/modulefiles/stack
module load hpc/1.2.0
module load hpc-gnu/9.2
module load mpich/3.3.2
module load hpc-mpich/3.3.2
Hera gnu/10.2 + mpich/3.3.2 + netcdf-c/4.9.0:
module use /scratch1/NCEPDEV/nems/role.epic/gnu/modulefiles
module load gnu/10.2.0
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2_ncdf49/modulefiles/stack
module load hpc/1.2.0
module load hpc-gnu/10.2
module load mpich/3.3.2
module load hpc-mpich/3.3.2
Gaea miniconda:
module use /lustre/f2/dev/role.epic/contrib/modulefiles
module load miniconda3/4.12.0
Gaea intel:
Lmod initialization on Gaea needs to be done first by sourcing the following script:
/lustre/f2/dev/role.epic/contrib/Lmod_init.sh
module use /lustre/f2/dev/role.epic/contrib/modulefiles
module load miniconda3/4.12.0
module use /lustre/f2/dev/role.epic/contrib/hpc-stack/intel-2021.3.0/modulefiles/stack
module load hpc/1.2.0
module load intel/2021.3.0
module load hpc-intel/2021.3.0
module load hpc-cray-mpich/7.7.11
Cheyenne miniconda:
module use /glade/work/epicufsrt/contrib/miniconda3/modulefiles
module load miniconda3/4.12.0
Cheyenne intel:
module use /glade/work/epicufsrt/contrib/miniconda3/modulefiles
module load miniconda3/4.12.0
module use /glade/work/epicufsrt/contrib/hpc-stack/intel2022.1/modulefiles/stack
module load hpc/1.2.0
module load hpc-intel/2022.1
module load hpc-mpt/2.25
Cheyenne gnu/10.1.0_mpt2.22:
module use /glade/work/epicufsrt/contrib/hpc-stack/gnu10.1.0_mpt2.22/modulefiles/stack
module load hpc/1.2.0
module load hpc-gnu/10.1.0
module load hpc-mpt/2.22
Cheyenne gnu/10.1.0:
module use /glade/work/epicufsrt/contrib/hpc-stack/gnu10.1.0/modulefiles/stack
module load hpc/1.2.0
module load hpc-gnu/10.1.0
module load hpc-mpt/2.25
Cheyenne gnu/11.2.0:
module use /glade/work/epicufsrt/contrib/hpc-stack/gnu11.2.0/modulefiles/stack
module load hpc/1.2.0
module load hpc-gnu/11.2.0
module load hpc-mpt/2.25
Orion miniconda:
module use /work/noaa/epic-ps/role-epic-ps/miniconda3/modulefiles
module load miniconda3/4.12.0
Orion intel:
module use /work/noaa/epic-ps/role-epic-ps/miniconda3/modulefiles
module load miniconda3/4.12.0
module use /work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
module load hpc/1.2.0
module load hpc-intel/2022.1.2
module load hpc-impi/2022.1.2
Jet miniconda:
module use /mnt/lfs4/HFIP/hfv3gfs/role.epic/miniconda3/modulefiles
module load miniconda3/4.12.0
Jet intel:
module use /mnt/lfs4/HFIP/hfv3gfs/role.epic/miniconda3/modulefiles
module load miniconda3/4.12.0
module use /mnt/lfs4/HFIP/hfv3gfs/role.epic/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
module load hpc/1.2.0
module load hpc-intel/2022.1.2
module load hpc-impi/2022.1.2
NB:
There were comments in ufs-srweather-app PR-419 suggesting to roll back to lower compiler versions for Cheyenne gnu (to use 11.2.0 instead of 12.1.0), Hera intel (to use intel/2022.1.2 instead of 2022.2.0), and Jet intel (to use intel/2022.1.2 instead of intel/2022.2.0). Either way could be OK for the SRW, and the libraries would be built for the lower-version compilers, as suggested.