Software module updates in hpc-stack for Hera (intel, gnu) #1468

Closed
wants to merge 17 commits into from

Conversation

natalie-perlin
Collaborator

@natalie-perlin natalie-perlin commented Oct 20, 2022

UPD. 07 Nov 2022:
Limiting the updates in the present PR-1468 to hpc-stack only. Not including miniconda3 updates, by request.

UPD2. 08 Nov 2022:
By request, added updates for the Hera system to use hpc-stack built with the gnu/9.2.0 compiler and mpich/3.3.2, installed in EPIC-managed space.

UPD3. 09 Nov 2022:
By request, changed the module to use a different gnu-mpi combination: the gnu/9.2.0 compiler and openmpi/3.1.4, installed in EPIC-managed space.

UPD4. 14 Nov 2022:
By request from @jkbk2004, reverting the hera_gnu option to use gnu/9.2 compilers + mpich/3.3.2.

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.
    Issue 1465

  • Results for one or more of the regression tests change and the reasons for the changes are understood and explained below (see Testing section).

  • New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.

Instructions: All subsequent sections of text should be filled in as appropriate.

Description

The updates were made to list new locations of the updated hpc-stack libraries and an updated miniconda3/4.12.0.

Updated files are the modulefiles:

./modulefiles/ufs_hera.intel.lua 
./modulefiles/ufs_hera.intel_debug.lua

and the Python environment variables for the Hera system:
./tests/rt.sh

Issue(s) addressed

Link the issues to be closed with this PR, whether in this repository, or in another repository.
This PR addresses one of the issues in Issue-1465

Testing

The following regression tests have been run on Hera and reported to be passed (OK):

COMPILE | -DAPP=ATM -DCCPP_SUITES=FV3_GFS_v16,FV3_GFS_v15_thompson_mynn,FV3_GFS_v17_p8,FV3_GFS_v17_p8_rrtmgp,FV3_GFS_v15_thompson_mynn_lam3km -D32BIT=ON
RUN     | control   
RUN     | control_decomp 
RUN     | control_2dwrtdecomp 
RUN     | control_2threads
RUN     | control_restart
RUN     | control_fhzero 
RUN     | control_CubedSphereGrid
RUN     | control_CubedSphereGrid_parallel
RUN     | control_latlon 
RUN     | control_wrtGauss_netcdf_parallel

  • hera.intel
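
The list above follows a two-column "TYPE | value" layout. As a minimal sketch, assuming exactly that layout (real test configuration files may carry more columns), the RUN entries can be pulled out with a small filter. The function name `list_run_tests` is hypothetical and not part of ufs-weather-model.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "TYPE | value" lines on stdin and print the
# second field of every RUN line, with surrounding blanks trimmed.
# COMPILE lines and anything else are skipped.
list_run_tests() {
  awk -F'|' '$1 ~ /^[ \t]*RUN[ \t]*$/ {
    name = $2
    gsub(/^[ \t]+|[ \t]+$/, "", name)
    print name
  }'
}
```

Piping the listing above through `list_run_tests` would give one test name per line, which makes it easy to compare the subset that was run against a full test list.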

Dependencies

Updating the location of a fresh installation of the hpc-stack modulefiles and miniconda3/4.12.0 version
Updating locations of newly installed hpc-stack modules and miniconda3/4.12.0
Updating miniconda3 module installation location with python3.9
@natalie-perlin natalie-perlin changed the title Hera intel software modules update for hpc-stack and miniconda3+python Hera intel software module updates for hpc-stack and miniconda3+python Oct 20, 2022
@DeniseWorthen
Collaborator

@natalie-perlin I'm confused about whether this PR changes baselines or not. You report that a small subset of the RTs pass, but you've also checked the box saying that one or more tests have changed results.

Can you run the entire intel and gnu baselines on a single platform with your PR and verify whether any test changes results?

@jkbk2004
Collaborator

jkbk2004 commented Nov 4, 2022

Running

module use /scratch2/NCEPDEV/marine/Jong.Kim/UFS-RT/rt-1468-intel/modulefiles/
module load ufs_hera.intel_debug

complains:

Lmod has detected the following error:  The following module(s) are unknown: "netcdf/4.7.4"
"esmf/8.3.0b09-debug"
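
When an error like this lists several modules, it can help to pull out just the names so each one can be looked up in the stack. The following is a minimal sketch, assuming the message has the shape shown above (names in double quotes); the function name `unknown_modules` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an Lmod "module(s) are unknown" message on
# stdin and print every double-quoted token, one per line, without the
# quotes. Assumes module names are the only quoted strings in the input.
unknown_modules() {
  grep -o '"[^"]*"' | tr -d '"'
}
```

One possible use, assuming the error is written to stderr, is `module load ufs_hera.intel_debug 2>&1 | unknown_modules`, followed by `module avail` or `module spider` on each printed name to see which versions the stack actually provides.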

@natalie-perlin natalie-perlin deleted the develop_hera branch November 4, 2022 12:16
@natalie-perlin natalie-perlin restored the develop_hera branch November 4, 2022 17:18
@natalie-perlin natalie-perlin deleted the develop_hera branch November 4, 2022 17:22
@natalie-perlin natalie-perlin restored the develop_hera branch November 4, 2022 17:22
@natalie-perlin natalie-perlin deleted the develop_hera branch November 4, 2022 17:27
@natalie-perlin natalie-perlin restored the develop_hera branch November 4, 2022 17:27
@natalie-perlin natalie-perlin reopened this Nov 4, 2022
@jkbk2004
Collaborator

jkbk2004 commented Nov 7, 2022

@natalie-perlin two baselines fail with intel: control_fhzero and control_CubedSphereGrid_parallel.

@jkbk2004
Collaborator

jkbk2004 commented Nov 7, 2022

@natalie-perlin gnu baselines are reproduced ok.

@natalie-perlin
Collaborator Author

natalie-perlin commented Nov 7, 2022 via email

@jkbk2004
Collaborator

jkbk2004 commented Nov 7, 2022 via email

@natalie-perlin
Collaborator Author

natalie-perlin commented Nov 7, 2022 via email

Keeping the existing miniconda3/3.7.3 (not updating)
Keeping the existing miniconda3/3.7.3 (not updating to miniconda3/4.12.0)
@natalie-perlin natalie-perlin changed the title Hera intel software module updates for hpc-stack and miniconda3+python Hera intel software module updates for hpc-stack, intel/2022.1.2 Nov 7, 2022
Reverting to the existing path of miniconda3 and python3
@DusanJovic-NOAA
Collaborator

@DusanJovic-NOAA @aliabdolali - does the scotch package need to be compiled as serial or MPI application? Does it need to depend on whether netcdf is used with or without MPI?

I am not familiar with the scotch package.

@DeniseWorthen
Collaborator

@DusanJovic-NOAA I had found this issue #1440 and someone suggested it was because the GNU on hera is 9.x but on cheyenne is 10.x. Is that a reasonable guess? It seemed odd to me that it would result in being 5x slower w/ the older gnu. But if true, then gnu 10.x would be preferable.

Modified gnu/9.2- mpi combination to use openmpi/3.1.4, as  requested (Attn: @jkbk2004 )
@natalie-perlin natalie-perlin changed the title Hera software module updates for hpc-stack, intel/2022.1.2 , gnu/9.2 + mpich/3.3.2 Software module updates in hpc-stack for Hera (intel, gnu) Nov 10, 2022
@DusanJovic-NOAA
Collaborator

DusanJovic-NOAA commented Nov 10, 2022

I ran the control_p8 test using gnu 9.2 and openmpi 3.1.4 and it finished in 267 seconds. Looks like just switching from mpich to openmpi speeds up model execution almost 3x.

Changing the optimization level from -O2 to -O3 speeds up the control_p8 test a little bit more; it now runs in 254 seconds.
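
To put the -O3 gain in proportion, the two reported wall-clock times can be compared directly. This is a minimal sketch using only the 267 s and 254 s figures from this comment; the function name `percent_faster` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent_faster OLD NEW
# Prints the percentage reduction in run time going from OLD to NEW
# seconds, rounded to one decimal place.
percent_faster() {
  LC_ALL=C awk -v old="$1" -v new="$2" \
    'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

percent_faster 267 254   # prints 4.9
```

So the move from -O2 to -O3 trims roughly 5% off the run time, which is small next to the reported effect of changing the MPI library.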

@natalie-perlin
Collaborator Author

@DeniseWorthen , @DusanJovic-NOAA - thank you for your comments and testing! @jkbk2004 - should we switch the compiler to gnu/10.2?

@jkbk2004
Collaborator

hera system admin didn't agree to install gnu10.2 since they prefer gnu12, but @ulmononian was able to install gnu10.2 through spack. we are achieving the goal with gnu9.2/openmpi3.1.4. we can test gnu10.2.

@ulmononian
Collaborator

hera system admin didn't agree to install gnu10.2 since they prefer gnu12, but @ulmononian was able to install gnu10.2 through spack. we are achieving the goal with gnu9.2/openmpi3.1.4. we can test gnu10.2.

@jkbk2004 @natalie-perlin gnu/9.2.0 is natively installed on Hera through RDHPCS. as jong mentions, the Hera team does not want to install an intermediate gnu (e.g., they will not install gnu/10.1 or gnu/10.2), only gnu/12. i would suggest moving forward with gnu/9.2.0-openmpi/3.1.4 given @DusanJovic-NOAA's report on the rt results, rather than relying on a non-native install of gnu on Hera (e.g. installed directly from a tarball or spack), which would be the only way to utilize anything newer than gnu/9.2.0.

@natalie-perlin
Collaborator Author

natalie-perlin commented Nov 10, 2022

@ulmononian @jkbk2004 @DusanJovic-NOAA -
The installation of GNU/10.2.0 was done using the hpc-stack build_gnu.sh script, so it was a verified method to install the compiler. It was moved out of the original installation location after the initial install, in order to allow the gnu/gcc module and library to be shared for different hpc-stacks.

@ulmononian
Collaborator

@ulmononian @jkbk2004 @DusanJovic-NOAA - The installation of GNU/10.2.0 was done using the hpc-stack build_gnu.sh script, so it was a verified method to install the compiler. It was moved out of the original installation location in order to allow the module and library to be shared for different hpc-stacks.

@natalie-perlin noted. i also installed gnu/10.1.0 (via spack). will the ufs-wm CM accept compiler installations performed in this way (i.e., not by RDHPCS) for use in the official ufs-wm modulefiles? @jkbk2004

@DusanJovic-NOAA
Collaborator

cpld_control_p8 fails with gnu/9.2.0 and openmpi/3.1.4:

18: --------------------------------------------------------------------------
18: The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this release.
18: Workarounds are to run on a single node, or to use a system with an RDMA
18: capable network such as Infiniband.
18: --------------------------------------------------------------------------

I opened a Hera helpdesk ticket.

reverting back to using gnu/9.2.0 + mpich/3.3.2
reverting back to using gnu/9.2 + mpich/3.3.2
@jkbk2004
Collaborator

@natalie-perlin your branch is 2 commits behind. Please, sync up. Then can you make a pr to @DeniseWorthen #1486?

@natalie-perlin
Collaborator Author

@jkbk2004 - synced with the develop.

@natalie-perlin
Collaborator Author

The work done on updating the hpc-stack locations (Issue-1465) would be much more transparent if the changes from this PR for the Hera compilers went from the NOAA-EPIC repo directly to ufs-community/ufs-weather-model.

@natalie-perlin
Collaborator Author

If PR-1468 needs to be tested before it is merged, for example to work alongside another PR, it can be checked out as follows:

git fetch origin pull/1468/head:PR1468
git checkout PR1468

when the origin is:

$ git remote -v
origin	https://github.com/ufs-community/ufs-weather-model.git (fetch)
origin	https://github.com/ufs-community/ufs-weather-model.git (push)
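
The two steps above can be wrapped so that the fetch only happens when origin really is the upstream repository. This is a minimal sketch under that assumption; the function names `pr_refspec` and `checkout_pr` are hypothetical and not part of ufs-weather-model.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative only.

# pr_refspec N [BRANCH]: build the refspec that fetches the head of
# pull request N into a local branch (default name: PR<N>).
pr_refspec() {
  local pr="$1" branch="${2:-PR$1}"
  printf 'pull/%s/head:%s\n' "$pr" "$branch"
}

# checkout_pr N: fetch pull request N from origin and switch to it,
# refusing to run if origin does not point at the upstream repository.
checkout_pr() {
  local pr="$1"
  local expected="https://github.com/ufs-community/ufs-weather-model.git"
  local url
  url="$(git remote get-url origin)" || return 1
  if [ "$url" != "$expected" ]; then
    echo "origin is $url, expected $expected" >&2
    return 1
  fi
  git fetch origin "$(pr_refspec "$pr")" && git checkout "PR$pr"
}
```

Calling `checkout_pr 1468` inside a clone runs the same `git fetch` and `git checkout` shown above. In a fork, where origin points elsewhere, it stops with a message instead of fetching from the wrong remote.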

@DeniseWorthen
Collaborator

@jkbk2004 @BrianCurtis-NOAA Before combining PRs, we need to know that this PR has been tested and verified. I don't see that this has happened. Only a limited subset of tests appears to have been run, and that subset appears to have passed, yet the box saying "regression test results change" is checked.

@jkbk2004
Collaborator

@DeniseWorthen @BrianCurtis-NOAA I was able to test this pr ok on hera for both intel and gnu. @natalie-perlin can you go ahead to directly create a pr to #1486 branch?

@DeniseWorthen
Collaborator

@jkbk2004 Then the checkmark that says "one or more regression tests change" should not be checked.

@natalie-perlin
Collaborator Author

@DeniseWorthen - unchecked the "regression test results change"

jkbk2004 added a commit that referenced this pull request Nov 16, 2022
…reading for cpld_bmark control and restart (was #1483); Software module updates in hpc-stack for Hera (intel, gnu) (was #1468) (#1486)

* update CMEPS submodule

* bmark cpld tests use esmf-managed threading by default
* remove version w/o esmf-managed threading

* update hera hpc stack locations: intel/gnu

Co-authored-by: Brian Curtis <brian.curtis@noaa.gov>
Co-authored-by: jkbk2004 <jong.kim@noaa.gov>
Co-authored-by: zach1221 <99902696+zach1221@users.noreply.github.com>
@jkbk2004
Collaborator

This pr was merged thru #1486. @DusanJovic-NOAA we will create another pr to add gnu/openmpi feature on hera.

@jkbk2004 jkbk2004 closed this Nov 16, 2022
@jkbk2004 jkbk2004 deleted the develop_hera branch March 18, 2024 15:31
5 participants