Add GNU and Cheyenne Support to Automated RT #444

Merged

Conversation

@BrianCurtis-NOAA BrianCurtis-NOAA commented Feb 26, 2021

Description

This PR addresses the need to automate regression tests for the GNU compiler and adds support for Cheyenne.

Is a change of answers expected from this PR? NO
Are any library updates included in this PR (modulefiles etc.)? NO

Issue(s) addressed

Automated RT needs to run on Hera with GNU and on Cheyenne with both Intel and GNU
Fixes mentioned in Pull #439 (skip-ci)
Stop removing the PR workdir for now, so the directory is available in case of failures

Testing

How were these changes tested?
What compilers / HPCs was it tested with? Intel and GNU on Hera; Dom will be testing on Cheyenne

Combined work from @climbfuji

@climbfuji
Collaborator

One comment I have is that it would be nice to purge the workdir if the tests passed successfully, only keep it in case they failed. Future improvement!
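
The policy suggested here (purge the workdir when the tests pass, keep it when they fail) could be sketched roughly as below. This is only an illustration, not the code in tests/auto; the function name `cleanup_workdir` and its two arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: purge a PR work directory only after a successful run.
set -eu

# cleanup_workdir STATUS WORKDIR
#   STATUS  - exit status of the regression test run (0 means success)
#   WORKDIR - directory created for this PR run
cleanup_workdir() {
  local status="$1" workdir="$2"

  # Refuse to act on an empty or missing path so rm -rf can never run unguarded.
  if [[ -z "${workdir}" || ! -d "${workdir}" ]]; then
    echo "no workdir to clean: '${workdir}'" >&2
    return 1
  fi

  if [[ "${status}" -eq 0 ]]; then
    rm -rf -- "${workdir}"
    echo "purged ${workdir}"
  else
    echo "kept ${workdir} for debugging (exit status ${status})"
  fi
}

# Example call (paths are placeholders):
#   ./rt.sh -e && rc=0 || rc=$?
#   cleanup_workdir "${rc}" "/path/to/pr/workdir"
```

Passing the exit status in explicitly keeps the decision in one place, so the caller can still post logs before anything is deleted.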

I added the grt and rt labels for cheyenne, should kick off in 18 mins or so.

@BrianCurtis-NOAA
Collaborator Author

> One comment I have is that it would be nice to purge the workdir if the tests passed successfully, only keep it in case they failed. Future improvement!
>
> I added the grt and rt labels for cheyenne, should kick off in 18 mins or so.

Make sure you add this new code to cheyenne too.

@climbfuji
Collaborator

climbfuji commented Feb 26, 2021 via email

Review thread on tests/auto/rt_auto.sh (outdated, resolved)
Remove extra comments
Review thread on tests/auto/rt_auto.sh (outdated, resolved)
Review thread on tests/auto/rt_auto.sh (resolved)

@climbfuji
Collaborator

It's not working on cheyenne. I'll need to look into it a bit closer.

@climbfuji
Collaborator

rt.sh failed
machine: cheyenne
compiler: gnu
STDOUT: ['/glade/u/home/heinzell/.bash_profile: line 10: setup_cheyenne_no_modules: No such file or directory', '+ SECONDS=0', '+ hostname', 'chadmin3.ib0.cheyenne.ucar.edu', '+ [[ 3 -eq 0 ]]', '+ trap '{ echo "rt.sh interrupted"; rt_trap ; }' INT', '+ trap '{ echo "rt.sh quit"; rt_trap ; }' QUIT', '+ trap '{ echo "rt.sh terminated"; rt_trap ; }' TERM', '+ trap '{ echo "rt.sh error on line $LINENO"; cleanup ; }' ERR', '+ trap '{ echo "rt.sh finished"; cleanup ; }' EXIT', '+++ dirname ./rt.sh', '++ cd .', '++ pwd -P', '+ readonly PATHRT=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084508/ufs-weather-model/tests', '+ PATHRT=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084508/ufs-weather-model/tests', '+ cd /glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084508/ufs-weather-model/tests', '++ cd /glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084508/ufs-weather-model/tests/..', '++ pwd', '+ readonly PATHTR=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084508/ufs-weather-model', '+ PATHTR=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084508/ufs-weather-model', '+ readonly LOCKDIR=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084508/ufs-weather-model/tests/lock', '+ LOCKDIR=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084508/ufs-weather-model/tests/lock', '+ mkdir /glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084508/ufs-weather-model/tests/lock', '++ hostname', '+ echo chadmin3.ib0.cheyenne.ucar.edu 15260', '+ export RT_COMPILER=gnu', '+ RT_COMPILER=gnu', '+ source detect_machine.sh', '++ export ACCNR=P48503002', '++ ACCNR=P48503002', '++ case $(hostname -f) in', '+++ hostname -f', 'detect_machine.sh: line 99: MACHINE_ID: unbound variable', "+++ echo 'rt.sh finished'", 'rt.sh finished', '+++ cleanup', '+++ rm -rf 
/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084508/ufs-weather-model/tests/lock', '+++ [[ false == true ]]', '+++ trap 0', '+++ exit', '']
STDERR: []
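
One plausible reading of the trace above: the script runs with unset-variable checking enabled, the `case $(hostname -f)` block in detect_machine.sh has no pattern matching `chadmin3.ib0.cheyenne.ucar.edu`, and so `MACHINE_ID` is never assigned before it is read. The sketch below shows a defensive shape for such a block. It is not the actual detect_machine.sh; the function name `detect_machine_id` and the hostname patterns are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of hostname-based machine detection that cannot leave
# the result unset, even when `set -u` makes unset expansions fatal.
set -eu

# detect_machine_id HOSTNAME -> prints a machine id, or "unknown"
detect_machine_id() {
  local host="$1" machine_id
  case "${host}" in
    # Assumed patterns: login nodes plus the admin/cron node seen in the log.
    cheyenne*|chadmin*) machine_id="cheyenne" ;;
    # Default branch: always assign something so later reads never fail.
    *) machine_id="unknown" ;;
  esac
  echo "${machine_id}"
}

# Example use inside a script:
#   MACHINE_ID="$(detect_machine_id "$(hostname -f)")"
#   if [[ "${MACHINE_ID}" == "unknown" ]]; then
#     echo "unsupported host" >&2; exit 1
#   fi
```

The default branch turns a confusing "unbound variable" abort into an explicit, checkable value.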

@climbfuji
Collaborator

Log Name: rt_auto_20210303084504.log
Log Location: /glade/work/heinzell/fv3/ufs-weather-model/auto-rt/control-20210226-new/tests/auto
Logs are kept for one month

@climbfuji
Collaborator

rt.sh failed
machine: cheyenne
compiler: intel
STDOUT: ['/glade/u/home/heinzell/.bash_profile: line 10: setup_cheyenne_no_modules: No such file or directory', '+ SECONDS=0', '+ hostname', 'chadmin3.ib0.cheyenne.ucar.edu', '+ [[ 3 -eq 0 ]]', '+ trap '{ echo "rt.sh interrupted"; rt_trap ; }' INT', '+ trap '{ echo "rt.sh quit"; rt_trap ; }' QUIT', '+ trap '{ echo "rt.sh terminated"; rt_trap ; }' TERM', '+ trap '{ echo "rt.sh error on line $LINENO"; cleanup ; }' ERR', '+ trap '{ echo "rt.sh finished"; cleanup ; }' EXIT', '+++ dirname ./rt.sh', '++ cd .', '++ pwd -P', '+ readonly PATHRT=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084603/ufs-weather-model/tests', '+ PATHRT=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084603/ufs-weather-model/tests', '+ cd /glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084603/ufs-weather-model/tests', '++ cd /glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084603/ufs-weather-model/tests/..', '++ pwd', '+ readonly PATHTR=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084603/ufs-weather-model', '+ PATHTR=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084603/ufs-weather-model', '+ readonly LOCKDIR=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084603/ufs-weather-model/tests/lock', '+ LOCKDIR=/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084603/ufs-weather-model/tests/lock', '+ mkdir /glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084603/ufs-weather-model/tests/lock', '++ hostname', '+ echo chadmin3.ib0.cheyenne.ucar.edu 17341', '+ export RT_COMPILER=gnu', '+ RT_COMPILER=gnu', '+ source detect_machine.sh', '++ export ACCNR=P48503002', '++ ACCNR=P48503002', '++ case $(hostname -f) in', '+++ hostname -f', 'detect_machine.sh: line 99: MACHINE_ID: unbound variable', "+++ echo 'rt.sh finished'", 'rt.sh finished', '+++ cleanup', '+++ rm -rf 
/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/581177510/20210303084603/ufs-weather-model/tests/lock', '+++ [[ false == true ]]', '+++ trap 0', '+++ exit', '']
STDERR: []

@climbfuji
Collaborator

Log Name: rt_auto_20210303084504.log
Log Location: /glade/work/heinzell/fv3/ufs-weather-model/auto-rt/control-20210226-new/tests/auto
Logs are kept for one month

@climbfuji
Collaborator

added the cheyenne gnu rt label for a first test

@climbfuji
Collaborator

auto-rt for cheyenne/gnu kicked off; just added the label for cheyenne+intel

@climbfuji
Collaborator

Log Name: rt_auto_20210303092004.log
Log Location: /glade/work/heinzell/fv3/ufs-weather-model/auto-rt/control-20210226-new/tests/auto
Logs are kept for one month

@climbfuji
Collaborator

cheyenne.gnu passed.

@climbfuji
Collaborator

Log Name: rt_auto_20210303093005.log
Log Location: /glade/work/heinzell/fv3/ufs-weather-model/auto-rt/control-20210226-new/tests/auto
Logs are kept for one month

Collaborator

@climbfuji climbfuji left a comment

Works for me. The scope of the changes (including the addition in tests/detect_machine.sh) seems safe to me without full testing everywhere, especially since the next PR will be tested immediately afterwards.

@DusanJovic-NOAA DusanJovic-NOAA merged commit 0b8a889 into ufs-community:develop Mar 3, 2021
@BrianCurtis-NOAA BrianCurtis-NOAA deleted the feature/rt-auto-gnu branch March 4, 2021 19:18
AnningCheng-NOAA added a commit to AnningCheng-NOAA/ufs-weather-model that referenced this pull request Mar 8, 2021
* upstream/develop:
  update MOM6 to GFDL 20210224 main branch commit (ufs-community#439)
  Add GNU and Cheyenne Support to Automated RT (ufs-community#444)
  Move Noah MP init to CCPP and update Noah MP regression tests, ice flux init bug fix in CCPP (ufs-community#425)
  Feature/rt automation (ufs-community#403)
  Update ccpp-physics. Make RRTMGP thread safe (ufs-community#418)
  Update regression tests from GFSv15+Thompson to GFSv16+Thompson, include "Add one regional regression test in DEBUG mode. (ufs-community#419)" (ufs-community#421)
  UGWP v0 v1 combined (ufs-community#396)
  add optional mesh in MOM6; add dz_min and min_seaice as configurable variables for coupled model (ufs-community#399)
  updates FMS to 2020.04.01 (ufs-community#392)
  Move LSM vegetation lookup tables into CCPP, clean up RUC snow cover on ice initialization (remove IPD step 2)  (ufs-community#407)
  Update CMEPS for HAFS integration; add datm and coupled-model tests on Gaea (ufs-community#401)
  Remove legacy gnumake build from fv3atm and NEMS, remove legacy Python 2.7 support, rename v16beta to v16 and RT updates (ufs-community#384)
  MOM6 bugfixes, GFDL update, update CDMBGWD settings; fix for restart reproducibility (without waves) when USE_LA_LI2016=True, sign error on fprec passed to ocean, GFDL update, resolution dependent cdmbgwd settings (ufs-community#379)
  dycore options to add zero-gradient BC to reconstruct interface u/v and change dz_min as input (ufs-community#369)
  Update develop from NOAA-GSL: RUC ice, MYNN sfclay, stochastic land perturbations (ufs-community#386)
  update cpl gfsv16 tests, rrtmgp fix and bug fixes in cmeps (ufs-community#378)
  point fv3 to EMC develop branch (ufs-community#377)
  Remove IPD steps 3 and 5 (ufs-community#357)
  Update CMEPS  (ufs-community#345)
  Implementation of CCPP timestep_init and timestep_final phases (ufs-community#337)
  Remove unnecessary SIMD instruction sets for Jet, first round of cleanup in rt.conf, initialize cld_amt to zero for regional runs (dycore) (ufs-community#353)
  add frac grid input, update and add additional cpld tests (ufs-community#354)
  Add checkpoint restarts for ufs-cpld (ufs-community#342)
  Update the format of rt.conf (ufs-community#349)
  Remove IPD (step 1) (ufs-community#331)
  Feature/ww3update (ufs-community#334)
  Replace old regional SDF with FV3_GFS_v15_thompson_mynn (ufs-community#333)
  Update modules with hpc-stack v1.1.0 (ufs-community#319)
  Regression test log for PR ufs-community#323 for jet.intel (ufs-community#336)
  RRTMGP and Thompson MP coupling (ufs-community#323)
  Add 2 new tests for DATM-MOM6-CICE6 application (ufs-community#332)
  Add optional bulk flux calculation in ufs-datm (ufs-community#266)
  Final-final GFS v16 updates / restart reproducibility bugfixes (ufs-community#325)
  Updates to build for JEDI linking/control, add wcoss2 (ufs-community#295)
  Update CICE, Move regression test input outside baseline directory (ufs-community#270)
  Feature/update mom6 and retain b4b results for 025x025 resolution (ufs-community#290)
  Update for Jet, bug fixes in running with frac_grid=T and GFDL MP, and in restarting with frac_grid=T  (ufs-community#304)
  Updates to stochastic_physics_wrapper (ufs-community#280)
  Update develop from gsd/develop 2020/11/20: Unified gravity wave drag, updates to other GSL physics (ufs-community#297)
  Fix to allow quilting with non-factors for layout (ufs-community#250)
  rt update (ufs-community#261)
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
* Update build_cheyenne_gnu.lua

remove loading of system python3

* Update build_cheyenne_intel.lua

remove loading system python module

* Update wflow_cheyenne.lua

Load updated miniconda3 and ask to activate regional_workflow environment

* Update wflow_hera.lua

Load an updated miniconda3 and ask to activate regional_workflow environment

* Update wflow_jet.lua

Update miniconda3 module location and ask to activate regional_workflow

* Update wflow_orion.lua

Update miniconda3/4.12.0 module location and ask to activate regional_workflow environment

* Update load_modules_run_task.sh

Run an additional cycle of "conda deactivate" and "conda activate regional_workflow". This ensures that the _python3_ binary path from the *regional_workflow* environment is prepended to $PATH and is found first, before the _python3_ from the miniconda3/4.12.0 *base* environment.

* Update wflow_cheyenne.lua

"conda activate regional_workflow"

* Update and rename conda_regional_workflow.lua to miniconda_regional_workflow.lua

use new miniconda3/4.12.0 with regional_workflow environment

* Update make_grid.local.lua

* Update get_extrn_ics.local.lua

* Update get_extrn_lbcs.local.lua

* Update make_ics.local.lua

* Update make_lbcs.local.lua

* Update and rename make_orog.hardcoded.lua to make_orog.local.lua

* Update run_fcst.local.lua

* Update run_vx.local.lua

* Create make_sfc_climo.local.lua

* Update miniconda_regional_workflow.lua

* Update get_obs.local.lua

all the requested packages for the python3 are found in regional_workflow environment

* Update miniconda_regional_workflow.lua

load updated miniconda3/4.12.0 with regional_workflow environment

* Update miniconda_regional_workflow.lua

Load an updated miniconda3/4.12.0 with the regional_workflow environment

* Enable SCHED_NATIVE_CMD on all systems.

* Update build_cheyenne_gnu.lua

need to have miniconda3 loaded in build module

* Update build_cheyenne_intel.lua

need to have miniconda3 loaded in the build module

* Update build_gaea_intel.lua

need to have miniconda3 loaded in build module

* Update build_hera_intel.lua

need to have miniconda3 loaded in build module

* Update build_jet_intel.lua

need to have miniconda3 loaded in the build modulefile

* Update build_orion_intel.lua

need to have miniconda3 loaded in build modulefile

* Update load_modules_wflow.sh

conda activate command same across the platforms

* Don't export variables for those that use SLURM.

* Add some missing task specific modulefiles.

* Update miniconda_regional_workflow.lua

miniconda3 is now loaded in build_<system>_<compiler>, not in *.local files

* Update miniconda_regional_workflow.lua

miniconda3 is now loaded in build_<system>_<compiler>, not in *.local files

* Update miniconda_regional_workflow.lua

miniconda3 is now loaded in build_<system>_<compiler>, not in *.local files

* Update miniconda_regional_workflow.lua

miniconda3 is now loaded in build_<system>_<compiler>, not in *.local files

* Update miniconda_regional_workflow.lua

* fixes for noaacloud that work with Daniel's PR

* removed extra lines

* removed commented lines

* Removed set -x and some commented lines

* put module list back in

* removed ldd

* removed miniconda from build*.lua files

* returned conda load to local files

* returned python to cheyenne

* unload python module before setting up miniconda

* added unload python to miniconda_regional_workflow.lua file

* added local files for orion

Co-authored-by: Natalie Perlin <68030316+natalie-perlin@users.noreply.github.com>
Co-authored-by: Daniel Abdi <daniel.abdi@noaa.gov>
Co-authored-by: Mark Potts <mpotts@redlineperf.com>