Remove unnecessary SIMD instruction sets for Jet, first round of cleanup in rt.conf, initialize cld_amt to zero for regional runs (dycore) #353
…r-model into simd_update_and_rt_cleanup_20201231
Can we have a detailed plan for what the future rt.conf will look like? I thought we were going to reduce the total number of rt.conf files, but now we have rt_gnu.conf, rt_acorn.conf, and rt_ccpp_dev.conf. Also, why do we remove the "-f" option, which may allow us to integrate rt_35d.conf into rt.conf? Also, we worked with other research collaborators to port the code to stampede; before we have a formal plan for supporting a wide range of developers, including people using stampede, please do not remove the file. I think we need to write a document for the rt.conf development.
That's why I started reducing it by removing EDIT For the same reason, we should delete
Because @DusanJovic-NOAA asked me to do so, and because it is not needed?
See above.
Yes.
Let's discuss this at ufs infrastructure development; we do have tier2 or
tier3 platforms. We may not have enough resources to test on certain
platforms, but that does not mean we are "leaving this file there when it is
not tested"; I do think the corresponding parties can test them. In general,
we need to provide a way for other collaborators to run ufs. Again, I just
don't know where the code changes are leading us; we need a development plan
for it.
…On Mon, Jan 4, 2021 at 9:32 AM Dom Heinzeller ***@***.***> wrote:
Can we have a detailed plan what the future rt.conf will look like? I
thought we are going to reduce the total number of rt.conf files, but now
we have rt_gnu.conf, rt_acorn.conf, rt_ccpp_dev.conf.
That's why I started reducing it by removing rt_stampede.conf, which a
few lines below you ask me to put back. Leaving this file there when it
is not tested doesn't make any sense. If needed, we can simply pull out the
two most standard tests of rt.conf when we identify a team that maintains
and runs rt.sh on stampede, and if the full rt.conf doesn't run for some
reason.
Also why do we remove the "-f" option which may allow us integrate
rt_35d.conf into rt.conf.
Because @DusanJovic-NOAA asked me to do so, and because it is not needed?
-l xyz.conf exists, and without -l it goes to rt.conf automatically;
what does -f do on top of that?
Also we worked with other research collaborators to port the code to
stampede, before we have a formal plan on supporting a wide range
developers including people using stampede, please do not remove the file.
See above.
I think we need to write a document for the rt.conf development.
Yes.
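For readers following the -l discussion, here is a minimal sketch of the behaviour described above: a `-l` option picks the config file, and without it the script falls back to rt.conf. The function name `select_conf` is hypothetical and this is not the actual rt.sh code, only an illustration of the pattern.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not taken from rt.sh): choose the regression test
# config file. -l <file> selects one explicitly; the default is rt.conf.
select_conf() {
  local conf="rt.conf"        # used when -l is not given
  local opt OPTIND=1          # reset getopts state for each call
  while getopts ":l:" opt; do
    case "$opt" in
      l)  conf="$OPTARG" ;;
      \?) echo "unknown option: -$OPTARG" >&2; return 1 ;;
      :)  echo "option -$OPTARG requires an argument" >&2; return 1 ;;
    esac
  done
  echo "$conf"
}

select_conf                   # no -l: falls back to rt.conf
select_conf -l rt_gnu.conf    # explicit config file
```

With this shape a separate flag for choosing a config file is redundant, which is the point made in the reply above.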
Based on today's discussion it seems to be appropriate to remove We will retain We did not discuss the
@junwang-noaa @DusanJovic-NOAA please take a look at the modified
…r-model into simd_update_and_rt_cleanup_20201231
This PR is ready for review. The new regression test baseline date tag is 20210106.
…sts on all platforms; skip-ci
0ecfc31 to 00d34e1
Sorry, maybe I misunderstood; I thought we would have a different machine
for wcoss2, not acorn, which is a small test machine. If acorn is the wcoss2
machine, we need to add it to rt.conf like dell and cray.
…On Wed, Jan 6, 2021 at 11:17 AM Dusan Jovic ***@***.***> wrote:
***@***.**** commented on this pull request.
In tests/rt_acorn.conf:
> -RUN | fv3_ccpp_gfdlmprad_32bit_post | standard | | fv3 |
-RUN | fv3_ccpp_cpt | standard | | fv3 |
-RUN | fv3_ccpp_gsd | standard | | fv3 |
-RUN | fv3_ccpp_thompson | standard | | fv3 |
-RUN | fv3_ccpp_thompson_no_aero | standard | | fv3 |
-RUN | fv3_ccpp_rrfs_v1beta | standard | | fv3 |
-
-################################################################################################################################################################################
-# CPLD tests #
-################################################################################################################################################################################
-
-COMPILE | SUITES=FV3_GFS_2017_coupled,FV3_GFS_2017_satmedmf_coupled,FV3_GFS_v15p2_coupled S2S=Y | standard | | fv3 |
-RUN | cpld_control | standard | | fv3 |
-RUN | cpld_2threads | standard | | |
-RUN | cpld_decomp | standard | | |
+##################################################################################################################################################################
We do not run any tests on acorn before each commit.
Acorn (or whatever they call the real WCOSS2) is not the same as stampede.
It will (soon?) become a Tier-1 platform. Moreover, it will be NCEP's
production machine.
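The rt.conf lines quoted in the diff above are pipe-delimited records. As a rough sketch of how such a line can be split into fields in bash (the helper names `trim` and `parse_rt_line` are hypothetical, and no meaning is assigned to the individual columns here):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: split one pipe-delimited rt.conf-style line into
# whitespace-trimmed fields and print each field in brackets.
trim() {
  local s="$1"
  s="${s#"${s%%[![:space:]]*}"}"   # strip leading whitespace
  s="${s%"${s##*[![:space:]]}"}"   # strip trailing whitespace
  printf '%s' "$s"
}

parse_rt_line() {
  local line="$1" raw
  local -a fields
  IFS='|' read -r -a fields <<< "$line"   # split on the pipe character
  for raw in "${fields[@]}"; do
    printf '[%s]\n' "$(trim "$raw")"      # empty columns print as []
  done
}

parse_rt_line 'RUN | fv3_ccpp_cpt | standard | | fv3 |'
```

Empty columns are preserved as empty fields, which matters because an empty column in these files carries meaning rather than being padding.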
Let's add wcoss2 to rt.conf when it is accepted as NCEP's production
machine.
@climbfuji would you please give a short summary of what additional tests have been added on orion, jet, cheyenne, and wcoss?
It's best to look at https://docs.google.com/spreadsheets/d/1tf7ufYW2umLXQQ2G43h64ESGw5jYb67MAqEbvGPNAm4/edit?ts=5feca219#gid=1397536520 and the updated
That's it.
@DusanJovic-NOAA I think you can start creating baselines on wcoss and orion (if possible). I am cruising along on jet, gaea, cheyenne, hera.
I will update the spreadsheet that Dusan put together after the PR is merged.
@junwang-noaa @DusanJovic-NOAA This PR is ready to merge. The CI tests just kicked off, they'll be done by tomorrow morning easily. Thanks @DusanJovic-NOAA for your help with running the regression tests today.
module load hdf5/1.10.6
module load netcdf/4.7.4
module load pio/2.5.1
module load esmf/8_1_0_beta_snapshot_27
Is there no esmf debug module on jet?
For all the platforms where we haven't compiled the debug module yet we simply had to copy the existing module over so that the code would still run. This is because somebody removed the logic (or it is not working as intended) that says "only if a debug module exists, use it; otherwise use the standard module".
Yes, when the next HPC stack release is rolled out we should create debug modules for all platforms.
FYI, it's the same for wcoss_cray, for example.
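The fallback described in this review thread ("only if a debug module exists, use it; otherwise use the standard module") could look roughly like the sketch below. The function `select_modulefile` is hypothetical and is not the logic that was removed from the build scripts; the file names fv3 and fv3_debug follow the modulefiles/<platform>/fv3 and fv3_debug names that appear in the commit messages for this work.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: prefer the debug modulefile when a debug build is
# requested AND the file exists; otherwise fall back to the standard one.
select_modulefile() {
  local moddir="$1"   # e.g. modulefiles/jet.intel
  local debug="$2"    # "Y" for a debug build (assumed convention)
  if [[ "$debug" == "Y" && -f "$moddir/fv3_debug" ]]; then
    echo "$moddir/fv3_debug"
  else
    echo "$moddir/fv3"
  fi
}

# Small demonstration in a scratch directory.
demo_dir="$(mktemp -d)"
touch "$demo_dir/fv3"
select_modulefile "$demo_dir" Y   # only fv3 exists: standard file is chosen
touch "$demo_dir/fv3_debug"
select_modulefile "$demo_dir" Y   # debug file exists now and is chosen
rm -rf "$demo_dir"
```

With a fallback like this in place, platforms without a compiled debug module would not need a copied fv3_debug file.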
CI tests passed; @DeniseWorthen once you approve we are ready to merge. Thanks everyone for your review.
* Updates to stochastic_physics_wrapper (ufs-community#280)
Fix to stochastic_physics_wrapper to allow for random patterns to update at a longer time-step than model
Co-authored-by: Dom Heinzeller <climbfuji@ymail.com>

* Update for Jet, bug fixes in running with frac_grid=T and GFDL MP, and in restarting with frac_grid=T (ufs-community#304)
Update the modulefile for jet.intel to enable UPP v10.0.0. The hpc-stack v1.0.0 pre-release is used for this.
Small changes are made to tests.rt.sh for jet.intel and gaea.intel (consistency with other platforms).
The submodule pointer update for fv3atm addresses bugs in the ufs-weather-model with frac_grid=T and GFDL microphysics, and with restarting the model when frac_grid=T (from @shansun6 and @SMoorthi-emc).

* Feature/update mom6 and retain b4b results for 025x025 resolution (ufs-community#290)
* point MOM6 to new branch which corresponds to GFDL 20201022 commit
* modify fms_files.cmake and mom6_files.cmake to reflect changes in MOM6 code as this version of MOM6 contains some file deletion, new files being added and renaming of files
* manually set MOM6 parameters in order to retain original results for 0.25x0.25 resolution
* update MOM6 to include Bugfix for mom6solo to be built
* modify compile.sh to allow mom6solo compiling
* modify MOM_input_template for all resolutions based on GFDL MOM6-example main branch update on 20201022
* change executable permissions for CMakeLists.txt
* chmod 644 to 6 files Dom pointed out
* chmod for CMakeLists.txt and tests/compile.sh
* change baseline directory to 20201202 in rt.sh

* Update CICE, Move regression test input outside baseline directory (ufs-community#270)
* Updates CICE to most recent develop branch of NOAA-EMC
* Sets n_aero (number of aerosols) in ice_in_template to 0.
* removes trailing whitespace from ice_in
* moves regression test input outside baseline directory (ufs-weather PR ufs-community#312)
Co-authored-by: Dusan Jovic <48258889+DusanJovic-NOAA@users.noreply.github.com>
Co-authored-by: Dom Heinzeller <dom.heinzeller@icloud.com>

* Updates to build for JEDI linking/control, add wcoss2 (ufs-community#295)
* Build on wcoss2 (acorn)
* Use -march=core-avx2 instead of -xCORE-AVX2 on wcoss2
* Updates to build for JEDI linking/control
* Removed unnecessary include files and INLINE POST setting
* Updated to address PR suggestions.
* Add rt_acorn.conf. Change /lfs/h2 to /lfs/h1.
* Update .gitmodules and submodule pointer for fv3atm for code review and testing
* regression test results
* Updated .gitmodules and removed extraneous file
* Fixed .gitmodules and updated pointer for FV3
* Updated pointer to NEMS repo
Co-authored-by: Dusan Jovic <dusan.jovic@noaa.gov>
Co-authored-by: Dom Heinzeller <climbfuji@ymail.com>

* Final-final GFS v16 updates / restart reproducibility bugfixes (ufs-community#325)
* Update .gitmodules and submodule pointer for fv3atm for code review and testing
* Add GFS v16 beta restart test, update stochastics test
* Update regression test baseline date tag to 20201214; skip-ci
* tests/rt.conf: bugfix, add missing 'fv3' to new stochy tests; skip-ci
* Regression test logs for gaea.intel, hera.gnu, hera.intel, jet.intel, orion.intel; skip-ci
* Run GFS v16beta tests also on wcoss; regression test logs for wcoss; skip-ci
* Regression test logs for cheyenne.intel and cheyenne.gnu
* Revert change to .gitmodules and update submodule pointer for fv3atm

* Add optional bulk flux calculation in ufs-datm (ufs-community#266)
* Update NEMS DATM and CMEPS to allow the optional bulk flux formulation; add two tests using the option
* Update top level CMakeList.txt to have compile flags for MOM6 and CICE6 identical for ufs-cpld and ufs-datm
* Add optional configuration variable to nems.configure to specify the directory where CMEPS will write restarts
* Adds cheyenne tasking variables to default_vars and sets WW3_COMP to cheyenne for platform cheyenne.intel
* NOTE: Baselines develop-20201215 exist on all platforms, regression tests were run against exactly that baseline on all systems except cheyenne.intel. On cheyenne.intel the tests were run against 20201214, and this baseline is identical to 20201215 (as per "diff -r develop-20201214 develop-20201215").
Co-authors: @DusanJovic-NOAA @aerorahul @JessicaMeixner-NOAA
skip-ci

* Add 2 new tests for DATM-MOM6-CICE6 application (ufs-community#332)
* Add the following 2 tests: datm_restart_cfsr, datm_debug_cfsr
* Add wcoss_dell_p3.log.
* Add Hera log, Orion log, wcoss_dell_p3 log.

* RRTMGP and Thompson MP coupling (ufs-community#323)
* Feature branch with RRTMGP and Thompson MP
* Updated FV3/ccpp-physics. Added namelist and configuration for RRTMGP RTs using GSD physics.
* Updated FV3
* Update physics in FV3
* Updated baselines in rt.sh
* Updated RT logs. Updated FV3 physics submodule pointer.
* Updated FV3 hash and .gitmodules

* Regression test log for PR ufs-community#323 for jet.intel (ufs-community#336)

* Update modules with hpc-stack v1.1.0 (ufs-community#319)
* Update modules with hpc-stack v1.1.0
* Minor bug fixes to CCPP UGWP
Co-authored-by: Dom Heinzeller <climbfuji@ymail.com>

* Replace old regional SDF with FV3_GFS_v15_thompson_mynn (ufs-community#333)
* Replace old FV3_GFS_2017_gfdlmp_regional SDF for regional tests with FV3_GFS_v15_thompson_mynn.
* Final path to IC's and new results. Also, input.nml updated.
* Update RegressionTests_wcoss_dell_p3.log
* Update RegressionTests_wcoss_cray.log
* Update RegressionTests_hera.intel.log
* Update RegressionTests_jet.intel.log
* Update RegressionTests_orion.intel.log
* Update RegressionTests_cheyenne* logs.
* Update RegressionTests_hera.gnu.log

* Feature/ww3update (ufs-community#334)
This updates the WW3 submodule pointer to point to the top of the WW3 develop branch.
The path to WW3 inputs is changed to input-data-20201201/WW3_input_data_20201207/

* Remove IPD (step 1) (ufs-community#331)
Make CCPP=Y the default in tests/compile.sh.
Remove CCPP=Y from tests/rt*.conf and adjust formatting.
Update submodule pointer for MOM6 to include PR ufs-community#341 ("Update MOM6 to GFDL's 20201218 commit")
Add modulefiles/wcoss_cray/fv3_debug (identical to modulefiles/wcoss_cray/fv3)
Fix broken utest (see ufs-community#348)

* Update the format of rt.conf (ufs-community#349)
Update the format of MACHINES column in rt.conf (and other .conf files). This column can be either empty, which means a test will run on all supported machines, or start with - or + sign to exclude or include specified machines explicitly.

* Add checkpoint restarts for ufs-cpld (ufs-community#342)
* Adds 3 checkpoint restart tests for the ufs-cpld model
* Drops the existing c92mx025 restart test
* Adds cheyenne.intel as tested configuration for ufs-cpld and ufs-datm
* Fixes instances of srf_data* in various fv3_conf files

* add frac grid input, update and add additional cpld tests (ufs-community#354)
* Updates FV3_input_frac to add both benchmark dates and L127 files
* Adds additional tests and restart tests for coupled model
* Sets all cpld tests to use frac grid input by default
* Removes all instances of USE_LA_LI2016=True except for benchmark+wave configurations

* Remove unnecessary SIMD instruction sets for Jet, first round of cleanup in rt.conf, initialize cld_amt to zero for regional runs (dycore) (ufs-community#353)
* Reduce SIMDMULTIARCH sets from four to two in cmake/Intel.cmake
* First cleanup of regression test config tests/rt.conf
* tests/rt.sh: reduce number of build jobs on jet.intel from 10 to 5
* Remove flags -f and -s from rt.sh, remove SET logic, remove corresponding column in all rt*conf files
* Remove tests/rt_acorn.conf and run GFS v15p2 and GFS v16beta DEBUG tests on all platforms

* Implementation of CCPP timestep_init and timestep_final phases (ufs-community#337)
* Update .gitmodules and submodule pointer for fv3atm for code review and testing
* Update submodule pointer for fv3atm; skip-ci
* Don't try to compile all suites in DEBUG mode on cheyenne.intel, weird bug on compute nodes; skip-ci
* Don't try to compile all suites in DEBUG mode on wcoss_cray; skip-ci
* Regression test logs for cheyenne.gnu, cheyenne.intel, gaea.intel, hera.gnu, hera.intel, jet.intel, orion.intel; skip-ci
* Don't try to compile all suites in DEBUG mode on wcoss_dell_p3; skip-ci
* Regression test logs for wcoss_cray and wcoss_dell_p3
* Revert change to .gitmodules and update submodule pointer for fv3atm

* Update CMEPS (ufs-community#345)
* Update CMEPS for recent changes, including addition of new run "post" run phases to eliminate redundant mapping, multiple ice sheet capability and ocn->land ice dynamic mapping
* Add a new test fv3_gfs_v16_RRTMGP_c192L127
Co-authored-by: Jun Wang <junwang-noaa@users.noreply.github.com>

* Remove IPD steps 3 and 5 (ufs-community#357)
Reduce SIMDMULTIARCH sets from four to two in cmake/Intel.cmake
* First cleanup of regression test config tests/rt.conf
* tests/rt.sh: reduce number of build jobs on jet.intel from 10 to 5; skip-ci
* Remove flags -f and -s from rt.sh, remove SET logic, remove corresponding column in all rt*conf files
* Update usage in rt.sh, add modulefiles/jet.intel/fv3_debug; skip-ci
* CCPP is default in cmake build
* Add debug modulefiles for linux.gnu and macosx.gnu
* Update submodule pointer for fv3atm
* Change logic in CMakeLists.txt and tests/compile.sh so that 32BIT=ON automatically sets DYN32=ON; skip-ci
* Move logic to set DYN32 - depending on 32BIT setting - to fv3atm
* Remove -DCCPP=ON from tests/compile.sh; update submodule pointer for fv3atm; skip-ci

* point fv3 to EMC develop branch (ufs-community#377)

* update cpl gfsv16 tests, rrtmgp fix and bug fixes in cmeps (ufs-community#378)
* update CMEPS, fix character length error for gnu compile
* add Dusan's fix for rt_utils.sh
* update cpl gfsv16 tests, replace seaice_newland.grb with global_slmask.t1534.3072.1536.grb, recover input.mom6.nml.IN, update input directory, update global thread and decomp tests, update fdiag for global control
* point to Dustins rrtmgp fix branch
* update input directory
Co-authored-by: denise.worthen <Denise.Worthen@noaa.gov>
Co-authored-by: Jun Wang <junwang-noaa@users.noreply.github.com>

* Update develop from NOAA-GSL: RUC ice, MYNN sfclay, stochastic land perturbations (ufs-community#386)
* Update .gitmodules and submodule pointer for fv3atm for gsl/develop branch
* RUC ice for gsl/develop (replaces #47) (#49)
Implementation of RUC LSM ice model in CCPP
* Squash-merge climbfuji:rucice_gfsv16dzmin into gsl/develop
* Add kice=9 to tests/tests/fv3_ccpp_rap and tests/tests/fv3_ccpp_hrrr
* Change NEW_BASELINE directory for gsl/develop to avoid conflicts with development work on the authoritative branches
* Add KICE=9 to tests/tests/fv3_ccpp_gsd_unified_ugwp and tests/tests/fv3_ccpp_gsd_drag_suite_unified_ugwp
* Revert change to .gitmodules and update submodule pointer for fv3atm
* Update gsl/develop from develop 2020/12/08 (#50)
* Updates to stochastic_physics_wrapper (ufs-community#280)
Fix to stochastic_physics_wrapper to allow for random patterns to update at a longer time-step than model
* Update for Jet, bug fixes in running with frac_grid=T and GFDL MP, and in restarting with frac_grid=T (ufs-community#304)
Update the modulefile for jet.intel to enable UPP v10.0.0. The hpc-stack v1.0.0 pre-release is used for this.
Small changes are made to tests.rt.sh for jet.intel and gaea.intel (consistency with other platforms).
The submodule pointer update for fv3atm addresses bugs in the ufs-weather-model with frac_grid=T and GFDL microphysics, and with restarting the model when frac_grid=T (from @shansun6 and @SMoorthi-emc).
* Land stochastic perturbations (#57)

* dycore options to add zero-gradient BC to reconstruct interface u/v and change dz_min as input (ufs-community#369)
* Update fv3atm
* update ccpp control test forecast length to 24h
* remove rename command
* Add CI related changes
* Update RT logs
* Update RT log files
* Add the gaea RT log file
* Update the point of fv3atm
* Update fv3atm
Co-authored-by: Jun Wang <junwang-noaa@users.noreply.github.com>
Co-authored-by: MinsukJi-NOAA <minsuk.ji@noaa.gov>
Co-authored-by: Jun Wang <37633869+junwang-noaa@users.noreply.github.com>

* MOM6 bugfixes, GFDL update, update CDMBGWD settings; fix for restart reproducibility (without waves) when USE_LA_LI2016=True, sign error on fprec passed to ocean, GFDL update, resolution dependent cdmbgwd settings (ufs-community#379)
* implements two MOM6 bugfixes in the NUOPC MOM6 cap to allow restart reproducibility when USE_LA_LI2016=True and to change the sign of the latent heat flux associated with frozen precipitation (fprec) exported to MOM6
* updates MOM6 to include the GFDL 20210120 main branch which contains EMC's wave coupling code, along with some minor code standardization and documentation
* updates the cdmbgwd namelist settings for FV3 standalone tests at C96 and implements resolution dependent values for ufs-cpld tests
Co-authored-by: Ali <ali.abdolali@noaa.gov>

* Remove legacy gnumake build from fv3atm and NEMS, remove legacy Python 2.7 support, rename v16beta to v16 and RT updates (ufs-community#384)
* Update .gitmodules and submodule pointers for fv3atm and NEMS
* Remove Python 2.7 support from top-level CMakeLists.txt
* Reduce forecast length of test fv3_ccpp_gfs_v16_RRTMGP_c192L127 from 24h to 12h
* Rename v16beta to v16 everywhere except the public release documentation
* Bugfixes and missing changes
* Remove 'export CCPP_LIB_DIR=ccpp/lib' from all regression tests
* Update regression test baseline date tag to 20210128; skip-ci
* Update ecflow-python environment on cheyenne and jet; skip-ci

* Update CMEPS for HAFS integration; add datm and coupled-model tests on Gaea (ufs-community#401)
* Add HAFS support in NOAA-EMC/CMEPS
* Add coupled and datm tests for Gaea.intel
Co-authored-by: Jun Wang <junwang-noaa@users.noreply.github.com>
Co-authored-by: Bin Li <Bin.Li@gaea13.ncrc.gov>

* Move LSM vegetation lookup tables into CCPP, clean up RUC snow cover on ice initialization (remove IPD step 2) (ufs-community#407)
* Regression test logs for all tier=1 platforms

* updates FMS to 2020.04.01 (ufs-community#392)
* updates FMS to 2020.04.01
* fixes fms_files.cmake
* removes extra horiz_interp
* Workaround for FMS 2020.04.01 for Cheyenne with GNU 9.1.0, incl. regression test log
Co-authored-by: Mikyung Lee <mlee@Orion-login-1.HPC.MsState.Edu>
Co-authored-by: Dom Heinzeller <climbfuji@ymail.com>

* add optional mesh in MOM6; add dz_min and min_seaice as configurable variables for coupled model (ufs-community#399)
* Implements an optional setting in the cpld and datm nems.configure files to specify whether the MOM6 cap should use a mesh or a grid
* Adds configurable settings for min_seaice to gfs_physics_nml and dz_min to fv_core_nml.

* UGWP v0 v1 combined (ufs-community#396)
- combines the changes in PRs ufs-community#360 and ufs-community#382
- adds three regression tests `fv3_ccpp_gfsv16_ugwpv1`, `fv3_ccpp_gfsv16_ugwpv1_warmstart` and `fv3_ccpp_gfsv16_ugwpv1_debug`
- contains updates and bugfixes for `nc_compare.py` and the CI tests from @MinsukJi-NOAA
- update Python3 environment on jet.intel, gaea.intel, cheyenne.{intel,gnu}
- turn off (again) test `fv3_ccpp_decomp` on jet.intel, this test didn't work in the past, but recently it "passed", because the error checking with `nc_compare.py` failed silently and we didn't notice it
Co-authored-by: valery.yudin <valery.yudin@noaa.gov>
Co-authored-by: Michael Toy <michael.toy@noaa.gov>
Co-authored-by: MinsukJi-NOAA <minsuk.ji@noaa.gov>

* Update regression tests from GFSv15+Thompson to GFSv16+Thompson, include "Add one regional regression test in DEBUG mode. (ufs-community#419)" (ufs-community#421)
* Add one regional regression test in DEBUG mode.
* Update .gitmodules and submodule pointer for fv3atm for code review and testing
* Update regression tests from GFSv15+Thompson to GFSv16+Thompson
* Combine several COMPILE lines in tests/rt.conf and tests/rt_gnu.conf
* Regression test log for cheyenne.{gnu,intel}, gaea.intel, hera.gnu, jet.intel, hera.intel, orion.intel; wcoss_cray and wcoss_dell_p3;

Co-authored-by: Phil Pegion <38869668+pjpegion@users.noreply.github.com>
Co-authored-by: jiandewang <jiande.wang@noaa.gov>
Co-authored-by: Denise Worthen <denise.worthen@noaa.gov>
Co-authored-by: Dusan Jovic <48258889+DusanJovic-NOAA@users.noreply.github.com>
Co-authored-by: Mark Potts <33099090+mark-a-potts@users.noreply.github.com>
Co-authored-by: BinLi-NOAA <bin.li@noaa.gov>
Co-authored-by: dustinswales <dustin.swales@noaa.gov>
Co-authored-by: Kyle Gerheiser <3209794+kgerheiser@users.noreply.github.com>
Co-authored-by: RatkoVasic-NOAA <37597874+RatkoVasic-NOAA@users.noreply.github.com>
Co-authored-by: Ali.Abdolali <37336972+aliabdolali@users.noreply.github.com>
Co-authored-by: Jun Wang <junwang-noaa@users.noreply.github.com>
Co-authored-by: Jun Wang <37633869+junwang-noaa@users.noreply.github.com>
Co-authored-by: XiaqiongZhou-NOAA <48254930+XiaqiongZhou-NOAA@users.noreply.github.com>
Co-authored-by: Ali <ali.abdolali@noaa.gov>
Co-authored-by: Bin Li <Bin.Li@gaea13.ncrc.gov>
Co-authored-by: MiKyung Lee <58964324+mlee03@users.noreply.github.com>
Co-authored-by: valery.yudin <valery.yudin@noaa.gov>
Co-authored-by: Michael Toy <michael.toy@noaa.gov>
Co-authored-by: MinsukJi-NOAA <minsuk.ji@noaa.gov>
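The "Update the format of rt.conf (ufs-community#349)" entry in the commit message above describes the MACHINES column: empty means all machines, a leading + lists the machines to include, and a leading - lists the machines to exclude. A minimal sketch of that rule follows; the function name `runs_on` and the whitespace-separated machine list are assumptions made for illustration, not the rt.sh implementation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: decide whether a test applies to a machine, given the
# value of the MACHINES column. Returns 0 (run), 1 (skip), 2 (bad format).
#   empty          -> run on every machine
#   "+ m1 m2 ..."  -> run only on the listed machines
#   "- m1 m2 ..."  -> run everywhere except the listed machines
runs_on() {
  local machines="$1" me="$2" m listed=no
  local -a names
  machines="${machines#"${machines%%[![:space:]]*}"}"  # drop leading whitespace
  [[ -z "$machines" ]] && return 0                      # empty: no restriction
  local sign="${machines:0:1}"
  read -r -a names <<< "${machines:1}"                  # split names on whitespace
  for m in "${names[@]}"; do
    [[ "$m" == "$me" ]] && listed=yes
  done
  case "$sign" in
    +) [[ "$listed" == yes ]] ;;
    -) [[ "$listed" == no ]] ;;
    *) echo "MACHINES must be empty or start with + or -" >&2; return 2 ;;
  esac
}

if runs_on "- jet.intel" "hera.intel"; then
  echo "test runs on hera.intel"
fi
```

Keeping the rule in a single predicate makes it easy to fold per-platform files into one rt.conf, which is the direction discussed in this PR.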
* upstream/develop:
  update MOM6 to GFDL 20210224 main branch commit (ufs-community#439)
  Add GNU and Cheyenne Support to Automated RT (ufs-community#444)
  Move Noah MP init to CCPP and update Noah MP regression tests, ice flux init bug fix in CCPP (ufs-community#425)
  Feature/rt automation (ufs-community#403)
  Update ccpp-physics. Make RRTMGP thread safe (ufs-community#418)
  Update regression tests from GFSv15+Thompson to GFSv16+Thompson, include "Add one regional regression test in DEBUG mode. (ufs-community#419)" (ufs-community#421)
  UGWP v0 v1 combined (ufs-community#396)
  add optional mesh in MOM6; add dz_min and min_seaice as configurable variables for coupled model (ufs-community#399)
  updates FMS to 2020.04.01 (ufs-community#392)
  Move LSM vegetation lookup tables into CCPP, clean up RUC snow cover on ice initialization (remove IPD step 2) (ufs-community#407)
  Update CMEPS for HAFS integration; add datm and coupled-model tests on Gaea (ufs-community#401)
  Remove legacy gnumake build from fv3atm and NEMS, remove legacy Python 2.7 support, rename v16beta to v16 and RT updates (ufs-community#384)
  MOM6 bugfixes, GFDL update, update CDMBGWD settings; fix for restart reproducibility (without waves) when USE_LA_LI2016=True, sign error on fprec passed to ocean, GFDL update, resolution dependent cdmbgwd settings (ufs-community#379)
  dycore options to add zero-gradient BC to reconstruct interface u/v and change dz_min as input (ufs-community#369)
  Update develop from NOAA-GSL: RUC ice, MYNN sfclay, stochastic land perturbations (ufs-community#386)
  update cpl gfsv16 tests, rrtmgp fix and bug fixes in cmeps (ufs-community#378)
  point fv3 to EMC develop branch (ufs-community#377)
  Remove IPD steps 3 and 5 (ufs-community#357)
  Update CMEPS (ufs-community#345)
  Implementation of CCPP timestep_init and timestep_final phases (ufs-community#337)
  Remove unnecessary SIMD instruction sets for Jet, first round of cleanup in rt.conf, initialize cld_amt to zero for regional runs (dycore) (ufs-community#353)
  add frac grid input, update and add additional cpld tests (ufs-community#354)
  Add checkpoint restarts for ufs-cpld (ufs-community#342)
  Update the format of rt.conf (ufs-community#349)
  Remove IPD (step 1) (ufs-community#331)
  Feature/ww3update (ufs-community#334)
  Replace old regional SDF with FV3_GFS_v15_thompson_mynn (ufs-community#333)
  Update modules with hpc-stack v1.1.0 (ufs-community#319)
  Regression test log for PR ufs-community#323 for jet.intel (ufs-community#336)
  RRTMGP and Thompson MP coupling (ufs-community#323)
  Add 2 new tests for DATM-MOM6-CICE6 application (ufs-community#332)
  Add optional bulk flux calculation in ufs-datm (ufs-community#266)
  Final-final GFS v16 updates / restart reproducibility bugfixes (ufs-community#325)
  Updates to build for JEDI linking/control, add wcoss2 (ufs-community#295)
  Update CICE, Move regression test input outside baseline directory (ufs-community#270)
  Feature/update mom6 and retain b4b results for 025x025 resolution (ufs-community#290)
  Update for Jet, bug fixes in running with frac_grid=T and GFDL MP, and in restarting with frac_grid=T (ufs-community#304)
  Updates to stochastic_physics_wrapper (ufs-community#280)
  Update develop from gsd/develop 2020/11/20: Unified gravity wave drag, updates to other GSL physics (ufs-community#297)
  Fix to allow quilting with non-factors for layout (ufs-community#250)
  rt update (ufs-community#261)
* Add preamble script from global workflow.
* Call preamble script in j-jobs and ex-scripts
* Call preamble in other scripts.
* Make names of j-jobs and ex-scripts consistent.
* Working towards nco vars in table 1.
* Change default bin directory to exec
* Append FATAL ERROR to print_err_msg_exit.
* Replace some cp, cd, mkdir calls with their corresponding _vrfy versions
* Add job and jobid to the job-card.
* Add cyc and subcyc to rocoto xml
* Add a j-job preamble script for setpdy.
* Add a j-job postamble as well.
* Define some Table 1 vars in setup.
* Remove unused SRC_DIR, and rename others
* Rename CYCLE_BASEDIR to COMIN_BASEDIR
* Create the NCO root directories in setup.
* Remove source machine file wrapper.
* Bug fix in job_preamble.
* Make make_ics/lbcs use DATA directory properly.
* Make run_fcst use DATA directory properly.
* Made run_post use DATA directory properly.
* Make make_grid use DATA properly (untested).
* Make make_sfc_climo use DATA properly (untested).
* Make make_orog use DATA properly (untested).
* Bug fix for non-NCO mode.
* Don't pass arguments from j-jobs to ex-scripts.
* Make forecast and post-output go to COMOUT.
* Remove CYCLE_DIR and use COMIN instead.
* Bug fix for community mode.
* Append cyc to COMIN in NCO mode.
* Fix rocoto run_post dependency with run_fcst issue.
* Use OPSROOT instead of PTMP and STMP.
* Move nco vars in config_defaults.
* Move logdir location to COMROOT.
* Set all root directories to EXPTDIR in community mode.
* Use pgmout and pgmerr.
* Fix inline post.
* Make pgmout/err redirection work with community mode.
* Use print_err in get_obs_mrms.
* Add prep_step.
* Add post_step.
* Add dbn_alert to post-processed grib2 output.
* Download extrn files directly to COMIN.
* Make make_ics/lbcs directly output to COMIN.
* Change names of extrn_mdl_var_defns files.
* Name fixes for DO_ENSEMBLE=false, dyn/phy
* Don't create symlinks to grib2 files in NCO mode.
* Append rrfs to make_ics/lbcs output.
* Modify extrn_mdl_var_defns names.
* Move forecast output to DATA/RUN.PDY. This location can be used to store output of other tasks as well.
* Move templates to parm.
* Fix for new parm location.
* Move metplus one level up.
* Fixes for community mode.
* Rename SCRIPTSDIR and JOBSDIR.
* Move all FIX** directories into a fix/ directory.
* Make FIXrrfs be EXPTDIR for community mode.
* Symlink upp and ufs_utils parm files to top level parm directory.
* Remove UPP_DIR and UFS_UTILS_DIR.
* Define cycle with subcyc when it is non-zero.
* Don't delete COMIN_BASEDIR if it already exists.
* Disassociate NCO mode from pre-generated grid.
* Don't choose fix location based on RUN_ENVIR.
* Bug fix in make_lbcs.
* Add flag to symlink or copy fix files.
* Change slurm log file locations
* Minor fix for inline post in nco mode.
* Add unique workflow ID to avoid clashes between different runs, while keeping the relation between different tasks, which PID can not do.
* Make verification tasks NCO compliant.
* Pass RUN_ENVIR to we2e script.
* Fixes for merge conflicts.
* Add versions for wcoss2.
* Fix symlinks.
* Minor changes.
* Move grid/orog/sfcc completion files to EXPTDIR/grid/orog etc.
* Output modified namelist file with seeds in current directory.
* Fixes for unittests.
* Bugfix wrf_io version
* Fix CI issue with bin locations.
* Allow NCO root directories to be set individually.
* Don't append workflow id in community mode.
* Add helper script to rename model e.g. rrfs->aqm
* Bug fixes and naming changes for consistency.
* Replace instances of USHrrfs etc with a generic USHdir etc.
* Add unittest for whole workflow now that the merge made it possible.
* Remove unused process_args utility.
* Remove hard coded paths from configs.
* Don't replace existing var value with None.
* Add config.nco to unittest.
* Fix for Orion issue.
* Fix default OPSROOT location in run_we2e.
* Modify setup_we2e script to run fundamental tests on all machines.
* Fix conflicting ics/lbcs temp location by moving to DATA.
* Bug fix in load_modules taken from PR #353.
* Specify default shell instead of symlinking.
* Turn off grid/orog/sfc_climo tasks for NCO test cases.
* Use PDY and cyc in ex-scripts.
* Remove CDATE from xml and define in job_preamble.
* Use machine specific list of tests if available.
* Run all tests in community mode so that the last NCO test case gets reported as finished.
* Minor changes
* Avoid using preamble in functions.
* Use preamble in function too.
* Turn on debugging for utility functions.
* Turn on debug & verbose in CI.
* Turn off set -e for init_env
* update lmod
* update lmod
* update hpc-stack and miniconda
* fix lmod-setup.sh bug for Gaea
* update files to run with new miniconda and MET VX
* fix typo
* fixed typo
* update vx task
* Update build_gaea_intel The list of modules to be loaded needs updates.
* Update load_modules_run_task.sh Fixed a typo
* Update load_modules_run_task.sh
* updated vx task
Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00061.pw-noaa-us-east-1.pw.local>
Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00062.pw-noaa-us-east-1.pw.local>
Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00063.pw-noaa-us-east-1.pw.local>
Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00064.pw-noaa-us-east-1.pw.local>
Co-authored-by: Natalie Perlin <68030316+natalie-perlin@users.noreply.github.com>
Description
1. Remove unnecessary SIMD instruction sets for Jet
When the option SIMDMULTIARCH is used at compile time (currently only on Jet), four SIMD instruction sets are compiled into the executable. The cmake config cmake/Intel.cmake defines these flags for the build. By reducing the instruction sets from four to two, we reduce the compile time by about 50%. Further, nobody has looked at the results (in terms of performance and accuracy) when CORE-AVX512 is used. With the two flags -axSSE4.2,CORE-AVX2 we can run the same high-performance code on the newer Jet partitions that we run on other Intel systems (NOAA RDHPC, NCAR, ...), and we have a fallback option for the older Jet platforms that do not understand AVX2 SIMD instruction sets.
2. First round of cleanup in rt.conf
See #352 for a detailed description. This work is in preparation and to facilitate the upcoming overhaul of the global model regression tests.
Also included:
* rt_stampede.conf (we are not running regression tests there, no need to maintain that file)
Issue(s) addressed
Testing
Regression tests will be run on all tier-1 platforms. For systems for which the baseline is not expected to change (see below), the existing baseline will be copied to the new date tag and used to verify against. For systems for which the baseline is expected to change, a new baseline will be created and used to verify against.
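The baseline policy described above can be sketched as a small decision function. This is an illustrative sketch only: the function name, the platform labels, and the date tags are hypothetical and are not taken from rt.sh or from this PR.

```python
from typing import Dict


def plan_baselines(expect_change: Dict[str, bool], old_tag: str, new_tag: str) -> Dict[str, str]:
    """Return, per platform, how the baseline for new_tag is obtained.

    expect_change maps a (hypothetical) platform label to True when its
    baseline is expected to change with this PR.
    """
    plan = {}
    for platform, changes in expect_change.items():
        if changes:
            # Results are expected to differ: generate a fresh baseline.
            plan[platform] = f"create new baseline {new_tag}"
        else:
            # Results should be identical: reuse the existing baseline.
            plan[platform] = f"copy baseline {old_tag} to {new_tag}"
    return plan


if __name__ == "__main__":
    # Hypothetical platforms and date tags, for illustration only.
    example = {"platform_a": False, "platform_b": True}
    for name, action in plan_baselines(example, "old-tag", "new-tag").items():
        print(f"{name}: {action}, then verify against it")
```

In both branches the regression tests are then verified against the baseline under the new date tag; only the way that baseline is produced differs.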
No changes are expected on the following systems:
* … rt.sh against existing baseline on 12/31/2020)
* … rt.sh against existing baseline on 12/31/2020)
* … rt.sh against existing baseline on 12/31/2020)
Changes are expected on the following systems (because additional tests are run and, for Jet only, the compiler flags have changed):
Final regression testing on 01/06/2021: all tests passed, logs updated in the PR.
Dependencies
NCAR/ccpp-physics#539
NOAA-EMC/GFDL_atmos_cubed_sphere#49
NOAA-EMC/fv3atm#220
#353