Jgfouca/branch for acme split 2018 08 08 #2743

jgfouca · 2018-08-08T18:29:01Z

Test suite: scripts_regression_tests
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes #1543

User interface changes?: N

Update gh-pages html (Y/N)?: N

Code review: @jedwards4b

Incorporates fix from billsacks/cime@4cc4f2b [BFB]

So that the dot cannot go on to make an invalid case name. [BFB]

Fix format statement preventing write of coupler aux history files. [BFB]

[BFB]

craype, and hdf5/netcdf related modules

Set environment variable HDF5_DISABLE_VERSION_CHECK=2 for edison

When BFBFLAG is set to true and INFO_DBUG > 1, the routine seq_diag_avect_mct uses a reproducible sum algorithm that is not as accurate as the algorithm implemented in shr_reprosum_calc. In particular, when summing a vector of INFs, the current algorithm returns zero. Here we replace the existing algorithm with a call to shr_reprosum_calc. This change is BFB for standard usage (INFO_DBUG == 1). It is not BFB with respect to the associated diagnostic, written to cpl.log, when INFO_DBUG > 1. However, these diagnostics are not used in the simulation, so simulation results will be BFB. [BFB]

shr_reprosum_calc aborts if input summands include INF or NaN values. For debugging purposes, it can be useful to allow INF or NaN values, returning the IEEE standard results for such a situation (either NaN, positive INF, or negative INF, depending on the situation). An optional logical parameter, allow_infnan, is being added to shr_reprosum_calc. When set to .true., the routine determines whether the summands for a given field contain NaN or INF values and returns the appropriate value without going through the reproducible sum algorithm (which is very slow and requires significant memory when summing these special values). Other fields in a multiple-field call to shr_reprosum_calc will be computed in the usual fashion. When allow_infnan == .false. or when the parameter is omitted, the routine aborts with an informative error message when the input contains INF or NaN values, as is done currently. The default can be changed (from allow_infnan=.false. to allow_infnan=.true.) via a new optional parameter, repro_sum_allow_infnan_in, in shr_reprosum_setopts. A new drv_in namelist parameter, reprosum_allow_infnan, has also been added that will be passed to shr_reprosum_setopts to set the default. This can be set in user_nl_cpl. Since the default is not being changed, this change is BFB. If allow_infnan is set to .true., then runs that failed because of INFs or NaNs would now continue to run (longer), but jobs that did not fail with the original default will be BFB even with the default changed. [BFB]
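The short-circuit described above can be illustrated with a minimal Python sketch. This is not the Fortran implementation: the function name `sum_allowing_infnan` is hypothetical, and `math.fsum` is only a stand-in for the reproducible sum. It shows the IEEE results the description refers to (NaN, positive INF, or negative INF) and the abort behavior when the option is off.

```python
import math


def sum_allowing_infnan(values, allow_infnan=False):
    """Sum floats, optionally returning the IEEE result for INF/NaN input.

    Hypothetical illustration only; the real logic lives in shr_reprosum_calc.
    """
    has_nan = any(math.isnan(v) for v in values)
    has_pos_inf = any(v == math.inf for v in values)
    has_neg_inf = any(v == -math.inf for v in values)

    if has_nan or has_pos_inf or has_neg_inf:
        if not allow_infnan:
            # Mirrors the default behavior: abort with an informative message.
            raise ValueError("input contains INF or NaN values")
        # Skip the expensive summation and return the IEEE result directly.
        if has_nan or (has_pos_inf and has_neg_inf):
            return math.nan
        return math.inf if has_pos_inf else -math.inf

    # Stand-in for the reproducible sum algorithm (an assumption).
    return math.fsum(values)
```

With `allow_infnan=True`, a field containing only positive INFs sums to positive INF, and a mix of positive and negative INFs sums to NaN, rather than aborting the run.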

[BFB]

Use ENV to retrieve it from the environment.

Fix domain specification for T42 configuration. This previously omitted the ice domain, so when trying to build a T42_T42 grid configuration, the ice domain would remain unset. This is relevant to the single column model, which uses the T42 grid by default.

…version of netcdf/hdf5. Also turn off file locking HDF5_USE_FILE_LOCKING=FALSE. And turn on logging of MPI rank with compute node, which adds a lot to e3sm.log*

An integration test is added for BGCEXP_BCRC_CNPECACNT_1850 compset

Upstream merge in order to pull in fixes from recent CIME updates. * master: Update CIME to ESMCI cime5.7.0 2 (#2428) Remove initialization of tbot to posinf

Add capture of system status and current workload for Summit to CIME performance provenance capture logic. [BFB]

Add syslog.summit checkpointing script for monitoring job progress. [BFB]

…pdate-NOLOCK (PR #2424) Update versions of modules for netcdf/hdf5 at NERSC. On edison, remove the variable that was disabling the HDF version check. Turn off file locking HDF5_USE_FILE_LOCKING=FALSE for NERSC machines. Turn on logging of MPI rank and compute node information for Cori.

Update CIME to ESMCI cime5.7.0-3 Squash merge of jgfouca/branch-for-to-acme-2018-07-12 Bug fixes: Another critical V2 build fix. [BFB]

Another upstream merge to pull in more V2 fixes.

* master:
  Update CIME to ESMCI cime5.7.0-3 (#2437)
  Add an all-active multi-instance test to e3sm_integration
  Change multi-instance infile names back to old form
  For all 3 NERSC machines (cori-knl, cori-haswell, edison), update the version of netcdf/hdf5. Also turn off file locking HDF5_USE_FILE_LOCKING=FALSE. And turn on logging of MPI rank with compute node, which adds a lot to e3sm.log*

Any fail in a core phase should cause the test status to be FAIL.

…upgrade_wait_for_tests_cdash * jgfouca/cime/wait_for_test_upgrades: wait_for_test logging working

Currently MPI task to compute node mapping information is output in two locations: once in CAM, where it is truncated after the first 256 MPI tasks, and once in CLM, where it is truncated after the first 100 MPI tasks, in both cases only for that component. This is not useful in current production runs. The use of environment variables, such as MPICH_CPUMASK_DISPLAY on Cray systems, generates data that are unnecessarily verbose for our needs.

Here a share routine is introduced that writes out one line per compute node. Each line contains the compute node name and the list of MPI tasks assigned to that node for a given communicator. This is then called in the driver and writes out the task-to-node mapping for the entire coupled model. Separate branches will then introduce this into the individual components, replacing the current logic in both CAM and CLM, for example. The share routine also optionally returns the number of compute nodes and the task-to-node mapping, which is needed in the internal CAM load balancing.

With the call to the shr_taskmap_write routine in the driver, the mapping data generated by the system when setting the corresponding environment variable is redundant. This is removed for the systems currently setting the variable.

Fixes #2457

BFB

* origin/worleyph/cime/taskmap:
  Avoid empty env blocks
  Remove unnecessary white space in task-to-node map output
  Modify driver output format
  Uncomment MV2_CPU_MAPPING definition for Anvil
  Modify task map output format
  Unset environment variables to output task-to-node mapping
  Output MPI task to compute node mapping
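The one-line-per-node output described above can be sketched in Python as follows. This is only an illustration of the grouping idea, not shr_taskmap_write itself: the function name `format_task_map`, the input list of host names indexed by rank, and the exact line format are all assumptions. In the real routine the node names would be gathered over the MPI communicator.

```python
from collections import defaultdict


def format_task_map(hostnames):
    """Group MPI ranks by compute node.

    hostnames[i] is the node name reported by MPI rank i (hypothetical input).
    Returns (number_of_nodes, lines), with one line per compute node.
    """
    tasks_by_node = defaultdict(list)
    for rank, node in enumerate(hostnames):
        tasks_by_node[node].append(rank)

    # Nodes appear in order of their lowest-numbered rank, because dicts
    # preserve insertion order.
    lines = [
        "{}: {}".format(node, ",".join(str(r) for r in ranks))
        for node, ranks in tasks_by_node.items()
    ]
    return len(tasks_by_node), lines
```

For example, `format_task_map(["nid1", "nid1", "nid2", "nid1"])` yields two nodes, with ranks 0, 1, and 3 listed on the `nid1` line. Returning the node count alongside the lines mirrors the optional outputs mentioned for CAM load balancing.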

…(PR #2480) Big update to wait_for_tests/jenkins

Changes:
* Upgrade cdash XML spoofing for prettier cdash pages for e3sm
* Add ability for jenkins jobs to use alternate baseline area, nice for test cleanup
* New test test suite that includes DIFFs
* Add ability to turn off all test teardowns from scripts_regr command line
* New Jenkins test to upload a realistic dashboard result
* Make TESTBUILDFAIL produce a bldlog file (So that uploading of log files for failed builds can be tested)
* Add logging to TestStatus processing (waiting)

[BFB]

* jgfouca/cime/upgrade_wait_for_tests_cdash:
  Make pylint happy
  Remove useless commented-out code
  Make TESTBUILDFAIL produce a bldlog file
  Big update to wait_for_test/jenkins
  wait_for_test logging working
  Remove debug stuff
  Progress

For a long time, only model build logs were being uploaded. [BFB]

…2484) This PR implements "smart" archiving of old jenkins test results. The previous implementation simply deleted any result that looked like it came from a previous jenkins run of the same job. The new implementation will scan these old results, populating the directories

$CIME_OUTPUT_ROOT/old_test_archive/old_cases
$CIME_OUTPUT_ROOT/old_test_archive/old_builds
$CIME_OUTPUT_ROOT/old_test_archive/old_runs
$CIME_OUTPUT_ROOT/old_test_archive/old_archives

... with the appropriate directories from previous runs. The system will allow this old_test_archive directory to fill up until it reaches MAX_GB_OLD_TEST_DATA of data. Once that happens, old job data will be deleted in chronological order until we are under MAX_GB worth of data. This MAX_GB is a new per-machine setting for e3sm. [BFB]

* jgfouca/cime/archive_old_test_results:
  Restore melvin to 1TB of test data
  Lots of fixes
  Progress
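The "delete oldest first until under the limit" policy described above can be sketched as below. This is a hedged illustration, not the code in jenkins_generic_job.py: the function name `select_for_deletion` and the `(name, mtime, size_bytes)` tuple layout are assumptions, and the sketch only selects entries instead of touching the filesystem.

```python
def select_for_deletion(entries, max_bytes):
    """Pick archived entries to delete, oldest first, until within budget.

    entries: list of (name, mtime, size_bytes) tuples (hypothetical layout).
    Returns the names to delete, in chronological order.
    """
    total = sum(size for _, _, size in entries)
    to_delete = []
    # Walk entries from oldest to newest modification time.
    for name, _mtime, size in sorted(entries, key=lambda entry: entry[1]):
        if total <= max_bytes:
            break
        to_delete.append(name)
        total -= size
    return to_delete
```

Separating the selection from the actual removal keeps the policy easy to test; a caller could then remove each selected directory, for example with `shutil.rmtree`.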

* esmci_remote_for_split/master: (651 commits) Make pylint happy Make it so key members are always defined. add exception for archive_metadata :-( add average and aux cpl hist files response to comments fix pylint issues exclude baselines redo regex for extension matching update for mom add debug info add debug info remove unused glob import fix whitespace issue remove debug print statements use archive info have hist_utils use archive.xml info fix merge issue remove trailing whitespace need to specify compiler to mpilibs Use RawConfigParser instead of ConfigParser ...

jgfouca

Annotations complete.

jgfouca · 2018-08-08T18:30:40Z

scripts/Tools/Makefile

@@ -180,7 +180,7 @@ ifdef NETCDF_C_PATH
    LIB_NETCDF_C:=$(NETCDF_C_PATH)/lib
  endif
  ifndef LIB_NETCDF_FORTRAN
-    LIB_NETCDF_FORTRAN:=$(NETCDF_C_PATH)/lib
+    LIB_NETCDF_FORTRAN:=$(NETCDF_FORTRAN_PATH)/lib


Please review this change.

This is fine

jgfouca · 2018-08-08T18:31:24Z

scripts/Tools/xmlquery

@@ -330,10 +330,11 @@ def _main_func(description):
        wrapper=textwrap.TextWrapper()
        wrapper.subsequent_indent = "\t\t\t"
        wrapper.fix_sentence_endings = True
+
+    cnt = 0


This is a bugfix for xmlquery. The output was wrong with --value when multiple values were requested (comma was in the wrong place).
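A minimal sketch of the separator logic, assuming the fix amounts to emitting the comma between values and not after each one. The function `format_values` is hypothetical and is not the xmlquery code; the counter is named `cnt` only to echo the diff.

```python
def format_values(values):
    """Join values with commas placed between items only."""
    parts = []
    for cnt, value in enumerate(values):
        if cnt > 0:
            # The separator goes before every value except the first.
            parts.append(",")
        parts.append(str(value))
    return "".join(parts)
```

In everyday Python the same result comes from `",".join(str(v) for v in values)`; the explicit counter is shown because incremental printing code often cannot collect all values first.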

jgfouca · 2018-08-08T18:32:26Z

scripts/lib/CIME/SystemTests/system_tests_common.py

@@ -601,6 +601,11 @@ def build_phase(self, sharedlib_only=False, model_only=False):
            TESTRUNPASS.build_phase(self, sharedlib_only, model_only)
        else:
            if (not sharedlib_only):
+                blddir = self._case.get_value("EXEROOT")
+                bldlog = os.path.join(blddir, "{}.bldlog.{}".format(get_model(), get_timestamp("%y%m%d-%H%M%S")))


Make TESTBUILDFAIL more realistic by having it produce a log file.

jgfouca · 2018-08-08T18:33:19Z

scripts/lib/CIME/test_status.py

+    rv = None
+    for perm in itertools.permutations(lines):
+        ts = TestStatus(test_dir="/", test_name="ERS.foo.A")
+        ts._parse_test_status("\n".join(perm)) # pylint: disable=protected-access


Big increase in robustness of testing. All permutations of phase orders are tested; they should all produce a consistent result.

jgfouca · 2018-08-08T18:34:49Z

scripts/lib/CIME/test_status.py

+        rv = TEST_PASS_STATUS
+        run_phase_found = False
+        for phase in phases: # ensure correct order of processing phases
+            if phase in self._phase_statuses:


e3sm was having problems with incorrect test status reports for tests, so I refactored this code a bit. Basically, the idea is to give priority to the "core" phases when trying to determine the overall test status.
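The idea of giving core phases priority can be sketched as follows. This is a simplified illustration, not the TestStatus implementation: the phase names in `CORE_PHASES`, the status strings, and the function `overall_status` are assumptions chosen for the example.

```python
import itertools

# Illustrative set of "core" phases; the real list lives in test_status.py.
CORE_PHASES = ("SETUP", "SHAREDLIB_BUILD", "MODEL_BUILD", "SUBMIT", "RUN")


def overall_status(phase_statuses):
    """Compute an overall status from a dict of phase name to status."""
    # Any FAIL in a core phase wins, even if another phase is still pending.
    if any(phase_statuses.get(phase) == "FAIL" for phase in CORE_PHASES):
        return "FAIL"
    if "PEND" in phase_statuses.values():
        return "PEND"
    if "FAIL" in phase_statuses.values():
        return "FAIL"
    return "PASS"


def consistent_across_orders(phase_statuses):
    """Check that every ordering of the phases yields the same overall status."""
    results = {
        overall_status(dict(perm))
        for perm in itertools.permutations(phase_statuses.items())
    }
    return len(results) == 1
```

Because the function looks phases up by name instead of walking them in file order, the result does not depend on the order in which phases were recorded, which is the property the permutation test checks.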

jgfouca · 2018-08-08T18:37:08Z

scripts/lib/CIME/wait_for_tests.py

+    xmlet.SubElement(phase_elem, "StartDateTime").text = time.ctime(current_time)
+    xmlet.SubElement(phase_elem, "Start{}Time".format("Test" if phase == "Testing" else phase)).text = str(int(current_time))
+
+    return site_elem, phase_elem


Lots of e3sm dashboard upgrades, should not impact cesm.

jgfouca · 2018-08-08T18:37:35Z

scripts/lib/CIME/wait_for_tests.py

+        test_log_path = "/dev/null"
+
+    prior_ts = None
+    with open(test_log_path, "w") as log_fd:


wait_for_tests now logs its behavior when waiting for a test.
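One way such logging could work is to write a line only when the observed status changes, which is what the `prior_ts` variable in the hunk suggests. The sketch below rests on that assumption; the function `log_status_changes` and its log line format are hypothetical, not taken from wait_for_tests.py.

```python
import io


def log_status_changes(statuses, log_fd):
    """Write one log line per status change; return the number of lines written."""
    prior = None
    written = 0
    for status in statuses:
        if status != prior:
            # Only transitions are logged, so long waits do not flood the log.
            log_fd.write("status: {}\n".format(status))
            written += 1
            prior = status
    return written
```

Passing an `io.StringIO` object as `log_fd` makes the behavior easy to inspect in a test, in the same spirit as the hunk's use of /dev/null when no log is wanted.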

jgfouca · 2018-08-08T18:38:40Z

scripts/tests/scripts_regression_tests.py


            # TODO: Any further checking of xml output worth doing?

+    ###########################################################################
+    def live_test_impl(self, testdir, expected_results, last_phase, last_status):


Add new "live" tests for wait_for_test testing. These tests handle dynamic TestStatus files instead of static.

jgfouca · 2018-08-08T18:39:15Z

scripts/tests/scripts_regression_tests.py

@@ -2911,6 +3010,9 @@ def _main_func(description):
    parser.add_argument("--no-cmake", action="store_true",
                        help="Do not run cmake tests")

+    parser.add_argument("--no-teardown", action="store_true",


Add ability to disable teardowns from command line.
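A small self-contained sketch of how a `store_true` flag like this behaves with argparse. Only the `--no-teardown` and `--no-cmake` option names come from the hunk above; the function `parse_args` and the `--no-teardown` help text are illustrative assumptions.

```python
import argparse


def parse_args(argv):
    """Parse a reduced version of the regression-test command line."""
    parser = argparse.ArgumentParser(description="sketch of test options")
    parser.add_argument("--no-cmake", action="store_true",
                        help="Do not run cmake tests")
    # Hypothetical help text; the flag defaults to False when omitted.
    parser.add_argument("--no-teardown", action="store_true",
                        help="Do not delete test directories after the tests finish")
    return parser.parse_args(argv)
```

argparse exposes the option as `args.no_teardown`, so a teardown step could be guarded with `if not args.no_teardown:`.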

jgfouca · 2018-08-08T18:39:46Z

src/drivers/mct/main/cime_comp_mod.F90

@@ -935,6 +936,7 @@ subroutine cime_pre_init2()
         wall_time_limit=wall_time_limit           , &
         force_stop_at=force_stop_at               , &
         reprosum_use_ddpdd=reprosum_use_ddpdd     , &
+         reprosum_allow_infnan=reprosum_allow_infnan, &


Someone else will have to explain the Fortran changes.

jgfouca · 2018-08-08T18:40:58Z

@rljacob , did we make significant changes to the coupler on the e3sm side?

jedwards4b · 2018-08-08T19:26:33Z

scripts/Tools/Makefile

@@ -180,7 +180,7 @@ ifdef NETCDF_C_PATH
    LIB_NETCDF_C:=$(NETCDF_C_PATH)/lib
  endif
  ifndef LIB_NETCDF_FORTRAN
-    LIB_NETCDF_FORTRAN:=$(NETCDF_C_PATH)/lib
+    LIB_NETCDF_FORTRAN:=$(NETCDF_FORTRAN_PATH)/lib


This is fine

jedwards4b · 2018-08-08T19:30:20Z

scripts/lib/CIME/provenance.py

@@ -23,6 +23,8 @@ def _get_batch_job_id_for_syslog(case):
            return os.environ["SLURM_JOB_ID"]
        elif mach in ['mira', 'theta']:
            return os.environ["COBALT_JOBID"]
+        elif mach in ['summit']:


I understand that this is just extending bad code already present, but we shouldn't need to check the machine name here; instead we should use case.get_value("BATCH_SYSTEM").
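The suggestion could look roughly like the sketch below, which keys the lookup on the batch system instead of the machine name. The mapping keys (`"slurm"`, `"cobalt"`, `"lsf"`), the function `get_batch_job_id`, and the assumption that Summit's job id comes from LSF's `LSB_JOBID` variable are illustrative; only `SLURM_JOB_ID` and `COBALT_JOBID` appear in the hunk above.

```python
# Job id environment variable per batch system (keys are assumed values
# of BATCH_SYSTEM, not confirmed against the machine configuration).
JOB_ID_ENV_VARS = {
    "slurm": "SLURM_JOB_ID",
    "cobalt": "COBALT_JOBID",
    "lsf": "LSB_JOBID",
}


def get_batch_job_id(batch_system, environ):
    """Return the job id for the given batch system, or None if unknown."""
    var_name = JOB_ID_ENV_VARS.get(batch_system)
    if var_name is None:
        return None
    # environ is passed in (for example os.environ) to keep this testable.
    return environ.get(var_name)
```

A caller inside CIME would presumably pass `case.get_value("BATCH_SYSTEM")` and `os.environ`, so that a new machine using an existing batch system needs no code change.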

jedwards4b · 2018-08-08T19:35:38Z

src/drivers/mct/cime_config/namelist_definition_drv.xml

+  <entry id="reprosum_allow_infnan">
+    <type>logical</type>
+    <category>reprosum</category>
+    <group>seq_infodata_inparm</group>


yuch - but I guess as long as the default is false...

jedwards4b · 2018-08-08T19:48:01Z

One issue:

scripts/lib/jenkins_generic_job.py, line 114: No exception type(s) specified
except:

py3 expects an exception type
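For context, a bare `except:` is still legal in Python 3, but linters flag it because it also catches `SystemExit` and `KeyboardInterrupt`. A minimal sketch of the usual remedy follows; the function `run_safely` is hypothetical and is not the code at line 114 of jenkins_generic_job.py.

```python
def run_safely(func):
    """Call func and report ordinary errors without swallowing interrupts."""
    try:
        return func()
    except Exception as exc:  # narrower than a bare "except:"
        return "failed: {}".format(exc)
```

Catching `Exception` leaves Ctrl-C and `sys.exit()` working as expected; naming a more specific exception type is better still when the failure mode is known.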

jgfouca · 2018-08-08T19:48:54Z

@jedwards4b , yeah, after tests pass I'll be sure to get travis working.

jgfouca · 2018-08-08T22:40:21Z

There will be some additional fails on Melvin on the dashboard until we fix an OpenMP issue on that machine.

jedwards4b · 2018-08-13T20:34:59Z

@jgfouca was the scripts regression test run on a machine with fortran unit test support?
The Fortran unit testing is broken:
https://my.cdash.org/index.php?project=CIME&date=2018-08-13

Gautam Bisht and others added 30 commits May 2, 2018 13:47

Fixes creation of coupler history filename

6b17573

Incorporates fix from billsacks/cime@4cc4f2b [BFB]

Replace dots in branch name with underscores

3853dee

So that the dot cannot go on to make an invalid case name. [BFB]

Take out slashes too

3cefbb5

Merge branch 'bishtgautam/cime/mct-bugfix' (PR #2317)

fe872c3

Fix format statement preventing write of coupler aux history files. [BFB]

Ignore unicode chars from command output

f5dc00f

[BFB]

Update module versions for edison after maintenance.

858887a

craype, and hdf5/netcdf related modules

Add evn setting to edison to disable HD5 check

614a91a

Set environment variable HDF5_DISABLE_VERSION_CHECK=2 for edison

Vectorize loops in coupler attr-vector multiplications

be0aae3

Minor cleanup

440db83

Update E3SM config_compilers.xml to V2

3bffe68

[BFB]

Remove pop support

e2ba026

Cannot use $NETCDF_PATH unless it is defined in the block

7953a53

Use ENV to retrieve it from the environment.

Fix up titan

6a85d5c

Fixups for PNETCDF

c163008

Remove dupes

35137b3

For all 3 NERSC machines (cori-knl, cori-haswell,edison), update the …

f91303d

…version of netcdf/hdf5. Also turn off file locking HDF5_USE_FILE_LOCKING=FALSE. And turn on logging of MPI rank with compute node, which adds a lot to e3sm.log*

CIME scripts changes to support ne4_oQU480 and T62_oQU480 configurations

feefcb0

Adds a SMS land test for a BGC compset

e990647

An integration test is added for BGCEXP_BCRC_CNPECACNT_1850 compset

Merge branch 'master' into jgfouca/cime/update_compilers_to_v2

1065c0f

Upstream merge in order to pull in fixes from recent CIME updates. * master: Update CIME to ESMCI cime5.7.0 2 (#2428) Remove initialization of tbot to posinf

Fix qopenmp issues, remove old intel specs

1a0417d

Add system workload provenance capture on Summit

4cce389

Add capture of system status and current workload for Summit to CIME performance provenance capture logic. [BFB]

Enable job progress monitoring on Summit

c1e4e78

Add syslog.summit checkpointing script for monitoring job progress. [BFB]

Update CIME to ESMCI cime5.7.0-3 (#2437)

f90a16e

Update CIME to ESMCI cime5.7.0-3 Squash merge of jgfouca/branch-for-to-acme-2018-07-12 Bug fixes: Another critical V2 build fix. [BFB]

Corrects the simulation length of FC5AV1C-L test

ea3bdad

Fix test_status determination of overall test status

ce709f4

Any fail in a core phase should cause the test status to be FAIL.

jgfouca added 11 commits August 3, 2018 16:36

Remove useless commented-out code

0cbbc8a

Merge branch 'jgfouca/cime/wait_for_test_upgrades' into jgfouca/cime/…

5b7ee4e

…upgrade_wait_for_tests_cdash * jgfouca/cime/wait_for_test_upgrades: wait_for_test logging working

Make pylint happy

635294d

Progress

f67e01c

Be sure to upload build logs for sharedlib build fails

05f8299

For a long time, only model build logs were being uploaded. [BFB]

Lots of fixes

b75e805

Restore melvin to 1TB of test data

44a523e

jgfouca added ty: enhancement tp: CIMElib tp: script tools labels Aug 8, 2018

jgfouca self-assigned this Aug 8, 2018

jgfouca requested review from jedwards4b and mvertens August 8, 2018 18:29

rljacob added the in progress label Aug 8, 2018

jgfouca commented Aug 8, 2018

jgfouca requested a review from rljacob August 8, 2018 18:41

jedwards4b approved these changes Aug 8, 2018

Make codacy happy

51b0a64

jgfouca merged commit 07d476d into master Aug 8, 2018

jgfouca deleted the jgfouca/branch-for-acme-split-2018-08-08 branch August 8, 2018 22:46

rljacob removed the in progress label Aug 8, 2018
