Update to cime5.3-alpha10 #1490

Merged
jgfouca merged 296 commits into master from agsalin/update-to-cime5.3.1
May 18, 2017

Conversation

@agsalin agsalin commented May 4, 2017

Pulling in another 2 weeks of CIME changes into ACME.
CIME hash 15297cd from May 2.

Required bug fixes for:

  • Changing error to warning for using "-id" with single dash
  • acme templates missing "arg ="
  • Restore pio1 Cmake path pointing to pio2/cmake
  • Update to gcc5.3.0 to avoid internal compiler error
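
As a rough illustration of the first fix above (warning on a single dash before a multi-character option instead of raising an error), here is a minimal sketch. The function names and the message text are hypothetical and are not taken from CIME's parser.

```python
import logging

logger = logging.getLogger("argcheck")  # hypothetical logger name


def find_single_dash_multichar(argv):
    """Return options such as '-id' that use one dash before a multi-character name."""
    flagged = []
    for arg in argv:
        name = arg.split("=", 1)[0]  # drop any '=value' suffix
        # '-x' is an ordinary short option, '--long' is fine,
        # and '-99' is treated as a negative number, not an option.
        if (name.startswith("-") and not name.startswith("--")
                and len(name) > 2 and not name[1].isdigit()):
            flagged.append(name)
    return flagged


def warn_on_single_dash(argv):
    """Log a warning (rather than failing) for each flagged option."""
    flagged = find_single_dash_multichar(argv)
    for name in flagged:
        logger.warning("Option %s uses a single dash; prefer -%s", name, name)
    return flagged
```

Calling `warn_on_single_dash(["-id", "cime531"])` logs one warning and lets the caller continue, which is the behaviour change the bullet describes.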

Fixes #1343
Fixes #1388
Fixes #1396
Fixes #1401
Fixes #1426
Fixes #1473 (not 100% sure this one made it in; if not, the next merge will get it)

Still had 2 failures of acme_developer on the penn machine, but they look to be weird gcc issues (var tracking overflow). The same tests passed on redsky, so I expect them to pass on ACME machines.

jedwards4b and others added 30 commits April 16, 2017 16:22
Now --user-mods on the command line (including testmods) will take
precedence over the user_mods set by the compset - for user_nl files,
shell_commands and SourceMods.

I have tested this with this diff to the A compset

diff --git a/src/drivers/mct/cime_config/config_compsets.xml b/src/drivers/mct/cime_config/config_compsets.xml
index c11354e..7e6c2c9 100644
--- a/src/drivers/mct/cime_config/config_compsets.xml
+++ b/src/drivers/mct/cime_config/config_compsets.xml
@@ -40,6 +40,7 @@
   <compset>
     <alias>A</alias>
     <lname>2000_DATM%NYF_SLND_DICE%SSMI_DOCN%DOM_DROF%NYF_SGLC_SWAV</lname>
+    <user_mods>/Users/sacks/temporary/user_mods_compset</user_mods>
   </compset>

   <compset>

Along with this create_newcase command:

./create_newcase -case test_0414m -compset A -res f45_g37 \
--run-unsupported \
--user-mods-dir /Users/sacks/temporary/user_mods_command_line

where the contents of the two relevant user_mods directories are:

--- user_mods_compset/shell_commands ---
./xmlchange STOP_N=101
--- user_mods_compset/SourceMods/src.drv/mysrc.F90 ---
user_mods_compset
--- user_mods_compset/user_nl_cpl ---
user_mods_compset

--- user_mods_command_line/shell_commands ---
./xmlchange STOP_N=102
--- user_mods_command_line/SourceMods/src.drv/mysrc.F90 ---
user_mods_command_line
--- user_mods_command_line/user_nl_cpl ---
user_mods_command_line

The final contents are:

--- user_nl_cpl ---
user_mods_compset
user_mods_command_line
--- shell_commands ---
./xmlchange --force STOP_N=102
--- SourceMods/src.drv/mysrc.F90 ---
user_mods_command_line

And

$ ./xmlquery STOP_N
	STOP_N: 102

thus demonstrating that the user_mods on the command-line takes
precedence over the compset's user_mods.
This was already the policy, but was not being enforced correctly.
The previous implementation had two problems:

1. If you specified a user_mods on the command-line along with a compset
that has its own user_mods, then the compset's user_mods get applied
twice.

2. The new place where there was a call to apply_user_mods happened too
early: xmlchange commands can not be done at that point.

This fixes these problems.

I have tested this with the same changes described in
9cc7740. I tested create_newcase with
no user_mods, user_mods just from the command line, user_mods just from
the compset, and user_mods from the command line and the compset.
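
The ordering described in these two commit messages can be sketched with an in-memory stand-in for a case. This is only an illustration of the stated policy (compset user_mods first, command-line user_mods second, each directory applied once); the dictionary layout and the function names are assumptions and do not reproduce CIME's `apply_user_mods`.

```python
def apply_user_mods(case, mods):
    """Apply one user_mods layer to an in-memory stand-in for a case."""
    # user_nl files accumulate: each layer's lines are appended in order.
    for name, lines in mods.get("user_nl", {}).items():
        case["user_nl"].setdefault(name, []).extend(lines)
    # SourceMods files are copied, so a later layer replaces an earlier one.
    case["source_mods"].update(mods.get("source_mods", {}))
    # xmlchange settings run in order, so the last layer wins.
    case["xml"].update(mods.get("xml", {}))


def build_case(layers):
    """Apply (path, mods) layers in order, each path at most once."""
    case = {"user_nl": {}, "source_mods": {}, "xml": {}}
    applied = set()
    for path, mods in layers:
        if path in applied:
            continue  # avoids applying the compset's user_mods twice
        applied.add(path)
        apply_user_mods(case, mods)
    return case


compset_mods = {
    "user_nl": {"user_nl_cpl": ["user_mods_compset"]},
    "source_mods": {"src.drv/mysrc.F90": "user_mods_compset"},
    "xml": {"STOP_N": 101},
}
command_line_mods = {
    "user_nl": {"user_nl_cpl": ["user_mods_command_line"]},
    "source_mods": {"src.drv/mysrc.F90": "user_mods_command_line"},
    "xml": {"STOP_N": 102},
}

case = build_case([
    ("user_mods_compset", compset_mods),
    ("user_mods_compset", compset_mods),  # duplicate request is skipped
    ("user_mods_command_line", command_line_mods),
])
print(case["xml"]["STOP_N"])
```

With these inputs the sketch ends up with `STOP_N` set to 102, both lines in `user_nl_cpl`, and the command-line copy of `mysrc.F90`, matching the final contents listed in the commit message.
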
Make Machines.get_value more likely to return values of the correct type
Force user to always go through case.submit
Point is: I want to make it hard to miss
@rljacob rljacob self-requested a review May 4, 2017 17:29

jgfouca commented May 9, 2017

@agsalin @rljacob the changes look like what I'd expect. I suggest waiting until the situation is a little better in our nightly tests before merging to next (input repo problems and hung test on melvin).

rljacob commented May 10, 2017

I'd like to do a test merge to master and look at it with gitx to make sure it doesn't add another instance of the complete CIME history.

@rljacob rljacob added the CIME label May 13, 2017

@rljacob rljacob left a comment

I did a test merge to master and looked at the tree with GitX. It doesn't have another copy of the CIME history. This is ok to merge. There were a couple of conflicts that could be resolved with checkout --ours.

rljacob commented May 14, 2017

Is this ready to go?

Upstream merge to resolve conflicts.

* master: (93 commits)
  Adds the --force-move option and implies --copy-only when --last-date is specified without --force-move
  added spunup A_WCYCL options (1850S and 2000S) for v0atm
  Updates land initial conditions for ne120_oRRS18v3
  Fix jenkins_generic_job when user selects specific machines.
  Fix small problems with MPAS components from debug tests
  Adds a warning when using the --last-date option and to its help
  Implement the copy_only option for short term archiving. This copies files rather than moving them
  Implemented most of the machinery for testing with "incomplete" log files
  Fix code format issue - replace unused variable with _
  Update template.st_archive
  Adds options to the st_archive to specify the last date (--last-date) to archive, and whether to disable archiving incomplete log files (--no-incomplete-logs)
  Turn off salinity restoring by default until it passes exact restart
  This is a fix to a PR #1501 (github issue #1263). I allocated the tmp_array to be too large and GNU compiler stopped with an array mismatch. Change this to be state%ncol in size, not pcol
  Fix a floating invalid that we found with certain PE's (and with intel v17.02)
  bug fix
  Revert mpas-o commit with new threaded vector reconstruction consistency changes
  Changing jobmin to 1 for batch q on blues
  Modifies SMS_D_Ln5_P8x4 to SMS_Ln5 to avoid cetus and melvin issues
  Point at corrected salinity restoring files for oEC60to30v3 and oRRS18to6v3
  Make sure clean clm cleans up obj dir
  ...
jgfouca added a commit that referenced this pull request May 15, 2017
Update to cime5.3.1

Pulling in another 2 weeks of CIME changes into ACME.
CIME hash 15297cd from May 2.

Required bug fixes for:

    Changing error to warning for using "-id" with single dash
    acme templates missing "arg ="
    Restore pio1 Cmake path pointing to pio2/cmake
    Requires #1464 to be pushed and merged
    Update to gcc5.3.0 to avoid internal compiler error

Still had 2 fails of acme_developer on penn machine, but look to be
weird gcc issues -- var tracking overflow. Same tests passed on
redsky, so expecting it will pass on ACME machines.

* agsalin/update-to-cime5.3.1: (216 commits)
  Make single-dash before multichar arg a warning
  Fix acme template change for new parser
  Update Sandia worksations to gcc5.3.0
  Comment out invalid_args check
  Revert change in pio1 to point into pio2/cmake
  Fix single submit
  Add tests
  Do not override walltime unless test
  Update ChangeLog
  nag compiler needs a width
  Add checks to verify the create_newcase directory was created as expected
  removed unpack commented region
  fix typo in parse_args
  Bug fixes
  Minor pylint fix
  fix pylint issues
  rename function
  updated config_grids.xml to be the same as master
  fixed comment on new aquaplanet mode
  updates to have aquaplanet not depend on a new grid definition with a null mask
  ...

jgfouca commented May 15, 2017

OK, it's on next.

rljacob commented May 15, 2017

Why did you merge master to this branch?

jgfouca commented May 16, 2017

@rljacob To resolve conflicts

rljacob commented May 16, 2017

Summary of testing on next:

Many expected namelist diffs:
Differences in namelist 'seq_timemgr_inparm' from new features:
found extra variable: 'esp_cpl_offset'
found extra variable: 'esp_run_on_pause'
found extra variable: 'glc_avg_period'

Several pio namelist vars:
BASE: pio_numiotasks = 1
COMP: pio_numiotasks = -99
BASE: pio_stride = 1
COMP: pio_stride = 16

@jayeshkrishna said: ESMCI/cime#1441 was an NML change that reverted a feature that reset strides/tasks etc. by default from "-99" to a value. This caused other issues, as discussed in ESMCI/cime#1433, and the resolution was to remove the feature. So the values of strides/num io tasks etc. should now be -99 etc., the defaults that were set before the feature was introduced.
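
For readers unfamiliar with the BASE/COMP report format quoted above, a minimal sketch of how such a report could be produced from two parsed namelist groups follows. It is not CIME's namelist comparison code; the function name, the "missing variable" wording, and the group name `pio_example_group` are assumptions made for illustration.

```python
def compare_namelist_group(group, base, comp):
    """Return report lines for one namelist group.

    base and comp map variable names to their values as strings.
    """
    lines = []
    for name in sorted(set(base) | set(comp)):
        if name not in base:
            # Present only in the new namelist.
            lines.append("found extra variable: '%s'" % name)
        elif name not in comp:
            # Wording here is an assumption; the source shows no such case.
            lines.append("missing variable: '%s'" % name)
        elif base[name] != comp[name]:
            lines.append("BASE: %s = %s" % (name, base[name]))
            lines.append("COMP: %s = %s" % (name, comp[name]))
    if lines:
        lines.insert(0, "Differences in namelist '%s':" % group)
    return lines


report = compare_namelist_group(
    "pio_example_group",  # hypothetical group name
    {"pio_numiotasks": "1", "pio_stride": "1"},
    {"pio_numiotasks": "-99", "pio_stride": "16"},
)
print("\n".join(report))
```

An empty list means the two groups agree, which is the case a bit-for-bit test with no namelist changes would hit.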

in drof_in
BASE: datamode = 'DIATREN_ANN_RX1'
COMP: datamode = 'COPYALL'
@bishtgautam said "DIATREN_ANN_RX1 has been removed from DROF"

There were several tests that would have reported only namelist diffs but had FAIL because of a problem with the memory test. This affected any test which had not been blessed in a while and so had an old-format cpl.log. Fixed by @jgfouca and re-merged to next.

The ERP test was changed to use BUILD_THREADED=TRUE in all builds, which caused some baseline diffs in these tests because they have nthrds=1 and so used to be built with BUILD_THREADED=FALSE:
ERP_Ln9.ne30_ne30.FC5.skybridge_intel.cam-outfrq9s
ERP_Ld3.ne30_oECv3_ICG.A_WCYCL1850S.skybridge_intel
This ERP test only had namelist diffs because it had 8x4 and so was always built with BUILD_THREADED=TRUE:
ERP_Ld5_P8x4.ne4_ne4.FC5AV1C-04P2.skybridge_intel

These tests had the mem test fail, namelist diffs AND baseline diffs for reasons we are still trying to understand:
ERS_IOP.f45_g37_rx1.DTEST.skybridge_intel
ERS.f45_g37_rx1.DTEST.skybridge_intel

@rljacob rljacob changed the title Update to cime5.3.1 Update to cime5.3-alpha10 May 17, 2017

rljacob commented May 17, 2017

@agsalin the initial PR comment needs a more complete summary of the changes, especially the data models. See PR #1300 for example.

rljacob commented May 17, 2017

@jgfouca I'm trying to run a test on blues with this branch and getting an xmllint error:

blogin1[124]: ./create_test SMS.f45_g37_rx1.DTEST.blues_intel --test-id cime531
No project info available
Creating test directory /lcrc/project/ACME/jacob/acme_scratch/SMS.f45_g37_rx1.DTEST.blues_intel.cime531
RUNNING TESTS:
  SMS.f45_g37_rx1.DTEST.blues_intel
Starting CREATE_NEWCASE for test SMS.f45_g37_rx1.DTEST.blues_intel with 1 procs
Finished CREATE_NEWCASE for test SMS.f45_g37_rx1.DTEST.blues_intel in 1.958798 seconds (FAIL). [COMPLETED 1 of 1]
    Case dir: /lcrc/project/ACME/jacob/acme_scratch/SMS.f45_g37_rx1.DTEST.blues_intel.cime531
    Errors were:
        ERROR: Command: '/usr/bin/xmllint --noout --schema /lcrc/group/earthscience/jacob/ACME/cime/config/xml_schemas/config_batch.xsd /lcrc/group/earthscience/jacob/ACME/cime/config/acme/machines/config_batch.xml' failed with error '/lcrc/group/earthscience/jacob/ACME/cime/config/acme/machines/config_batch.xml:161: element queue: Schemas validity error : Element 'queue', attribute 'string': The attribute 'string' is not allowed.
        /lcrc/group/earthscience/jacob/ACME/cime/config/acme/machines/config_batch.xml fails to validate'

Due to presence of batch system, create_test will exit before tests are complete.
To force create_test to wait for full completion, use --wait
At test-scheduler close, state is:
FAIL SMS.f45_g37_rx1.DTEST.blues_intel (phase CREATE_NEWCASE)
    Case dir: /lcrc/project/ACME/jacob/acme_scratch/SMS.f45_g37_rx1.DTEST.blues_intel.cime531
test-scheduler took 2.03897094727 seconds
Exit 100

xmllint version:

blogin1[130]: xmllint --version
xmllint: using libxml version 20706
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib 
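
To reproduce the schema check outside of create_test, the same `xmllint --noout --schema` invocation shown in the error can be run directly. The wrapper below is a small sketch; the function names are hypothetical, and it assumes `xmllint` is on the PATH and reports its diagnostics on stderr.

```python
import subprocess


def xmllint_command(schema_path, xml_path, xmllint="xmllint"):
    """Build the argument list used to validate xml_path against schema_path."""
    return [xmllint, "--noout", "--schema", schema_path, xml_path]


def validate(schema_path, xml_path, xmllint="xmllint"):
    """Run xmllint and return (ok, stderr_text)."""
    proc = subprocess.run(
        xmllint_command(schema_path, xml_path, xmllint),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode == 0, proc.stderr
```

For example, `validate("config_batch.xsd", "config_batch.xml")` run from a directory holding both files returns a flag and the diagnostic text, which makes it easier to compare the result across machines with different libxml versions.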

jgfouca and others added 2 commits May 17, 2017 11:01
The version of CIME we started with (15297cd) had a bug in docn which
was fixed in a later version (10fbc43). Instead of starting over,
just bring in the 2 files that fix the bug.
Bug spotted by baseline compare fails with DTEST.

rljacob commented May 18, 2017

@jgfouca the PR description also needs a list of ACME bugs fixed, i.e. issues that were "fixed in CIME".

jgfouca commented May 18, 2017

@rljacob done

@jgfouca jgfouca merged commit 0540b7f into master May 18, 2017
jgfouca added a commit that referenced this pull request May 18, 2017
Update to cime5.3-alpha10

Pulling in another 2 weeks of CIME changes into ACME.
CIME hash 15297cd from May 2.

Required bug fixes for:

    Changing error to warning for using "-id" with single dash
    acme templates missing "arg ="
    Restore pio1 Cmake path pointing to pio2/cmake
    Update to gcc5.3.0 to avoid internal compiler error

Still had 2 fails of acme_developer on penn machine, but look to be
weird gcc issues -- var tracking overflow. Same tests passed on
redsky, so expecting it will pass on ACME machines.

[BFB]

* agsalin/update-to-cime5.3.1: (219 commits)
  Add fix to bug in docn that was in this cime version
  Fix upstream merge resolution mistake
  Bug fix: Handle failures to get mem usage from baselines
  Make single-dash before multichar arg a warning
  Fix acme template change for new parser
  Update Sandia worksations to gcc5.3.0
  Comment out invalid_args check
  Revert change in pio1 to point into pio2/cmake
  Fix single submit
  Add tests
  Do not override walltime unless test
  Update ChangeLog
  nag compiler needs a width
  Add checks to verify the create_newcase directory was created as expected
  removed unpack commented region
  fix typo in parse_args
  Bug fixes
  Minor pylint fix
  fix pylint issues
  rename function
  ...
jgfouca pushed a commit that referenced this pull request Jun 2, 2017
Fix pylint errors
Also, add logging-format-interpolation to list of pylint warnings we don't care about. That way, we can use python3 string formatting without getting warnings.

Test suite: code_checker
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes [CIME Github issue #]

User interface changes?: None

Code review: @JEdwards
jgfouca added a commit that referenced this pull request Jun 2, 2017
Update to cime5.3-alpha10

@jgfouca jgfouca deleted the agsalin/update-to-cime5.3.1 branch June 7, 2017 14:28
jgfouca added a commit that referenced this pull request Feb 27, 2018
Update to cime5.3-alpha10

jgfouca added a commit that referenced this pull request Mar 14, 2018
Update to cime5.3-alpha10
