
Jgfouca/branch for acme split 2019 05 28 #3123

Merged: 49 commits, Jun 3, 2019

Conversation

@jgfouca (Contributor) commented May 28, 2019

Change List:

  • Add first stage of ocean/ice shelf coupling
  • Bring COMPOSE semi-Lagrangian tracer transport code into HOMME and E3SM (required minor changes to build.py)
  • Add checkpoint of timing data immediately after initialization
  • bless_test_results: Be able to handle build-only tests

Test suite: scripts_regression_tests
Test baseline:
Test namelist changes:
Test status: BFB

Fixes [CIME Github issue #]

User interface changes?:

Update gh-pages html (Y/N)?:

Code review: @jedwards4b

Jeremy Fyke and others added 30 commits February 6, 2019 09:27
This commit introduces infrastructure changes needed for adding coupling
between the ocean and ice shelves in the GLC component.  It is a squash
of work done primarily by Jeremy Fyke between 2015 and 2017, with
updates by Jon Wolfe to make it compatible with changes to E3SM
since then. It is BFB for existing compsets.

This commit introduces the CIME changes, but does not touch the
components.

[BFB] - Bit-For-Bit
A GG case now runs to completion and appears to be calculating fluxes.
This ensures there are 0-values where floating ice is not present and
removes NaNs that were present previously.
A previous commit was using ocean state fields on the ocean grid,
presumably as a holdover from when this melt calculation was done on the
ocean grid.  This commit fixes this by switching to the ocean state
fields remapped to the GLC grid.
This commit makes minor adjustments to the method for solving the
ocean-iceshelf interface temperature, following changes that had been
implemented in standalone MPAS-O in the last few years.  This makes
finding the root slightly more stable.
This commit makes a slight modification of the calculation of the ocean
freezing temperature beneath ice shelf cavities to account for
dependence on salinity.  This adjustment matches what is currently done
in MPAS-Ocean.

Also update parameter values in shr_const_mod to match exactly what is
in MPAS-Ocean.  Descriptions of those parameters are also added.
The melt flux calculation was using an incorrect variable for the ice
shelf pressure.
This forces the attribute vectors to be updated by the time that
prep_glc_calculate_subshelf_boundary_fluxes() is called.  Without this,
Sg_tbot and Sg_dztbot were still 0 when that routine was executed,
leading to a divide by zero.

There may be a subtler way to force the update, but this has the added
benefit of allowing these fields to be visualized on the ocean grid in
the coupler history file for debugging purposes.
Adds commented note in prep_glc_calculate_subshelf_boundary_fluxes
explaining the change made in 4092b4e

Also, comment out the rhoeff index for now - it is not currently used,
but will be implemented later.
Now both mass fluxes are positive for an addition of mass to the
component in question.
The call to cime_run_glc_setup_send was inadvertently deleted in
commit b24d24f

This commit reconciles changes in this branch with the contents of
cime_run_glc_setup_send.  Specifically, cime_run_glc_setup_send is
modified to separate steps needed for coupling from LND and OCN.  This
separation is also applied to prep_glc_accum_avg in prep_glc_mod.F90.
Finally, prep_glc_mrg and the routine it calls have been renamed to
indicate they are specific to forcing coming from LND.
This commit moves the remaining details of the new OCN-GLC coupling from
the main cime_run() routine into subroutines.

A routine cime_run_ocnglc_coupling() does the actual calculations and is
called from cime_run_ocn_recv_post() to ensure it runs each time OCN
completes.

This commit also creates separate prep_glc_accum_*() routines for
fields coming from LND and OCN so they can be called independently.
They are called from the respective cime_run_*_recv_post() routines for
the component from which the fields are coming.

prep_glc_accum_avg() is still combined for fields coming from the two
components because this step happens in the same place for fields coming
from either LND or OCN.
The call from prep_ocn_mod.F90 is not needed.
This capability will be disabled for CESM.
This commit also corrects usage of glcshelf_c2_ocn instead of glc_c2_ocn in a couple of places.
Whitespace, indentation, removal of unneeded comments, keyword arguments
for long argument lists.
In prep_glc_mod, there were two places where a list of coupling fields
from OCN were hardcoded in calls to seq_map_map.  This commit replaces
those hard-coded lists with a variable that is generated in seq_flds_mod
from the definitions of the coupling fields that are required.
These new coupling fields are disabled for CESM.
The OCN-GLC time averaging accumulators should only be read/written to
coupler restart files if OCN-GLC are actually coupled.  Otherwise there
will be an attempt to read/write unallocated things.

This required passing the ocn_c2_glc flag to the i/o routines, because
that information is not in infodata.
Currently a timing checkpoint is saved at the end of the first
simulation day, capturing both initialization cost and possibly
anomalous first-simulation-day performance. However, when
evaluating initialization performance for high-resolution
cases, this may not be sufficiently fine-grained, especially if the
evaluation job runs out of time before completing the first
simulation day. Here the env_run.xml variable TPROF_IN_INIT is
introduced, enabling timing checkpoints to be saved at
two locations within the initialization. (This adds a new namelist
variable to seq_infodata_inparm in drv_in.) TPROF_IN_INIT is FALSE
by default, in which case no additional checkpoint data is output.
However, existing timer names associated with writing out
performance data are modified slightly, to better attribute time
associated with the new performance-data writes.

[BFB] - Bit-For-Bit
[NML] - Namelist Changing
The addition of a namelist variable and env_run.xml support for enabling
optional checkpointing of performance data during initialization in the
previous commit is backed out here. Instead, a new checkpoint timing file is
now output right before the beginning of the run loop, with a name consistent
with the other checkpoint timing files, i.e. including the simulation
timestamp in the suffix. This is not optional.

The internal reworking of the checkpoint write logic and tweaking of timer
names in cime_comp_mod.F90 is preserved, and missing documentation for the
TPROF_TOTAL env_run.xml variable is also added to config_component_e3sm.xml.

[BFB] - Bit-For-Bit
…heta-l.

Update E3SM namelist files with COMPOSE options.

Make the theta-l E3SM target build with COMPOSE.
@rljacob rljacob requested a review from billsacks May 28, 2019 17:01
```diff
@@ -233,7 +233,7 @@ def _build_libraries(case, exeroot, sharedpath, caseroot, cimeroot, libroot, lid
     if mpilib == "mpi-serial":
         libs.insert(0, mpilib)

-    if cam_target == "preqx_kokkos":
+    if cam_target in ("preqx_kokkos", "theta-l"):
```
Contributor

Can we combine this and line 499 so that it only appears in one place?

Contributor Author

Good idea, will do
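The suggested consolidation could be sketched as follows (Python, matching build.py). The names KOKKOS_CAM_TARGETS and uses_kokkos are hypothetical, and the second occurrence at "line 499" is not shown in this diff:

```python
# Hypothetical refactor of the duplicated cam_target check in build.py:
# keep the tuple of Kokkos-based HOMME targets in one place so adding a
# new target (as this PR does with "theta-l") touches a single line.
KOKKOS_CAM_TARGETS = ("preqx_kokkos", "theta-l")

def uses_kokkos(cam_target):
    """Return True if this HOMME target requires the Kokkos build."""
    return cam_target in KOKKOS_CAM_TARGETS
```

Both call sites would then call uses_kokkos(cam_target) instead of repeating the tuple.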

```fortran
call t_adj_detailf(+1)

call t_startf("sync1_tprf")
call mpi_barrier(ckpt_mpicom,ierr)
```
Contributor

This code introduces an mpi_barrier to the cime_run loop - it could have serious performance implications. In other cases timing barriers are inside if (barrier_alarm) code blocks. It seems like this should be the same.

Member

@matthewhoffman was that barrier needed?

Contributor

@rljacob, I'm not familiar with this code; it is not part of my PR.

Contributor Author

@worleyph made this change I think (7b1e744 in E3SM)

@worleyph (Contributor) commented May 28, 2019

This same logic existed in-line previously. I just encapsulated it in a subroutine to cut down on duplicate code. It is called in one new location, just before the start of the run loop (but outside), immediately preceding the existing RUN_LOOP_BSTART barrier (which is annoying to have both, but seemed preferable to the other options I considered).
In summary, it is executed immediately after initialization (so once - a new feature, and the point of the PR), after the end of the first day (also once, has been this way for a long time), and whenever tprof_alarm is true (also for a long time). The barriers also occur during cime_final.

@worleyph (Contributor) commented May 28, 2019

There has always been the possibility that components would write out their own checkpoints (or even do their timing independently). We have never supported that, but the design has always been in the back of my head, thus the use of a formal parameter and not hardwiring mpicom_GLOID. This is a subroutine, and the communicator is clearly a passed parameter. I don't find this confusing :-). You just have to check the call sites? So... my opinion is to leave it as is. However, do what you feel that you need to do.

Contributor

Actually, I'd strongly prefer that this remain a formal parameter. There are a lot of communicators floating around in cime_comp_mod.F90. It seems a strange design to hardwire GLOID in a subroutine that is otherwise pretty generic.

Member

I see why we might want to change it back to GLOID. If we wanted to search the code for everywhere a barrier is called on GLOID, this one would be missed. It's not obvious from the name of the function that a barrier is going to be called inside it.

Contributor

t_drvstartf and t_drvstopf also have an (optional) parameter to specify the communicator to be used in a barrier, though that goes through t_barrierf and so is not normally enabled. They also never use GLOID (currently), but rather CPLID, omp(eci)%mpicom_compid, and mpicom_barrier (as called from component_exch, so one more step removed).

seq_domain_check_grid also has a barrier (and an allreduce) using an mpicom formal parameter.

Could make this an optional parameter that defaults to GLOID if the parameter is not specified, so grep would pick it up? But since all of the calls (all two of them) are local to the file cime_comp_mod.F90 at the moment, they would not be hard to hunt down.

In general, if there is a need to identify all barrier (and allreduce?) calls using GLOID, I wouldn't depend on grep without further investigation of all barrier and allreduce calls, unless we eliminate all communicator parameters. So my preference is the same, to leave as is, but I will make changes if people request it (or you can make the changes yourself).

Contributor Author
Telecon: we are fine with this as is.
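The optional-parameter alternative raised in this thread (default the communicator to the global one so a search still finds it) can be sketched in Python. GLOID and checkpoint_timing are illustrative stand-ins, not the actual CIME Fortran API:

```python
# Sketch of an optional communicator parameter that defaults to the
# global one (a stand-in for mpicom_GLOID). Call sites that want the
# global barrier omit the argument, so the default appears in exactly
# one greppable place; components could pass their own communicator.
GLOID = "mpicom_GLOID"

def checkpoint_timing(tag, mpicom=None):
    # Default to the global communicator when none is specified.
    mpicom = GLOID if mpicom is None else mpicom
    # A real implementation would barrier on mpicom and then write a
    # timing checkpoint file tagged with `tag`; here we just return
    # what was resolved so the behavior is easy to inspect.
    return (tag, mpicom)
```

For example, checkpoint_timing("run_loop_start") resolves to the global communicator, while checkpoint_timing("ocn", "mpicom_OCNID") uses the component-specific one.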

```diff
@@ -2399,6 +2410,155 @@ subroutine seq_flds_set(nmlfile, ID, infodata)
     call set_glc_elevclass_field(name, attname, longname, stdname, units, x2l_fluxes_from_glc, &
          additional_list = .true.)

+    if (trim(cime_model) == 'e3sm') then
```
Contributor

Is this really the appropriate test here or should it be something about active glc model?

Contributor

@jedwards4b , originally @billsacks had wanted much of the OCN-GLC functionality enabled only in E3SM. In the end we settled on an implementation that avoids needing to know which model is being used and instead functionality is enabled by compset definition via the OCN init routine. As such, this test could potentially be removed if you and Bill prefer.

Member

(I'm sort of back from vacation today, but still taking most of the day off, so may not respond to further discussion until tomorrow.)

As @matthewhoffman says, I requested this change after consultation with @mvertens . We wanted to ensure that these new fields don't impact CESM performance. I suspect that the new flag that @matthewhoffman is referring to isn't available at the time of seq_flds_set, so I don't think it will work to use that logical here (but I may be wrong).

Contributor

@billsacks is correct that the needed new flags are not available when seq_flds_set is called. seq_flds_set is called from cime_pre_init2, which occurs before cime_init, the routine from which the new flags are set (in ocn_init). So I think the current implementation here is the best we can do to prevent these new coupler fields from affecting CESM without significant restructuring.
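The ordering constraint described here can be illustrated with a toy sketch; the phase and field names below are simplified placeholders, not the real driver code:

```python
# seq_flds_set runs in a pre-init phase and can only consult static
# information such as the model name; ocn_c2_glc-style flags are set
# later, during component init, which is too late to change the
# already-frozen coupler field list.
def pre_init2(cime_model):
    # The field list is decided here, before any component initializes.
    return ["Sg_tbot", "Sg_dztbot"] if cime_model == "e3sm" else []

def component_init():
    # Coupling flags become known only now (set by ocn_init).
    return {"ocn_c2_glc": True}

coupling_fields = pre_init2("e3sm")  # flags not yet available
flags = component_init()             # field list already frozen
```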

@jedwards4b jedwards4b requested review from Katetc and removed request for billsacks May 28, 2019 17:39
@jedwards4b (Contributor)

@billsacks is on vacation this week, so I've asked @Katetc to review the glc changes.

@rljacob (Member) commented May 28, 2019

@Katetc FYI, Bill looked carefully at the glc-ocean coupling code in its E3SM PR here: E3SM-Project/E3SM#2726

@billsacks (Member)

@Katetc There is no need for you to review these code changes unless you want to, since (as @rljacob says) they have already undergone a pretty careful review by myself and others. However, before this comes to master, can you please run testing to ensure that CESM tests pass and are bit-for-bit? I think we discussed this before, but I forget what testing we came up with. I'm now thinking aux_glc and prealpha; prebeta is probably unnecessary (unless we said differently before).

@jgfouca (Contributor Author) commented May 30, 2019

@billsacks , @jedwards4b , did anyone kick off tests? I'm getting some pressure to do a CIME merge to E3SM soon and it can't happen until this PR is merged.

@jedwards4b (Contributor)

@fischer-ncar can you please kick off tests with this PR merged to master?

@fischer-ncar (Contributor)

I'm getting the following error for all of my tests.

ERROR: Namelist default for variable glc2ocn_fmapname refers to unknown XML variable GLC2OCN_FMAPNAME.'

billsacks added 2 commits May 31, 2019 10:57
These are needed for some new drv namelist variables. When they were
just in config_component_e3sm.xml, CESM tests failed with a message like
this:

ERROR: Namelist default for variable glc2ocn_fmapname refers to unknown
XML variable GLC2OCN_FMAPNAME.'
@billsacks (Member)

@jgfouca @Katetc - The plan I worked out with @fischer-ncar is: I am doing our aux_glc testing today. Once I get that all passing, @fischer-ncar will run CESM prealpha tests.

I think I have fixed the problem @fischer-ncar ran into. I have pushed this and some other minor changes (fixing whitespace / indentation) to a branch on my fork (https://github.com/billsacks/cime/tree/jgfouca/branch-for-acme-split-2019-05-28). Once I get tests passing there, I'll hand off these changes.

@jgfouca how do you prefer that I hand these changes back to you? Should I simply merge these changes back to your branch here, or do you prefer something different?

@jgfouca (Contributor Author) commented May 31, 2019

@billsacks , yes, just merge them into this PR.

@billsacks (Member)

Okay, I have pushed two commits. The first just cleans up whitespace. The second fixes the problem @fischer-ncar ran into, by moving a bunch of xml variables from the e3sm-specific file to the shared file.

This now passes all aux_glc tests on cheyenne, and results are bit-for-bit. (Also, performance doesn't seem to be affected significantly from a few spot-checks.) This test suite covers I and T compsets. @fischer-ncar can you go ahead and retry prealpha tests with this branch?

@fischer-ncar (Contributor)

The prealpha testing found

ERROR: Need to provide valid mapping file between glc and ocn in xml variable glc2ocn_fmapname

For B, F, and X compsets.

These mapping files are only needed in certain configurations. We need
to use idmap_ignore so that the driver's buildnml won't abort for
configurations where these mapping files aren't provided and aren't
needed.
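The idmap_ignore convention described in this commit message can be sketched as follows; validate_mapname and its error text are illustrative stand-ins for the driver's buildnml logic, not the real code:

```python
# Treat the idmap_ignore sentinel as "no mapping file required" rather
# than aborting, so configurations (e.g. B, F, X compsets without an
# active glc-ocn coupling) that never use the mapping can proceed.
IDMAP_IGNORE = "idmap_ignore"

def validate_mapname(name, var):
    """Return the mapping filename, None if ignorable, or raise."""
    if name == IDMAP_IGNORE:
        return None  # mapping not needed in this configuration; skip it
    if not name:
        raise RuntimeError(
            f"Need to provide valid mapping file in xml variable {var}")
    return name
```

With this in place, a compset that sets GLC2OCN_FMAPNAME to idmap_ignore skips validation, while an empty value still fails loudly.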
@billsacks (Member)

@fischer-ncar - I just pushed a fix for your issue

@jgfouca (Contributor Author) commented Jun 3, 2019

@billsacks , @fischer-ncar , how are we looking?

@billsacks billsacks requested review from billsacks and removed request for Katetc June 3, 2019 22:10
@billsacks (Member)

@jgfouca Chris's testing is still underway. Cheyenne (the machine, not the whole town, AFAIK) just went down due to weather problems, so it may be a while before full test results are in. But it looks like we're past the major issues at this point. @fischer-ncar and I agree that, if you're wanting to get this to master, you can go ahead and do so, and we'll follow up with additional issues/PRs if Chris runs into any more problems in his testing.

@jgfouca (Contributor Author) commented Jun 3, 2019

@billsacks , thanks.

@jgfouca jgfouca merged commit 5c43a15 into master Jun 3, 2019
@jgfouca jgfouca deleted the jgfouca/branch-for-acme-split-2019-05-28 branch June 3, 2019 22:20