-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jgfouca/branch for acme split 2019 05 28 #3123
Conversation
This commit introduces infrastructure changes needed for adding coupling between the ocean and ice shelves in the GLC component. It is a squash of work done primarily by Jeremy Fyke between 2015 and 2017, with updates by Jon Wolfe to make it compatible with changes to E3SM since then. It is BFB for existing compsets. This commit introduces the CIME changes, but does not touch the components. [BFB] - Bit-For-Bit
A GG case now runs to completion and appears to be calculating fluxes.
This ensures there are 0-values where floating ice is not present and removes NaN that were present previously.
A previous commit was using ocean state fields on the ocean grid, presumably as a holdover from when this melt calculation was done on the ocean grid. This commit fixes this by switching to the ocean state fields remapped to the GLC grid.
This commit makes minor adjustments to the method for solving the ocean-iceshelf interface temperature, following changes that had been implemented in standalone MPAS-O in the last few years. This makes finding the root slightly more stable.
This commit makes a slight modification of the calculation of the ocean freezing temperature beneath ice shelf cavities to account for dependence on salinity. This adjustment matches what is currently done in MPAS-Ocean. Also update parameter values in shr_const_mod to match exactly what is in MPAS-Ocean. Descriptions of those parameters are also added.
The melt flux calculation was using an incorrect variable for the ice shelf pressure.
This forces the attribute vectors to be updated by the time that prep_glc_calculate_subshelf_boundary_fluxes() is called. Without this, Sg_tbot and Sg_dztbot were still 0 when that routine was executed, leading to a divide by zero. There may be a subtler way to force the update, but this has the added benefit of allowing these fields to be visualized on the ocean grid in the coupler history file for debugging purposes.
Adds commented note in prep_glc_calculate_subshelf_boundary_fluxes explaining the change made in 4092b4e Also, comment out the rhoeff index for now - it is not currently used, but will be implemented later.
Now both mass fluxes are positive for an addition of mass to the component in question.
The call to cime_run_glc_setup_send was inadvertently deleted in commit b24d24f This commit reconciles changes in this branch with the contents of cime_run_glc_setup_send. Specifically, cime_run_glc_setup_send is modified to separate steps needed for coupling from LND and OCN. This separation is also applied to prep_glc_accum_avg in prep_glc_mod.F90. Finally, prep_glc_mrg and the routine it calls have been renamed to indicate they are specific to forcing coming from LND.
This commits moves remaining details of the new OCN-GLC coupling from the main cime_run() routine to subroutines. A routine cime_run_ocnglc_coupling() does the actual calculations and is called from cime_run_ocn_recv_post() to ensure it gets runs after each time OCN completes. This commit also creates separate prep_glc_accum_*() routines for fields coming from LND and OCN so they can be called independently. They are called from the respective cime_run_*_recv_post() routines for hte component from which the fields are coming. prep_glc_accum_avg() is still combined for fields coming from the two components because this step happens in the same place for fields coming from either LND or OCN.
The call from prep_ocn_mod.F90 is not needed.
This capability will be disabled for CESM. This commit also corrects usage of glcshelf_c2_ocn instead of glc_c2_ocn in a couple places.
Whitespace, indentation, removal of unneeded comments, keyword arguments for long argument lists.
In prep_glc_mod, there were two places where a list of coupling fields from OCN were hardcoded in calls to seq_map_map. This commit replaces those hard-coded lists with a variable that is generated in seq_flds_mod from the definitions of the coupling fields that are required.
These new coupling fields are disabled for CESM.
The OCN-GLC time averaging accumulators should only be read/written to coupler restart files if OCN-GLC are actually coupled. Otherwise there will be an attempt to read/write unallocated things. This required passing the ocn_c2_glc flag to the i/o routines, because that information is not in infodata.
Currently a timing checkpoint is saved at the end of the first simulation day, capturing both initialization cost and possibily anomalous first simulation day performance. However, when evaluating performance of initialization for high resolution cases, this may not be sufficiently fine grain, especially if the evaluation job runs out of time before completing the first simulation day. Here the env_run.xml TPROF_IN_INIT variable is introduced that enables saving timing checkpoints at two locations within the initialization. (This adds a new namelist variable, to seq_infodata_inparm in drv_in.) TPROF_IN_INIT is FALSE by default, in which case no additional checkpoint data is output. However, existing timer names associated with the writing out of performance data are modified slightly, to better attribute time associated with the new performance data writes. [BFB] - Bit-For-Bit [NML] - Namelist Changing
The addition of a namelist variable and env_run.xml support for enabling optional checkpointing of performance data during intialization in the previous commit is backed out here. Instead a new checkpoint timing file is now output right before the beginning of the run loop, with a name consistent with the other checkpoint timing files, i.e. including the simulation timestamp in the suffix. This is not optional. The internal reworking of the checkpoint write logic and tweaking of timer names in cime_comp_mod.F90 is preserved, and missing documentation for the TPROF_TOTAL env_run.xml variable is also added to config_component_e3sm.xml. [BFB] - Bit-For-Bit
…heta-l. Update E3SM namelist files with COMPOSE options. Make the theta-l E3SM target build with COMPOSE.
scripts/lib/CIME/build.py
Outdated
@@ -233,7 +233,7 @@ def _build_libraries(case, exeroot, sharedpath, caseroot, cimeroot, libroot, lid | |||
if mpilib == "mpi-serial": | |||
libs.insert(0, mpilib) | |||
|
|||
if cam_target == "preqx_kokkos": | |||
if cam_target in ("preqx_kokkos", "theta-l"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we combine this and line 499 so that it only appears in one place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, will do
call t_adj_detailf(+1) | ||
|
||
call t_startf("sync1_tprf") | ||
call mpi_barrier(ckpt_mpicom,ierr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code introduces an mpi_barrier to the cime_run loop - it could have serious performance implications. In other cases timing barriers are inside if (barrier_alarm)
code blocks. It seems like this should be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matthewhoffman was that barrier needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rljacob , I'm not familiar with this code - it is not part of the PR from me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This same logic existed in-line previously. I just encapsulated it in a subroutine to cut down on duplicate code. It is called in one new location, just before the start of the run loop (but outside), immediately preceding the existing RUN_LOOP_BSTART barrier (which is annoying to have both, but seemed preferable to the other options I considered).
In summary, it is executed immediately after initialization (so once - a new feature, and the point of the PR), after the end of the first day (also once, has been this way for a long time), and whenever tprof_alarm is true (also for a long time). The barriers also occur during cime_final.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has always been the possibility that components would write out their own checkpoints (or even do their timing independently). We have never supported that, but the design has always been in the back of my head, thus the use of a formal parameter and not hardwiring mpicom_GLOID. This is a subroutine, and the communicator is clearly a passed parameter. I don't find this confusing :-). You just have to check the call sites? So... my opinion is to leave it as is. However, do what you feel that you need to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'd strongly prefer that this remain a formal parameter. There are a lot of comunicators floating around in cime_comp_mod.F90. It seems a strange design to hardwire GLOID in a subroutine that is otherwise pretty generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why we might want to change it back to GLOID. If we want to search the code for everywhere a barrier is called on GLOID, it would be missed. Its not obvious from the name of the function that a barrier is going to called inside it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t_drvstartf, t_drvstopf also has the (optional) parameter to specify communicator to be used in a barrier, though it is t_barrierf and so not normally enabled. It also never uses GLOID (currently), but rather CPLID and omp(eci)%mpicom_compid, and mpicom_barrier (as called from component_exch, so even more once removed).
seq_domain_check_grid also has a barrier (and an allreduce) using an mpicom formal parameter.
Could make this an optional parameter that defaults to GLOID if the parameter is not specified, so grep would pick it up? But, aince all of the calls (all two of them) are local to the file cime_comp_mod.F90 at the moment, would not be hard to hunt down.
In general, if there is a need to identify all barrier (and allreduce?) calls using GLOID, I wouldn't depend on grep without further investigation of all barrier and allreduce calls, unless we eliminate all communicator parameters. So, my preference is the same, to leave as it, but will make changes if people request it (or you can make the changes yourself).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Telecon: we are fine with this as is.
@@ -2399,6 +2410,155 @@ subroutine seq_flds_set(nmlfile, ID, infodata) | |||
call set_glc_elevclass_field(name, attname, longname, stdname, units, x2l_fluxes_from_glc, & | |||
additional_list = .true.) | |||
|
|||
if (trim(cime_model) == 'e3sm') then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really the appropriate test here or should it be something about active glc model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jedwards4b , originally @billsacks had wanted much of the OCN-GLC functionality enabled only in E3SM. In the end we settled on an implementation that avoids needing to know which model is being used and instead functionality is enabled by compset definition via the OCN init routine. As such, this test could potentially be removed if you and Bill prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm sort of back from vacation today, but still taking most of the day off, so may not respond to further discussion until tomorrow.)
As @matthewhoffman says, I requested this change after consultation with @mvertens . We wanted to ensure that these new fields don't impact CESM performance. I suspect that the new flag that @matthewhoffman is referring to isn't available at the time of seq_flds_set
, so I don't think it will work to use that logical here (but I may be wrong).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@billsacks is correct that the needed new flags are not available when seq_flds_set
is called. seq_flds_set
is called from cime_pre_init2
, which occurs before cime_init
which is the routine from which the new flags are set (in ocn_init
). So I think the current implementation here is the best we can do to prevent these new coupler fields from affecting CESM without significant restructuring.
@billsacks is on vacation this week, I've asked @Katetc to review the glc changes. |
@Katetc FYI, Bill looked carefully at the glc-ocean coupling code in its E3SM PR here: E3SM-Project/E3SM#2726 |
@Katetc There is no need for you to review these code changes unless you want to, since (as @rljacob says) they have already undergone a pretty careful review by myself and others. However, before this comes to master, can you please run testing to ensure that CESM tests pass and are bit-for-bit. I think we discussed this before, but I forget what testing we came up with. I'm now thinking aux_glc and prealpha; prebeta is probably unnecessary (unless we said differently before). |
@billsacks , @jedwards4b , did anyone kick off tests? I'm getting some pressure to do a CIME merge to E3SM soon and it can't happen until this PR is merged. |
@fischer-ncar can you please kick off tests with this PR merged to master? |
I'm getting the following error for all of my tests. ERROR: Namelist default for variable glc2ocn_fmapname refers to unknown XML variable GLC2OCN_FMAPNAME.' |
These are needed for some new drv namelist variables. When they were just in config_component_e3sm.xml, CESM tests failed with a message like this: ERROR: Namelist default for variable glc2ocn_fmapname refers to unknown XML variable GLC2OCN_FMAPNAME.'
@jgfouca @Katetc - The plan I worked out with @fischer-ncar is: I am doing our I think I have fixed the problem @fischer-ncar ran into. I have pushed this and some other minor changes (fixing whitespace / indentation) to a branch on my fork (https://github.com/billsacks/cime/tree/jgfouca/branch-for-acme-split-2019-05-28). Once I get tests passing there, I'll hand off these changes. @jgfouca how do you prefer that I hand these changes back to you? Should I simply merge these changes back to your branch here, or do you prefer something different? |
@billsacks , yes, just merge them into this PR. |
Okay, I have pushed two commits. The first just cleans up whitespace. The second fixes the problem @fischer-ncar ran into, by moving a bunch of xml variables from the e3sm-specific file to the shared file. This now passes all |
The prealpha testing found ERROR: Need to provide valid mapping file between glc and ocn in xml variable glc2ocn_fmapname For B, F, and X compsets. |
These mapping files are only needed in certain configurations. We need to use idmap_ignore so that the driver's buildnml won't abort for configurations where these mapping files aren't provided and aren't needed.
@fischer-ncar - I just pushed a fix for your issue |
@billsacks , @fischer-ncar , how are we looking? |
@jgfouca Chris's testing is still underway. Cheyenne (the machine, not the whole town, AFAIK) just went down due to weather problems, so it may be a while before full test results are in. But it looks like we're past the major issues at this point. @fischer-ncar and I agree that, if you're wanting to get this to master, you can go ahead and do so, and we'll follow up with additional issues/PRs if Chris runs into any more problems in his testing. |
@billsacks , thanks. |
Change List:
Test suite: scripts_regression_tests
Test baseline:
Test namelist changes:
Test status: BFB
Fixes [CIME Github issue #]
User interface changes?:
Update gh-pages html (Y/N)?:
Code review: @jedwards4b