
Refactor nems field exchange; set default masks for mapping in med_internalstate #279

Merged: 45 commits, Apr 22, 2022

Conversation

@DeniseWorthen (Collaborator) commented Apr 4, 2022

Description of changes

Refactors esmFldsExchange_nems.F90 to use separate advertise and initialize phases, and to check that a component is present before advertising a field to or from that component. Implements default src and dst mask values in med_internalstate, in place of the code currently in med_map_mod.F90.
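
A minimal sketch of the resulting call pattern (the phase argument, the comp_present flag, and helpers such as addfld, addmap, and fldchk follow CMEPS conventions, but the signatures shown here are illustrative rather than the exact API):

  ! Illustrative sketch only, not the exact CMEPS code.
  subroutine esmFldsExchange_nems(gcomp, phase, rc)
    use ESMF, only : ESMF_GridComp, ESMF_SUCCESS
    type(ESMF_GridComp)          :: gcomp
    character(len=*), intent(in) :: phase   ! 'advertise' or 'initialize'
    integer, intent(out)         :: rc

    rc = ESMF_SUCCESS
    select case (trim(phase))
    case ('advertise')
       ! advertise an atm->ocn field only when both endpoints are present
       if (is_local%wrap%comp_present(compatm) .and. &
           is_local%wrap%comp_present(compocn)) then
          call addfld(fldListFr(compatm)%flds, 'Sa_z')
          call addfld(fldListTo(compocn)%flds, 'Sa_z')
       end if
    case ('initialize')
       ! add mappings only for fields that were actually connected
       if (fldchk(is_local%wrap%FBImp(compatm,compatm), 'Sa_z', rc=rc) .and. &
           fldchk(is_local%wrap%FBExp(compocn), 'Sa_z', rc=rc)) then
          call addmap(fldListFr(compatm)%flds, 'Sa_z', compocn, mapbilnr, 'one', 'unset')
       end if
    end select
  end subroutine esmFldsExchange_nems

The mediator calls this routine once with phase='advertise' during the advertise step and again with phase='initialize' when building the exchange field lists, instead of doing both in a single pass.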

Specific notes

Are changes expected to change answers? (specify if bfb, different at roundoff, more substantial)

No

Any User Interface Changes (namelist or namelist defaults changes)?

No

Testing performed

Testing performed if application target is CESM:

  • [x] (other) please describe in detail
    • machines and compilers: cheyenne - prealpha tests with the nuopc driver only
    • details (e.g. failed tests):

Testing performed if application target is UFS-coupled:

  • (recommended) UFS-coupled testing
    • description: Tested at UWM hash e3b19c11 using this PR branch for CMEPS
    • details (e.g. failed tests): all coupled, hafs and ng-godas tests pass for both Intel and GNU

Testing performed if application target is UFS-HAFS:

  • (recommended) UFS-HAFS testing
    • description:
    • details (e.g. failed tests):

Hashes used for testing:

DeniseWorthen and others added 30 commits September 7, 2021 17:46
* add local flag to control whether to write the dststatus file
for a particular RH. This prevents writing a dststatus file for
consf_aofrac, which contains garbage (since the RH is a copy and the
dststatus field is not set), or a dststatus file for mapcopy
* sending only cpl_scalars back in export gave a mediator error:
ESMF_GeomBaseGet "Value unrecognized or out of range" at
med_methods_mod.F90:424
* updates CMEPS with latest changes from ESCOMP, including

- xgrid capability (cesm only)
- refactored accumulation field bundles
- cleaned up med.F90 for mesh creation
- refactored mediator history functionality
* this is running, but not sure of the right setting here
* moved files into ufs
* implemented different interface for ufs flux_atmocn_mod
* add compile fixes for ufs
* removed reference to util and replaced with ufs
* clean up of Makefile
* removed references to use of med_kind_mod and introduced ufs_kind_mod
* removed files not needed by ufs
* Optional tiled history files for ATM (ESCOMP#257)
* mapping is done in prep-atm; this mapping is unneeded
* tidy up; changes for hafs tests to work. switch dst/src masking back for hafs mode.
  This is not required if wmesmf can read correct values (water=1, land=0) from the
  config file (a schematic of the default-mask approach follows this list)
* revert extraneous changes
* call FldsExchange_nems twice, once for advertise and once for initialize
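
A hypothetical sketch of the default-mask idea (the variable and array names here are illustrative, not the exact CMEPS code; only ESMF_FieldRegridStore and its srcMaskValues/dstMaskValues arguments are real ESMF API):

  ! Hypothetical sketch: default src/dst mask values live in the mediator
  ! internal state instead of being hard-coded in med_map_mod.F90.
  ! Ocean/ice meshes mask out land (mask value 0); the global atm mesh is
  ! unmasked, flagged with a "no mask" sentinel.
  integer, parameter :: ispval_mask = -987987    ! sentinel: no masking
  integer            :: defaultMasks(ncomps,2)   ! (component, 1=src, 2=dst)

  defaultMasks(:,:)       = 0                    ! default: mask out mask=0 (land)
  defaultMasks(compatm,:) = ispval_mask          ! atm: unmasked

  ! med_map_mod.F90 would then consult these when creating a route handle:
  call ESMF_FieldRegridStore(srcField=fldsrc, dstField=flddst, &
       srcMaskValues=(/defaultMasks(n1,1)/), &
       dstMaskValues=(/defaultMasks(n2,2)/), &
       regridmethod=ESMF_REGRIDMETHOD_BILINEAR, &
       routehandle=rh, rc=rc)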
@DeniseWorthen (Collaborator, Author)

@uturuncoglu I ran the UFS HAFS regression tests and they all passed. Also, I will make a PR back to UWM with these changes. That will also bring back any other update that has been made since I last updated the EMC fork (Feb 3).

@uturuncoglu (Collaborator)

@DeniseWorthen that is great. Do I need to create a PR to the authoritative repo or the NOAA-EMC fork? I'm not sure. Of course, it would be a draft at this point.

@DeniseWorthen (Collaborator, Author)

@uturuncoglu I'm not 100% sure which parts of CMEPS your xgrid work touches. Do you need changes in FldsExchange_nems? I'm happy to work w/ you on getting those put in now (even if not functional) if that saves effort.

@uturuncoglu (Collaborator)

Yes, there are some mods in FldsExchange_nems since I introduced two new coupling modes. I also have some changes in the flux computation part under the ufs/ directory.

@mvertens (Collaborator) commented Apr 5, 2022

@uturuncoglu @denise - this must be tested with cesm as well. We need to fire off the prealpha tests using cesm2_3_beta08 as a baseline. @uturuncoglu - are you willing to do this? If not I can take this on.

@uturuncoglu (Collaborator)

@mvertens sure. I could run it and let you know.

@mvertens (Collaborator) commented Apr 5, 2022

Thank you! You will need to merge the latest CMEPS master into this PR to have this working, but that should be part of the testing.
Does that make sense?

@uturuncoglu (Collaborator)

@mvertens I checked out CMEPS master and merged it with @DeniseWorthen's branch. So it should be fine at this point.

@mvertens (Collaborator) commented Apr 5, 2022

@uturuncoglu - that sounds great. Thank you.

@uturuncoglu (Collaborator)

@mvertens it will take longer than I thought. I have an issue with my disk quota, since I am keeping all the 35-day-long runs for the exchange grid work. I'll try to solve that first and then start the tests again.

@mvertens (Collaborator) commented Apr 5, 2022

@uturuncoglu - no problem. Thank you so much for doing this!!! Let me know if you want me to take this on if it gets too complicated on your end.

@DeniseWorthen (Collaborator, Author) commented Apr 5, 2022

@uturuncoglu There is nothing time-critical in this PR on my side; it is just work I started as part of the wave coupling, and I thought I'd take the time to get it committed. If it is easier to proceed with the X-grid changes w/o the changes in this PR, that is fine too. We can always circle back to it.

@uturuncoglu (Collaborator)

@DeniseWorthen Thanks. No, that is fine for me. I think this should go first, and then I can make the required modifications for the exchange grid. I'll update everyone once I run the CESM pre-alpha tests.

@uturuncoglu (Collaborator)

@mvertens @DeniseWorthen I ran all the tests, and here is the list of failed tests and their error logs (to be sure, I reran them separately with create_test after running the full test suite and cleaning my scratch space, since some of them were failing due to disk quota). At this point, I don't think the errors are caused by the changes in this PR. So it seems safe to merge this PR, but let me know what you think.

DAE_N2_D_Lh12_Vnuopc.f10_f10_mg37.I2000Clm50BgcCrop.cheyenne_intel.clm-DA_multidrv

2022-04-05 23:35:23: Test 'DAE_N2_D_Lh12_Vnuopc.f10_f10_mg37.I2000Clm50BgcCrop.cheyenne_intel.clm-DA_multidrv' failed in phase 'CREATE_NEWCASE' with exception 'ERROR: _N option not supported by nuopc driver, use _C instead'
  File "/glade/scratch/turuncu/CESM_pr_279/cime/scripts/Tools/../../scripts/lib/CIME/test_scheduler.py", line 1080, in _run_catch_exceptions
    return run(test)
  File "/glade/scratch/turuncu/CESM_pr_279/cime/scripts/Tools/../../scripts/lib/CIME/test_scheduler.py", line 669, in _create_newcase_phase
    expect(False, "_N option not supported by nuopc driver, use _C instead")
  File "/glade/scratch/turuncu/CESM_pr_279/cime/scripts/Tools/../../scripts/lib/CIME/utils.py", line 163, in expect
    raise exc_type(msg)

 ---------------------------------------------------

ERP_D_Ln9_Vnuopc.C48_C48_mg17.QPC6.cheyenne_intel.cam-outfrq9s

Building test for ERP in directory /glade/scratch/turuncu/ERP_D_Ln9_Vnuopc.C48_C48_mg17.QPC6.cheyenne_intel.cam-outfrq9s.20220405_233500_pe3gnl
/glade/scratch/turuncu/CESM_pr_279/components/cam/src/dynamics/fv3/atmos_cubed_sphere/tools/fv_mp_mod.F90(75): error #6580: Name in only-list does not exist or is not accessible.   [MPP_NODE]

ERROR: BUILD FAIL: cam.buildlib failed, cat /glade/scratch/turuncu/ERP_D_Ln9_Vnuopc.C48_C48_mg17.QPC6.cheyenne_intel.cam-outfrq9s.20220405_233500_pe3gnl/bld/atm.bldlog.220406-004859

The details build log is in /glade/scratch/turuncu/ERP_D_Ln9_Vnuopc.C48_C48_mg17.QPC6.cheyenne_intel.cam-outfrq9s.20220406_153639_s2y85c/bld/atm.bldlog.220406-153959.

ERP_D_Ln9_Vnuopc.f09_f09_mg17.FSD.cheyenne_intel.cam-outfrq9s_contrail

81:MPT ERROR: Rank 81(g:81) received signal SIGFPE(8).
81:     Process ID: 58605, Host: r6i6n12, Program: /glade/scratch/turuncu/ERP_D_Ln9_Vnuopc.f09_f09_mg17.FSD.cheyenne_intel.cam-outfrq9s_contrail.20220406_154930_sfdkz3/bld/cesm.exe
81:     MPT Version: HPE MPT 2.22  03/31/20 15:59:10
81:
81:MPT: --------stack traceback-------
81:OMP: Warning #190: Forking a process while a parallel region is active is potentially unsafe.
46:MPT ERROR: Rank 46(g:46) received signal SIGFPE(8).
46:     Process ID: 21616, Host: r13i2n20, Program: /glade/scratch/turuncu/ERP_D_Ln9_Vnuopc.f09_f09_mg17.FSD.cheyenne_intel.cam-outfrq9s_contrail.20220406_154930_sfdkz3/bld/cesm.exe
46:     MPT Version: HPE MPT 2.22  03/31/20 15:59:10

I also ran this with the ESMF PET log activated, but there are no errors in there. So this requires further investigation.

SMS_D_Ln9_Vnuopc.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCnudged.cheyenne_intel.cam-outfrq9s_refined_camchem

1908:MPT: #1  0x00002b37033d5306 in mpi_sgi_system (
1908:MPT: #2  MPI_SGI_stacktraceback (
1908:MPT:     header=header@entry=0x7fff74fe8c50 "MPT ERROR: Rank 1908(g:1908) received signal SIGFPE(8).\n\tProcess ID: 7399, Host: r7i4n30, Program: /glade/scratch/turuncu/SMS_D_Ln9_Vnuopc.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCnudged.cheyenne_intel.ca"...) at sig.c:340
1908:MPT: #3  0x00002b37033d54ff in first_arriver_handler (signo=signo@entry=8,
1908:MPT:     stack_trace_sem=stack_trace_sem@entry=0x2b3712d00080) at sig.c:489
1899:MPT: #4  0x00002ac3b632d793 in slave_sig_handler (signo=8, siginfo=<optimized out>,
1899:MPT:     extra=<optimized out>) at sig.c:565
1899:MPT: #5  <signal handler called>
1899:MPT: #6  0x00000000011d1e78 in physconst::get_hydrostatic_energy (i0=1, i1=16,
1899:MPT:     j0=1, j1=1, nlev=32, ntrac=200,
1899:MPT:     tracer=<error reading variable: value requires 819200 bytes, which is more than max-value-size>, pdel=..., cp_or_cv=..., u=..., v=..., t=..., vcoord=0,
1899:MPT:     ps=..., phis=..., z=...,
1899:MPT:     dycore_idx=<error reading variable: Cannot access memory at address 0x0>,
1899:MPT:     te=..., se=<error reading variable: Cannot access memory at address 0x0>,
1899:MPT:     ke=<error reading variable: Cannot access memory at address 0x0>,
1899:MPT:     wv=<error reading variable: Cannot access memory at address 0x0>, h2o=...,
1899:MPT:     liq=<error reading variable: Cannot access memory at address 0x0>, ice=...)
1899:MPT:     at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/utils/physconst.F90:1244
1899:MPT: #7  0x0000000002c385d4 in check_energy::check_energy_timestep_init (state=...,
1899:MPT:     tend=..., pbuf=0x2ae3b7a49f80,
1899:MPT:     col_type=<error reading variable: Cannot access memory at address 0x0>)
1899:MPT:     at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/physics/cam/check_energy.F90:254
1899:MPT: #8  0x0000000003242f04 in dp_coupling::derived_phys_dry (phys_state=...,
1899:MPT:     phys_tend=..., pbuf2d=0x2ae3b7a49f80)
1899:MPT:     at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/dynamics/se/dp_coupling.F90:700
1899:MPT: #9  0x00000000031f77b2 in dp_coupling::d_p_coupling (phys_state=...,
1899:MPT:     phys_tend=..., pbuf2d=0x2ae3b7a49f80, dyn_out=...)
1899:MPT:     at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/dynamics/se/dp_coupling.F90:289
1899:MPT: #10 0x0000000002483a52 in stepon::stepon_run1 (dtime_out=225, phys_state=...,
1899:MPT:     phys_tend=..., pbuf2d=0x2ae3b7a49f80, dyn_in=..., dyn_out=...)
1899:MPT:     at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/dynamics/se/stepon.F90:110
1899:MPT: #11 0x0000000000a209eb in cam_comp::cam_run1 (
1899:MPT:     cam_in=<error reading variable: value requires 147400 bytes, which is more than max-value-size>,
1899:MPT:     cam_out=<error reading variable: value requires 151800 bytes, which is more than max-value-size>)
1899:MPT:     at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/control/cam_comp.F90:243
1899:MPT: #12 0x00000000009d38fc in atm_comp_nuopc::datainitialize (gcomp=..., rc=0)
1899:MPT:     at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/cpl/nuopc/atm_comp_nuopc.F90:873
1899:MPT: #13 0x00002ac3b00a9432 in ESMCI::MethodElement::execute(void*, int*) const ()
1899:MPT:     at /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Superstructure/Component/src/ESMCI_MethodTable.C:377
1899:MPT: #14 0x00002ac3b00aa896 in ESMCI::MethodTable::execute (this=0x17541d20,
1899:MPT:     labelArg=..., object=0x1753f020, userRc=0x7ffee57be498,
1899:MPT:     existflag=0x7ffee57be222)
1899:MPT:     at /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Superstructure/Component/src/ESMCI_MethodTable.C:563

The full log can be seen in /glade/scratch/turuncu/SMS_D_Ln9_Vnuopc.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCnudged.cheyenne_intel.cam-outfrq9s_refined_camchem.20220406_162722_xutg84/run/cesm.log.3676318.chadmin1.ib0.cheyenne.ucar.edu.220406-192524

@uturuncoglu (Collaborator)

@mvertens Let me know if you want me to do more tests. How do you want to proceed with this PR?

@mvertens (Collaborator)

@fischer-ncar @jedwards4b - are these expected fails for beta08? I think it's fine to proceed with accepting and merging these PRs, but I wanted to verify this first.

@fischer-ncar (Contributor) commented Apr 12, 2022

For cesm2_3_beta08, these two tests passed:
DAE_N2_D_Lh12_Vnuopc.f10_f10_mg37.I2000Clm50BgcCrop.cheyenne_intel.clm-DA_multidrv
SMS_D_Ln9_Vnuopc.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCnudged.cheyenne_intel.cam-outfrq9s_refined_camchem

These two tests failed:
ERP_D_Ln9_Vnuopc.C48_C48_mg17.QPC6.cheyenne_intel.cam-outfrq9s
ERP_D_Ln9_Vnuopc.f09_f09_mg17.FSD.cheyenne_intel.cam-outfrq9s_contrail

@DeniseWorthen (Collaborator, Author)

@mvertens Please don't merge. Adding comp_present conditionals appears to resolve the issue w/ the ATM-WAV configuration, so I may want to make further changes to this PR branch.

@DeniseWorthen (Collaborator, Author)

@mvertens This is ready for any final testing on your end. The additional checks for the presence of components allow me to run ATM-WAV-only coupling for UWM. I ran all tests for UWM and all baselines passed.

@mvertens (Collaborator)

@uturuncoglu - are you comfortable with my merging this PR?

@uturuncoglu (Collaborator)

@mvertens It looks fine to me, since those errors were not related to the PR, but they need to be investigated in the near future (they are not expected ones).

@mvertens (Collaborator)

@uturuncoglu - thank you. Actually, those failures are not errors, but newly requested output from the mediator to the wav component.
I ran these differences by @alperaltuntas today, and we are both comfortable with these new export answers.

@mvertens merged commit ef360ea into ESCOMP:master on Apr 22, 2022
junwang-noaa pushed a commit to NOAA-EMC/CMEPS that referenced this pull request Jun 9, 2022
* add new flux computation for UFS model and add new coupling mode for exchange grid implementation
* fix area field for new flux algorithm
* send fluxes to atmospheric model
* initial implementation for sending fluxes to UFS ATM
* update CCPP host model
* fix latent and sensible heat fluxes and clean code
* add new coupling mode for side by side flux comparison
* fix CCPP host model for latent and sensible heat fluxes
* fix aoflux calculation on agrid and add missing error checks
* add option to write meshes and update code that retrieve area information from xgrid
* update ccpp host based on recent changes in ccpp framework
* fix for providing cell area to CCPP host model
* make ccpp physics options configurable
* Refactor nems field exchange; set default masks for mapping in med_internalstate (ESCOMP#279)
Refactors esmFldsExchange_nems.F90 to use separate advertise and initialize phases and to check that a component is present before advertising a field to or from that component. Implements default src and dst mask values in place of the code currently in med_map_mod.F90. Fixes #63 and #64.
* use mesh file instead of grid name (ESCOMP#285)
This was done so that the vertical component used in the grid name does not affect tests.

Co-authored-by: Dom Heinzeller <climbfuji@ymail.com>
Co-authored-by: Francis Vitt <fvitt@ucar.edu>
Co-authored-by: Courtney Peverley <courtneyp@izumi.cgd.ucar.edu>
Co-authored-by: Jim Edwards <jedwards@ucar.edu>
Co-authored-by: mvertens <mvertens@users.noreply.github.com>
Co-authored-by: Denise Worthen <denise.worthen@noaa.gov>
Co-authored-by: Mariana Vertenstein <mvertens@ucar.edu>
Co-authored-by: Ufuk Turuncoglu <ufuk.turuncoglu@noaa.gov>
@DeniseWorthen deleted the feature/refac_fldxch_nems branch on January 10, 2023 13:00
Successfully merging this pull request may close these issues.

  • Refactor setting the source and destination mask values for mapping
  • Refactor esmFldsExchange_nems