Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for exchange grid with add_gusts false: Only add gust fields if add_gusts is true #501

Merged
merged 1 commit into from
Sep 28, 2024

Conversation

billsacks
Copy link
Member

@billsacks billsacks commented Sep 25, 2024

Description of changes

Runs with aoflux_grid = "xgrid" and add_gusts = .false. have been failing with:

20240903 141813.539 ERROR            PET003 ESMCI_Container.h:178 ESMCI::Container::get() Invalid argument  - key does not exist: So_ugustOut

This PR fixes this issue by avoiding adding the two new gust fields to the list of mediator fields if add_gusts is false.

The explanation for why this caused an issue with the exchange grid is:

  • In mediator/esmFldsExchange_cesm_mod.F90: esmFldsExchange_cesm, So_ugustOut and So_u10withGust are added as atmosphere-ocean fluxes regardless of the value of add_gusts.
  • But in mediator/med_phases_aofluxes_mod.F90: set_aoflux_out_pointers, the fldbun_getfldptr call is only made if add_gusts is true.
  • There is a difference in behavior with xgrid than with ogrid here: with xgrid, the call to fldbun_getfldptr is responsible for adding the field to FBaof_x. (I think with ogrid or agrid, the fields have been added to the relevant field bundle already.)
  • So we end up with these gust fields not in FBaof_x, but still in fldnames_aof_out, and this leads to an inconsistency.

@megandevlan and @jedwards4b: I'm not positive that this is the right fix: It does solve the issue, but I'd like one or both of your input on whether this is the right way to solve the issue. In particular, can we rely on CAM working correctly when add_gusts is false if these fields are absent? It looks to me like CAM will set the relevant fields to 0 if they are absent; will that lead to correct behavior when add_gusts = .false.?

Specific notes

Contributors other than yourself, if any:

CMEPS Issues Fixed (include github issue #):

Are changes expected to change answers? No, but there are FIELDLIST diffs for cases with add_gusts false: the following fields will no longer be present on the cpl hi files when add_gusts is false:

  • Med_aoflux_ocn_So_u10withGust
  • Med_aoflux_ocn_So_ugustOut
  • Med_aoflux_atm_So_u10withGust
  • Med_aoflux_atm_So_ugustOut
  • atmExp_So_u10withGust
  • atmExp_So_ugustOut

Any User Interface Changes (namelist or namelist defaults changes)? No

Testing performed

Please describe the tests along with the target model and machine(s)
If possible, please also added hashes that were used in the testing

Ran three versions of SMS_D_Ln9.ne30pg3_ne30pg3_mg17.FLTHIST.derecho_intel.cam-outfrq9s from cesm3_0_alpha03c:

  1. add_gusts = .false., aoflux_grid = "ogrid"
  2. add_gusts = .false., aoflux_grid = "xgrid" (This is the case that was failing previously.
  3. add_gusts = .true., aoflux_grid = "xgrid"

All three pass now.

Also ran SMS_Ld40.TL319_t232.C_JRA.derecho_intel (which has add_gusts = .false.) with aoflux_grid = "xgrid"; this also passes now but failed before the changes in this PR.

I have also run the following tests with comparisons against baselines: all pass and are bit-for-bit, though with FIELDLIST diffs as expected:

  • SMS.TL319_t232.G_JRA.derecho_intel.mom-no_stoch_physics
  • ERP_Ln9.f09_f09_mg17.F1850.derecho_intel.cam-outfrq9s
  • SMS_Ld2.ne30pg3_t232.BLT1850.derecho_intel.allactive-defaultio with change to user_nl_cpl to set add_gusts false

@billsacks
Copy link
Member Author

I have run a few additional tests with baseline comparisons and have updated the description above to note these tests.

Copy link
Collaborator

@jedwards4b jedwards4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for correcting this logic, the fields should not be required if add_gusts is false. Can you run a couple of tests using cam6 as well?

@megandevlan
Copy link
Collaborator

Thanks @billsacks and @jedwards4b for getting to this so quickly! I believe that CAM should behave as expected when add_gusts = false, so this fix should be fine.

@billsacks
Copy link
Member Author

billsacks commented Sep 28, 2024

Can you run a couple of tests using cam6 as well?

I got hung up in this testing due to #505 . But I have now done the following additional tests, using this diff:

diff --git a/cime_config/config_compsets.xml b/cime_config/config_compsets.xml
index 099602a..29b1333 100644
--- a/cime_config/config_compsets.xml
+++ b/cime_config/config_compsets.xml
@@ -38,6 +38,11 @@
 
   <!-- 1850 compsets Default, Mosart, Wave for CESM3 -->
 
+  <compset>
+    <alias>B1850</alias>
+    <lname>1850_CAM60_CLM50%SP_CICE_MOM6_MOSART_DGLC%NOEVOLVE_SWAV</lname>
+  </compset>
+
   <compset>
     <alias>BLT1850</alias>
     <lname>1850_CAM70%LT_CLM60%BGC-CROP_CICE_MOM6_MOSART_DGLC%NOEVOLVE_SWAV</lname>

Ran these tests with CAM6:

  • SMS_Ld2.ne30pg3_t232.B1850.derecho_intel.allactive-defaultio (with default ogrid): passes and bit-for-bit
  • SMS_Ld2.ne30pg3_t232.B1850.derecho_intel.allactive-defaultio with aoflux_grid = "xgrid": passes (can't compare against baseline because the baseline fails as expected due to the issue fixed in this PR)

Note that I had also previously run ERP_Ln9.f09_f09_mg17.F1850.derecho_intel.cam-outfrq9s, which also uses CAM6; that also passed and was bit-for-bit with the baseline.

I have also run these additional tests with BLT1850 (which has add_gusts = .true.)

@jedwards4b let me know if you feel this testing is sufficient; if so, I think this is ready to merge if you agree.

@jedwards4b
Copy link
Collaborator

Thanks for doing that test. I think that we are ready to merge.

@billsacks billsacks merged commit b1e2325 into ESCOMP:main Sep 28, 2024
2 checks passed
@billsacks billsacks deleted the fix_xgrid_f_compset branch September 28, 2024 14:44
billsacks added a commit that referenced this pull request Sep 28, 2024
Fix xgrid reproducibility by using srcTermProcessing=0 for all xgrid FieldRegridStore calls

### Description of changes

Use srcTermProcessing=0 for all xgrid FieldRegridStore calls.

This is needed for bit-for-bit reproducibility. Most of the calls to FieldRegridStore weren't setting srcTermProcessing at all; this can lead to irreproducibility of the results. Two of the calls had srcTermProcessing set to 1 before, which leads to reproducibility but (according to Gerhard Theurich) is often worse for performance than a value of 0.

Note that we use a value of 0 in the calls in med_map_mod.

### Specific notes

Contributors other than yourself, if any: Guidance from @oehmke and @theurich

CMEPS Issues Fixed (include github issue #): Resolves #505 

Are changes expected to change answers? YES: roundoff-level differences expected for runs with `aoflux_grid = "xgrid"` (but those runs had expected roundoff-level differences from run to run before this fix anyway)

Any User Interface Changes (namelist or namelist defaults changes)? No

### Testing performed
Please describe the tests along with the target model and machine(s) 
If possible, please also added hashes that were used in the testing

In the context of cesm3_0_alpha03c, with the change here put on top of the change in #501, ran `REP_Ld2.ne30pg3_t232.BLT1850.derecho_intel.allactive-xgrid`, where the `xgrid` testmod contained:

include_user_mods:
```
../defaultio
```

user_nl_cpl:
```
aoflux_grid = "xgrid"
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants