Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FMS2: Replace time_interp_external IDs with a MOM-defined type #376

Merged
merged 4 commits into from
Jun 23, 2023

Conversation

marshallward
Copy link
Member

All instances of an FMS ID to the internal interpolation content is replaced with a derived type containing additional metadata recording the field's origin filename and fieldname.

This additional information is required in order to replicate the axis data from the field, which is no longer provided by FMS2.

The abstraction of this type also allows us to either extend it or redefine it in other frameworks as needed in the future.

This primarily affects the usage of the following functions:

  • init_external_field
  • time_interp_external
  • horiz_interp_and_extrap_tracer

References to FMS1 move data into an axistype, but the contents are transferred to an axis_info when leaving the FMS1 layer. FMS2 directly populates the axis_info content.

The get_external_field_info calls are modified to return axis_info rather than axistype.

The redundant get_axis_data function is also removed from MOM_interp_infra, since get_axis_info provides an equivalent operation.

The following solvers are updated:

  • MOM_open_boundary
  • MOM_ice_shelf
  • MOM_oda_driver
  • MOM_MEKE
  • MOM_ALE_sponge
  • MOM_diabatic_aux

Of these, OBC was the most significant. The integer handle (fid) was previously used to determine if each segment field was constant or (if negative) read from a file. After being replaced by the derived type, a new flag was added to make this determination.

All of the coupled drivers have been modified, since they support time interpolation of T and S fields.

  • FMS
  • MCT
  • NUOPC

The NUOPC driver also includes modifications to its CFC11 and CFC12 fields. Changes to the MOM CFC modules replaces an id == -1-like test, which is not used by the derived type. This check has been removed, and we now solely rely on the present(cfc_handle) test.


This does not resolve all of our dependencies on FMS1, since content from time_interp_external must be replaced with content from time_interp_external2. But this PR allows us to begin the work of using the replacement module.

@marshallward
Copy link
Member Author

Since this touches the NUOPC and MCT couplers, I think it's important to get feedback (if not approval) from NCAR. @gustavo-marques @alperaltuntas @mnlevy1981 Are any of you able to review this?

Also feedback from @MJHarrison-GFDL @andrew-c-ross @kshedstrom would be appreciated.

Having said that, there will probably be a more intensive PR coming later which replaces the actual time_interp_external calls.

@codecov
Copy link

codecov bot commented Jun 7, 2023

Codecov Report

Merging #376 (fcba22f) into dev/gfdl (273da2f) will increase coverage by 0.01%.
The diff coverage is 42.37%.

❗ Current head fcba22f differs from pull request most recent head c010f0e. Consider uploading reports for the commit c010f0e to get more accurate results

@@             Coverage Diff              @@
##           dev/gfdl     #376      +/-   ##
============================================
+ Coverage     38.38%   38.39%   +0.01%     
============================================
  Files           268      268              
  Lines         76010    76021      +11     
  Branches      13987    13987              
============================================
+ Hits          29174    29186      +12     
- Misses        41597    41598       +1     
+ Partials       5239     5237       -2     
Impacted Files Coverage Δ
src/ice_shelf/MOM_ice_shelf.F90 0.00% <0.00%> (ø)
src/ocean_data_assim/MOM_oda_driver.F90 0.00% <0.00%> (ø)
src/parameterizations/lateral/MOM_MEKE.F90 43.21% <0.00%> (ø)
...rc/parameterizations/vertical/MOM_diabatic_aux.F90 61.38% <ø> (ø)
src/tracer/MOM_CFC_cap.F90 18.03% <0.00%> (ø)
src/core/MOM_open_boundary.F90 24.09% <12.50%> (-0.01%) ⬇️
src/parameterizations/vertical/MOM_ALE_sponge.F90 21.46% <20.00%> (ø)
src/framework/MOM_interpolate.F90 10.60% <25.00%> (ø)
src/framework/MOM_horizontal_regridding.F90 45.29% <62.50%> (ø)
config_src/infra/FMS1/MOM_interp_infra.F90 74.13% <84.21%> (-2.95%) ⬇️

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@jiandewang
Copy link

@marshallward since there are changes in NUOPC_cap, I am going to test it in UFS.

@marshallward
Copy link
Member Author

Thanks @jiandewang I didn't realize you were using NUOPC.

@kshedstrom
Copy link

In repro mode, my Supercritical test is hanging after printing:

NOTE from PE     0: diag_manager_mod::diag_manager_init: diag_manager is using fms2_io
NOTE from PE     0: diag_manager_mod::diag_manager_init: prepend_date only supported when diag_manager_init is called with time_init present.
 MOM_in domain decomposition
whalo =    4, ehalo =    4, shalo =    4, nhalo =    4
  X-AXIS =  100 100
  Y-AXIS =  150
 MOM_in domain decomposition
whalo =    2, ehalo =    2, shalo =    2, nhalo =    2
  X-AXIS =   50  50
  Y-AXIS =   75
 MOM_MOSAIC domain decomposition
whalo =    9, ehalo =    9, shalo =    9, nhalo =    9
  X-AXIS =  200 200
  Y-AXIS =  300
 MOM_MOSAIC domain decomposition
whalo =    4, ehalo =    4, shalo =    4, nhalo =    4
  X-AXIS =  100 100
  Y-AXIS =  150

It runs fine in debug mode, where the next thing printed is:

WARNING from PE     0: Open_file is appending .nc to the filename ./ocean_geometry

It also runs fine with one core in repro mode. The ones that hang are using 2-4 cores.

@kshedstrom
Copy link

It runs on gaea with ifort22.
It runs with FMS 2023.01.
It fails on chinook with FMS 2023.01.01 and gfortran 11, but only when using multiple cores.
CCS2 (coupled MOM6-SIS2) also fails on chinook.

@kshedstrom
Copy link

Never mind, it's an FMS problem, not a problem with this patch.

@klindsay28
Copy link

I migrated a lot of the CFC_cap code out of the NUOPC driver and into MOM_tracer_flow_control.F90 and MOM_CFC_cap.F90 with NCAR#242. So if dev/gfdl gets to mom-ocean/main before dev/ncar, we'll need to merge these mods into dev/ncar, or you'll get to do it if dev/ncar gets to mom-ocean/main before dev/gfdl.

@marshallward
Copy link
Member Author

@klindsay28 Enough problems are starting to bubble up that I would not expect this in the next PR to main, so most likely we'll have to wrangle with the merge conflicts. Thanks for the heads up though.

@jiandewang
Copy link

@marshallward it failed to read in UFS restart file. I think it is because the time dimension is 1 instead of unlimited in that file. See sample file at /lustre/f2/dev/ncep/Jiande.Wang/For-Marshall.
This restart file is from CPC 3dvar ODA products which was generated at least 3 years ago (of course from an very old version of MOM6).
Note infra/FMS2 code is used in the compiling

@marshallward
Copy link
Member Author

marshallward commented Jun 14, 2023

Thanks @jiandewang I had a look at your error:

FATAL from PE     0: NetCDF: Invalid dimension ID or name: get_unlimited_dimension_name: file:INPUT/MOM.res.nc

This may be an error already in dev/gfdl rather than something about this PR. Either way, it's good that you caught it.

I think you are right that open_file expects to find an unlimited dimension, and get_unlimited_dimension_name (an FMS function) aborts if it's missing. We may need to find a way to relax this requirement.

All instances of an FMS ID to the internal interpolation content is
replaced with a derived type containing additional metadata recording
the field's origin filename and fieldname.

This additional information is required in order to replicate the axis
data from the field, which is no longer provided by FMS2.

The abstraction of this type also allows us to either extend it or
redefine it in other frameworks as needed in the future.

This primarily affects the usage of the following functions:

- init_external_field
- time_interp_external
- horiz_interp_and_extrap_tracer

The following solvers are updated:

- MOM_open_boundary
- MOM_ice_shelf
- MOM_oda_driver
- MOM_MEKE
- MOM_ALE_sponge
- MOM_diabatic_aux

Of these, OBC was the most significant.  The integer handle (fid) was
previously used to determine if each segment field was constant or (if
negative) read from a file.  After being replaced by the derived type, a
new flag was added to make this determination.

All of the coupled drivers have been modified, since they support time
interpolation of T and S fields.

- FMS
- MCT
- NUOPC

The NUOPC driver also includes modifications to its CFC11 and CFC12
fields.  Changes to the MOM CFC modules replaces an `id == -1`-like
test, which is not used by the derived type.  This check has been
removed, and we now solely rely on the `present(cfc_handle)` test.

While this could change behavior, there does not seem to be any scenario
where init_external_field would return -1 but would be passed to the
function.  (But I may eat these words.)
With removal of axis-based operations in FMS2 I/O, this patch removes
references to these calls and replaces them with MOM `axes_info` types.
References to FMS1 read into an `axistype`, but the contents are
transferred to an `axis_info`.  FMS2 directly populates the `axis_info`
content.

The `get_external_field_info` calls are modified to return `axis_info`
rather than `axistype`.

The redundant `get_axis_data` function is also removed from
`MOM_interp_infra`, since `get_axis_info` provides an equivalent
operation.

Generally speaking, this is not an improvement of the codebase.  The
FMS1 layer does a redundant copy of data from `axistype` to `axis_info`.
The FMS2 layer is significantly worse, and re-opens the file to read the
axis data for each field!  But if the intention is to leverage the
existing API, then I don't think we have any choice at the moment.

Assuming this is a relatively infrequent operation, this should not
cause any measureable issues, but it needs to be watched carefully.
This patch shifts all remaining time_interp_external functions from
time_interp_external to equivalent ones in time_interp_external2.

Internally, time-interpolated fields are initialized with `ongrid` set
to `.true.`, and such fields are assumed to be on-grid.

This seems to hold for all existing instances of `time_interp_external`,
but needs to be monitored in the future somehow.
@marshallward
Copy link
Member Author

marshallward commented Jun 15, 2023

I updated this PR to replace the time_interp_external module calls with time_interp_external2. This includes a change which assumes that time-interpolated external fields are always on-grid (via ongrid=.true. in init_external_field; Thanks to @uramirez8707 for the suggestion.)

This appears to be a safe assumption, but I could be wrong about it. But if we assume it, then it resolves the few outstanding issues with adopting time_interp_external2.

@marshallward
Copy link
Member Author

Another issue has emerged: It seems that the FMS1 init_external_field was case insensitive, but the new FMS2 version is case-sensitive. I am looking into how to handle this.

@marshallward
Copy link
Member Author

I added a commit to support case-insensitive field access in FMS2, which appears to work in my local testing. As it is, I believe this particular PR is ready to merge.

However, it might be helpful to resolve the (presumably unrelated) issue detected by @jiandewang before merging, so that it can be tested in UFS.

@kshedstrom
Copy link

The current version of this runs the dumbbell default tests, the dumbbell scaled tests, the dumbbell rotated test, but not the dumbbell scaled and rotated test. Stack trace:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f18ec73856f in ???
#1  0x11d187d in __mom_interpolate_MOD_time_interp_external_3d
	at //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/src/framework/MOM_interpolate.F90:174
#2  0x4cd5b3 in __mom_open_boundary_MOD_update_obc_segment_data
	at //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/src/core/MOM_open_boundary.F90:3898
#3  0x15ed6c2 in __mom_boundary_update_MOD_update_obc_data
	at //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/src/core/MOM_boundary_update.F90:160
#4  0xf37547 in __mom_dynamics_split_rk2_MOD_step_mom_dyn_split_rk2
	at //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/src/core/MOM_dynamics_split_RK2.F90:507
#5  0x18d52e6 in step_mom_dynamics
	at //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/src/core/MOM.F90:1174
#6  0x18de1f1 in __mom_MOD_step_mom
	at //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/src/core/MOM.F90:853
#7  0x151d7b4 in mom6
	at //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/config_src/drivers/solo_driver/MOM_driver.F90:484
#8  0x151ed0f in main
	at //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/config_src/drivers/solo_driver/MOM_driver.F90:27

Something isn't right with the rotation of data.in.

@marshallward
Copy link
Member Author

@kshedstrom Are you saying this fails in dev/gfdl as well? Or just in this PR?

@kshedstrom
Copy link

@marshallward Sorry, answers don't change from dev/gfdl. It's just that I was in debugger mode to look at something else and it tripped over the invalid numbers. But I do need to figure out what's wrong with my rotated cases now that I have the debug executable.

@kshedstrom
Copy link

Running on chinook04 (gcc 11.3), answers match between the two MOM6 versions. Going back to chinook01 (gcc 8.3) with dev/gfdl, the answers still mostly match in repro mode, also in debug mode without this flag: -ffpe-trap=invalid,zero,overflow
Turning it back on again, I managed to reproduce my blow-up. I need to ask about a debugger on chinook04. Sorry to have bothered you with a compiler bug.

Copy link
Member

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I agree with the broad direction of this PR, it has introduced (for the first time) the use of code from the src/framework directory by code in the config_src/infra directories. I think that this is potentially dangerous, in that it could lead to predecessor cycles, and that it could inhibit the compilation of parts of MOM6 as libraries. I would like to see this PR revised to avoid this backward dependency, or else I would like to have a broader discussion about the relative roles of the code in the various src/ and config_src/ directories before we collectively adopt this substantial revision to the effective code structure of the model.

@marshallward
Copy link
Member Author

MOM_error is easily replaced with MOM_err, and maybe something can be done about axis_info (perhaps the infra-specific axistype can be used in its place, although IIRC the problem is that there is no analogue to set_axis_info.). But aside from reimplementing lowercase() at the infra level, for the sole purpose of avoiding the appearance of a circular dependency, I can't see any other solution.

IMHO, I see no problem with config_src referencing content in src when the content is properly platform-independent, as in the case of lowercase() and axis_info/set_axis_info. But I'll do whatever you request in this case, though I will need some advice on how to move forward.

The FMS1 implementation of init_external_field is case-insensitive, but
the FMS2 implementation is case-sensitive, which can cause errors in
older established input files.

This patch sweeps through the fields of the input files and checks for a
case-insensitive match (using lowercase()).  This requires an additional
open/close of the file.
@Hallberg-NOAA Hallberg-NOAA dismissed their stale review June 23, 2023 19:29

After further consideration and discussion, there is not a problem with config_src/infra code using modules from the src/framework directory, so long as there is no predecessor cycle created between modules. This PR does not create any problematic predecessor cycles, so no changes are needed to address this potential concern.

Copy link
Member

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are moving the code in the right direction and addressing important problems that have arisen between SIS2 and some versions of FMS. It should be merged into dev/gfdl now that the pipeline regression testing has passed at gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/19606 .

@Hallberg-NOAA Hallberg-NOAA merged commit dd1ee34 into NOAA-GFDL:dev/gfdl Jun 23, 2023
@marshallward marshallward deleted the fms2_interp_update branch July 21, 2023 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants