Handle different grid coordinate formats and naming #74
Comments
I support the idea of using the CF-Convention as the standard xESMF is designed around. To me this would imply that
Using this approach, there's no need to rename variables before and after regridding. However, I can't find mentions of the definition of However, this would break backward compatibility, and does not provide a mechanism to pass pre-computed bounds. A solution to this would be to have a |
I also agree that supporting CF conventions would be very useful, and would allow me to remove quite a few kludges in my code when using xESMF. With CF convention support in mind, there are many scattered efforts across the Pangeo ecosystem to interpret CF conventions to permit more intuitive APIs and cleaner code. For example: Would this be something where a common
Tagging some people I've heard bring up an idea like this before:
|
Thanks @jthielen. Another person I think should be kept in the loop in this discussion is @snowman2 who maintains pyproj and rioxarray. I bring these projects up because:
The combination of these two projects has resulted in something that I've thought is much better than what I was attempting in geoxarray. @snowman2 creates a "spatial_ref" coordinate variable which itself has a Edit: I should have mentioned that I'd like to turn geoxarray into something similar to rioxarray but not rasterio specific (no rasterio/gdal dependency). |
@djhoese thanks for the mention, hopefully I can provide something useful to the conversation.
if the dimension is not set, then |
I agree something like cf-xarray would be useful. I'm concerned, however, that diving into this topic here could distract from the issue at hand. Is there a place where all the potential use cases for cf-xarray could be listed, with links to existing code? I think having this in one place would be helpful in designing an API for cf-xarray. |
discourse.pangeo.io? |
My $0.02: cf-xarray is definitely the right way to go. I envision a lightweight library for parsing cf metadata that all these packages can depend on. Given the technical nature of this discussion, maybe we can keep it on github. An issue on https://github.com/pangeo-data/pangeo would be appropriate. |
Sounds good. I can plan on writing something up today to get that discussion started over there (unless someone beats me to it!), and then leave this issue alone w.r.t. cf-xarray until it would be ready to use. Update: see pangeo-data/pangeo#771 |
This is a meta-issue summarizing those frequently-asked questions:
The problem
xESMF requires the input grid objects to contain variables lon/lat of shape (n_lat, n_lon), and optionally lon_b/lat_b of shape (n_lat+1, n_lon+1) for conservative regridding.
This leads to the naming problem that the original name might be latitude, 'lat_bnds', or 'latitude_bnds', and the boundary formatting problem that the original boundary array might have shape (n_lat, n_lon, 4) instead of (n_lat+1, n_lon+1). The current fix is to rename the coordinate (#5) and reformat the boundary (#14 (comment)).
The two problems often occur together: the CF-convention uses the name lat_bnds with a shape of (n_lat, n_lon, 4).
An upstream cause is that an xarray.Dataset has no notion of cell boundaries; other packages like xgcm try to work around that (#13). Xarray also does not enforce the CF-convention.
Desired features for the solution
1. Unambiguous
xesmf should always be very clear and strict on the expected grid format. This prevents tricky edge cases and user confusion. What if the input dataset/dictionary contains all three variables lat_b, 'latitude_b', and 'latitude_bnd'? Which one is picked? Or should it throw a "duplication error"? (#38 (comment))
There can be options to choose one of the names, but at any one time there should be only one valid name, which can be printed out by something like xesmf.get_config().
The expected boundary array format also has to be explicit. There can be an extra preprocessing function to reformat the boundary, but such an option has to be set explicitly so that users are aware of what they are doing.
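A minimal sketch of what "only one valid name at a time" could mean in practice. The helper require_grid_variable is hypothetical and not part of xesmf; it assumes the expected name comes from some explicit setting and never guesses among look-alike names.

```python
def require_grid_variable(variable_names, expected_name):
    """Return expected_name if it is present; otherwise fail loudly.

    Hypothetical helper, not an xesmf function. It performs no fuzzy
    matching: only the single configured name is accepted.
    """
    if expected_name not in variable_names:
        raise KeyError(
            f"expected grid variable {expected_name!r}, "
            f"found only {sorted(variable_names)}"
        )
    return expected_name


# All three look-alike names are present, but only the configured one counts.
variables = {"lat", "lon", "lat_b", "latitude_b", "latitude_bnd"}
chosen = require_grid_variable(variables, expected_name="lat_b")
```

With this kind of strict lookup there is no "duplication error" to define, because the other candidates are simply ignored.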
2. Simple
This is a simple problem and needs a simple, intuitive solution. Although annoying, this issue does not affect xesmf's core functionality (the regridding computation). I am hesitant to put in complex preprocessing logic or clever heuristics to guess the name and tweak the grid format. Complex code adds maintenance cost, and causes confusion for users who do not need such a feature.
An alternative, simple "fix" to this issue is to add some example notebooks showing how to preprocess various different grid formats, without complicating the package source code.
If changes are made in the source code, they should not add many lines of new code or many if/else switches; those are an indication of complex logic.
Proposed solutions
1. Allow custom coordinate naming, by implementing xesmf.config.set(grid_name_dict=...) as a global config or context manager.
(Originally proposed at #38 (comment))
The grid_name_dict exactly follows xarray.Dataset.rename(name_dict). xesmf.config.set works similarly to xarray.set_options or dask.config.get. There should also be an xesmf.config.get('grid_name_dict') to print the current expected grid name.
Example usage:
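A minimal, self-contained sketch of how such a setter could behave, modeled on the xarray.set_options pattern (usable either as a plain call or in a with block). The names set_config, get_config, and _CONFIG are hypothetical stand-ins, not existing xesmf APIs, and the mapping direction (original name to expected name) is assumed to follow Dataset.rename.

```python
# Hypothetical module-level store; an empty dict means "no renaming".
_CONFIG = {"grid_name_dict": {}}


class set_config:
    """Set options globally, or temporarily when used in a `with` block."""

    def __init__(self, **options):
        unknown = set(options) - set(_CONFIG)
        if unknown:
            raise KeyError(f"unknown option(s): {sorted(unknown)}")
        # Remember the old values so __exit__ can restore them.
        self._previous = {key: _CONFIG[key] for key in options}
        _CONFIG.update(options)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        _CONFIG.update(self._previous)


def get_config(key):
    """Return the current value of one option."""
    return _CONFIG[key]


# Same shape as the argument to xarray.Dataset.rename: {old_name: new_name}.
cf_names = {"latitude": "lat", "longitude": "lon",
            "lat_bnds": "lat_b", "lon_bnds": "lon_b"}

with set_config(grid_name_dict=cf_names):
    inside = get_config("grid_name_dict")   # temporary setting
outside = get_config("grid_name_dict")      # restored after the block
```

Calling set_config(grid_name_dict=cf_names) without a with block would leave the setting in place globally, which is the same trade-off xarray.set_options makes.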
xesmf.config.set might also be used to set other general configurations, although I haven't thought of an example. If there are no other configurable parameters, we can also just implement a single-purpose function xesmf.set_grid_name()/xesmf.get_grid_name().
2. Implement utility functions for reformatting cell boundaries
Change (n_lat, n_lon, 4) to (n_lat+1, n_lon+1), similar to OCGIS (#32 (comment)).
Another very useful util is inferring boundaries from centers (#13 (comment)):
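A rough sketch of both utilities using NumPy only. The function names bounds_to_corners and infer_bounds_1d are hypothetical, not existing xesmf functions. The first assumes contiguous cells with vertices stored counterclockwise in index space (lower-left, lower-right, upper-right, upper-left); the second assumes a 1-D rectilinear axis and extrapolates half a grid spacing at each end.

```python
import numpy as np


def bounds_to_corners(bnds):
    """Convert (n_lat, n_lon, 4) cell bounds to (n_lat+1, n_lon+1) corners.

    Assumes neighbouring cells share vertices and that the last axis is
    ordered lower-left, lower-right, upper-right, upper-left.
    """
    bnds = np.asarray(bnds)
    if bnds.ndim != 3 or bnds.shape[-1] != 4:
        raise ValueError(f"expected shape (n_lat, n_lon, 4), got {bnds.shape}")
    n_lat, n_lon, _ = bnds.shape
    corners = np.empty((n_lat + 1, n_lon + 1), dtype=bnds.dtype)
    corners[:-1, :-1] = bnds[:, :, 0]   # lower-left vertex of every cell
    corners[:-1, -1] = bnds[:, -1, 1]   # lower-right vertices, last column
    corners[-1, -1] = bnds[-1, -1, 2]   # upper-right vertex, last cell
    corners[-1, :-1] = bnds[-1, :, 3]   # upper-left vertices, last row
    return corners


def infer_bounds_1d(centers):
    """Infer n+1 cell edges from n cell centers along one axis."""
    centers = np.asarray(centers, dtype=float)
    if centers.ndim != 1 or centers.size < 2:
        raise ValueError("need a 1-D array with at least two centers")
    mid = 0.5 * (centers[:-1] + centers[1:])
    first = centers[0] - (mid[0] - centers[0])
    last = centers[-1] + (centers[-1] - mid[-1])
    return np.concatenate([[first], mid, [last]])


# Infer 2-D corner arrays of shape (n_lat+1, n_lon+1) from 1-D centers.
lat = np.array([10.0, 20.0, 30.0])
lon = np.array([0.0, 10.0, 20.0, 30.0])
lon_b, lat_b = np.meshgrid(infer_bounds_1d(lon), infer_bounds_1d(lat))

# Build a 4-corner array from those corners, then convert it back.
lat_bnds = np.stack(
    [lat_b[:-1, :-1], lat_b[:-1, 1:], lat_b[1:, 1:], lat_b[1:, :-1]], axis=-1
)
lat_b_roundtrip = bounds_to_corners(lat_bnds)
```

Extrapolated latitude edges can fall outside [-90, 90] on a global grid, so a real utility would probably need to clip them; that step is left out here.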
Optionally, those functions can be wrapped in the high-level API like xe.Regridder(..., boundary_format='4_corners') or xe.Regridder(..., boundary_format='inferred').
3. Simple support for CF-convention, built on steps 1 and 2
Given the popularity of the CF-convention, it makes sense to support such input data out of the box (#38 #73). I emphasize "simple" because xesmf has no reason to check all CF-compliant attributes such as units = 'degrees_east' or standard_name = 'latitude' -- this is not the task for a regridding package.
For coordinate naming, we can just set xesmf.config.set(grid_name_dict=xe.config.cf_grid_name), where xe.config.cf_grid_name is pre-defined for convenience. We can also add more pre-defined dictionaries for other names like latitude_bounds, lat_bnds, or simply let users set their own.
The boundary formatting should explicitly go through step 2. Handling the boundary decoding automatically can often lead to corner cases and errors. For example, what if the input grid is a 4-tile grid of shape (n_lat, n_lon, 4), but gets misinterpreted as 4 corners? Should the Regridder throw an error when seeing 3-D grids, or check whether it is another representation?
Any comments & suggestions? PRs are particularly welcome, especially on the 'grid_name_dict' part.