Regridding to a list of polygons (spatial averaging) #24
@andersy005 @huard @raphaeldussin I'd like to have your input on adding |
I'm all for using the libs that will make the code more elegant and thus easier to maintain! I'm not super familiar with the Mesh part of ESMF: is regridding possible from grid -> mesh and from mesh -> grid? I think we should try to allow all the possible in/out type combinations that are supported by ESMF. What do you think? |
The trade-off here is having to worry about these additional dependencies. In my experience, these are not "light" dependencies. I agree they would make the user-interface a lot friendlier. However, my concern is that they'll complicate the job of maintainers. Could we think about a compromise? For example, we could have a notebook that shows how to interface geopandas with xESMF. We run the notebook as part of the test suite, but geopandas is not part of the installation hard requirements. |
Another alternative is to make these additional packages soft dependencies by moving the import inside the function that needs it, i.e.

def func(ds):
    try:
        from shapely.geometry import MultiPolygon
    except ImportError as exc:
        message = "shapely package is missing... Please install it via pip or conda: conda install shapely..."
        # Re-raise as a new ImportError; `exc` is an instance and cannot be called.
        raise ImportError(message) from exc
    # Do actual work here
    ...

rather than

from shapely.geometry import MultiPolygon

def func(ds):
    # Do actual work here
    ...
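If several functions end up needing the same optional package, the check could be centralised. The following is only a sketch: `require_optional` is a hypothetical helper name, not something that exists in xESMF, and it relies solely on the standard library's `importlib`.

```python
import importlib


def require_optional(module_name, install_hint):
    """Import an optional dependency, or raise ImportError with an install hint.

    Hypothetical helper for illustration; not part of xESMF.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature. {install_hint}"
        ) from exc


def func(ds):
    # The import only happens when the function is actually called.
    geometry = require_optional(
        "shapely.geometry", "Install it with: conda install shapely"
    )
    # Do actual work here, e.g. with geometry.MultiPolygon
    ...
```

With this layout, importing the package never fails because of a missing optional library; only the call that needs it does.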
Thanks for the replies! The last commits add the However, I was not able to make the nearests, bilinear and patch methods work with lists including polygons of more than 4 nodes. The documentation of ESMF says these should work, but I get a |
The I finally didn't need |
Caveats of this PR:
In all cases, the error is as said above:
|
First batch of comments.
I'll take some more time to try it out.
My main concern is to see the My suggestion (for discussion) would be to
@mathause You might be interested to review this PR and provide your view on the implementation and the API. |
I am not familiar enough with ESMF to comment on the implementation but I would agree that the interface feels overloaded & using polylist_in/out seems unwieldy. @huard's suggestion seems reasonable to me.
- I find the name locstream extremely unintuitive - is this common knowledge for ESMF users?
- Note that the gridcells are not trusted in the notebook, therefore the HTML repr of xarray does not look nice, you may have to switch to the text repr (https://github.com/pangeo-data/xESMF/blob/c0b7d9ed9b102b26825e10ec7c85e0c02e521486/doc/notebooks/Spatial_Averaging.ipynb)
- The weights are exact and not an approximation, right? That is worth mentioning.
- How fast is the example? How long does it take when using all countries? (i.e. is it practicable in real life?) Do you know what the bottleneck is? (Creating the mesh or creating the weights?)
setup.py
Outdated
@@ -16,7 +16,7 @@
 if on_rtd:
     INSTALL_REQUIRES = []
 else:
-    INSTALL_REQUIRES = ['esmpy>=8.0.0', 'xarray', 'numpy', 'scipy']
+    INSTALL_REQUIRES = ['esmpy>=8.0.0', 'xarray', 'numpy', 'scipy', 'shapely']
Could shapely be made an optional dependency?
It could. I thought it was easy enough to install, but I don't know of all possible edge cases!
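For reference, one way an optional dependency is commonly declared with setuptools is through `extras_require`. This is only a sketch of what that could look like here: the extra name `polygons` is hypothetical and was not proposed in this thread.

```python
# Hard requirements stay as they were before this PR.
INSTALL_REQUIRES = ['esmpy>=8.0.0', 'xarray', 'numpy', 'scipy']

# Optional features; the extra name 'polygons' is a hypothetical choice.
EXTRAS_REQUIRE = {'polygons': ['shapely']}

# These would then be passed to setuptools, along the lines of:
# setup(..., install_requires=INSTALL_REQUIRES, extras_require=EXTRAS_REQUIRE)
```

Under that assumption, users who want polygon support would install the package together with the extra, while a plain install would not pull in shapely.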
@mathause locstream is the name used in ESMF, for better or worse :) |
We should probably define the ESMF jargon in the docs though. Otherwise, there is a risk that users just don't know that a feature exists because it's called something else. |
@mathause I had thought about it, and agree it would be nice to make the link with regionmask. Could you please open a new issue and propose ideas to leverage the strengths of xesmf and regionmask? |
I had a flash this afternoon and tried a modified version of the For the notebook example, time went from 1.7 s to 0.9 s. However, I am using a method that requires numpy >= 1.16 to convert structured arrays to unstructured ones. I also tried a version with 2D arrays instead of structured ones and the speed-up was nowhere near what I get here. |
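The comment does not name the numpy method. One function that matches the description and was added in numpy 1.16 is `numpy.lib.recfunctions.structured_to_unstructured`; whether it is the one used in the PR is an assumption. A minimal sketch, with hypothetical field names and data:

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Hypothetical structured array of polygon node coordinates.
nodes = np.array(
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
    dtype=[("lon", "f8"), ("lat", "f8")],
)

# One row per node, one column per field (requires numpy >= 1.16).
coords = structured_to_unstructured(nodes)
```

The result is a regular 2D float array, which is the kind of input that plain array routines expect.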
@bekozi This PR is inspired by what you've done in OCGIS to compute averages over polygons. Would you mind taking a look ? |
@raphaeldussin Review reminder ; ) There is another PR in the pipeline that started from this one. Merging this would make it easier to review. |
@@ -443,12 +547,30 @@ def esmf_regrid_finalize(regrid):
     regrid.destroy()
     regrid.srcfield.destroy()
     regrid.dstfield.destroy()
-    regrid.srcfield.grid.destroy()
-    regrid.dstfield.grid.destroy()
+    # regrid.srcfield.grid.destroy()
what is the reason for commenting out these lines?
ESMF is prone to memory leaks if objects are not destroyed properly
With the new structure, we have the BaseRegridder object that takes ESMF objects as input. Thus, destroying them after the weight computation could cause issues for a user who created those themselves.
More specifically, I commented out these lines because SpatialAverager uses the same grid in two different BaseRegridder instances, so instead of recreating it, I decided to reuse the same object.
A partial solution could be to destroy the grids where appropriate in Regridder and SpatialAverager.
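To make the idea of destroying grids "where appropriate" concrete, here is a rough sketch of an ownership flag. Everything in it is hypothetical: `FakeGrid` only mimics the `destroy()` / `finalized` pair of an ESMF object, and `OwningRegridder` is not a class from this PR.

```python
class FakeGrid:
    """Stand-in for an ESMF grid; only mimics destroy() and finalized."""

    def __init__(self):
        self.finalized = False

    def destroy(self):
        self.finalized = True


class OwningRegridder:
    """Hypothetical wrapper that remembers whether it created its grid."""

    def __init__(self, grid=None):
        # A grid passed in by the caller is shared, so we do not own it.
        self._owns_grid = grid is None
        self.grid = FakeGrid() if grid is None else grid

    def finalize(self):
        # Only free what this object created; shared grids are left alone.
        if self._owns_grid:
            self.grid.destroy()


# Two regridders sharing one grid: finalizing one must not break the other.
shared = FakeGrid()
first = OwningRegridder(shared)
second = OwningRegridder(shared)
first.finalize()
```

In this sketch the caller who built `shared` stays responsible for destroying it, which mirrors the concern about users who create the ESMF objects themselves.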
ok let's see if this causes problems down the line. I will put an issue in to keep track of the potential problem
     # double check
     assert regrid.finalized
     assert regrid.srcfield.finalized
     assert regrid.dstfield.finalized
-    assert regrid.srcfield.grid.finalized
-    assert regrid.dstfield.grid.finalized
+    # assert regrid.srcfield.grid.finalized
same as above
xesmf/frontend.py
Outdated
    filename=None,
    reuse_weights=False,
    extrap_method=None,
    extrap_dist_exponent=None,
    extrap_num_src_pnts=None,
    add_nans=False,
No docstring for this option. Is it similar to the fix I introduced in PR #19?
Whoops, I don't exactly remember, but this might be a leftover from my first BaseRegridder draft. I'll remove it. NaNs are indeed added to the weights the same way as done in #19, only the conditional statement has changed.
just a few minor things, also linting failed and so did readthedocs so I could not read the new notebook |
I can't understand the linting bug... I update pre-commit, did |
fails in PR #24 for no obvious reason
Voilà! |
just a few more nitpicks: I would put the tutorial under intermediate, not beginners. import warnings
warnings.filterwarnings("ignore") and remove last empty cell (or should we remove keep-empty in the precommit yaml?) |
I'm happy with the PR, great job @aulemahal |
This PR adds support for a list of polygons as arguments of Regridder. When used in combination with a conservative method, the regridding output is equivalent to a spatial averaging of the input data on each polygon.

Uses shapely for the polygon implementation. The list of polygons is translated into a Mesh with multiple elements, each with its own number of nodes. The regridding process follows the same behaviour as when a locstream is the output, so the regridded data has a locations dim that matches the order of the polygon list.

The Regridder will flatten the list of polygons (MultiPolygon objects will be expanded) and will ignore potential holes in polygons. This makes its job easier, but is problematic for the end user. I thus created the SpatialAverager subclass that takes care of these caveats. It divides the list of polygons into exteriors (remembering the "owners" of flattened-out MultiPolygons) and holes, and then creates the weights using 2 Regridder calls. Weights are merged (flattened polygons re-combined and holes subtracted) and normalized (requiring a conversion from COO sparse matrices to CSC). The SpatialAverager is then instantiated with these merged weights.

A new notebook was added and the "current limitations" page was updated to highlight that this is not an efficient and elegant implementation of meshes, but a step forward.
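The merge-and-normalize step described above can be illustrated with a small sketch. It is not the PR's code: dense numpy arrays stand in for the sparse matrices, the weights are assumed to be overlap areas per grid cell, and the function name `merge_weights` and its arguments are hypothetical.

```python
import numpy as np


def merge_weights(w_ext, owners_ext, w_holes, owners_holes, n_polys):
    """Recombine flattened parts per original polygon, then normalize rows.

    w_ext / w_holes: (n_parts, n_cells) overlap of each exterior / hole
    with each grid cell. owners_*: index of the original polygon that
    each part belongs to.
    """
    owners_ext = np.asarray(owners_ext, dtype=int)
    owners_holes = np.asarray(owners_holes, dtype=int)
    merged = np.zeros((n_polys, w_ext.shape[1]))
    # Add every exterior to the polygon it came from (MultiPolygon parts
    # end up summed in the same row).
    np.add.at(merged, owners_ext, w_ext)
    # Subtract the part of each cell that falls inside a hole.
    np.subtract.at(merged, owners_holes, w_holes)
    # Normalize so that the weights of each polygon sum to 1.
    totals = merged.sum(axis=1, keepdims=True)
    return np.divide(merged, totals, out=np.zeros_like(merged), where=totals > 0)


# Polygon 0 is a MultiPolygon with two parts; polygon 1 has one hole.
w_ext = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 1.0]])
w_holes = np.array([[0, 0, 0, 0.5]])
weights = merge_weights(w_ext, [0, 0, 1], w_holes, [1], n_polys=2)
```

Each row of `weights` can then be applied to the flattened grid values to obtain the average over the corresponding polygon.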
Closes #11
Previous comment: