-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
let's enumerate all the ways to represent a "grid" in python #356
Comments
I'll add "my" |
It sounds like you guys want a python class that has some method objects to act on the grid rather than simply an xarray schema to use, so this may not be the most helpful... but CESM wants grids in the SCRIP format: that's a netcdf file with very specific fields. From a very old version of the User Guide (Figure 2.2.2 on page 8)
At some point a Interestingly enough, SCRIP was originally a full suite of Fortran-based grid tools (SCRIP stands for Spherical Coordinate Remapping and Interpolation Package) though the package itself is no longer supported... presumably because ESMF does everything it used to do. |
@fmaussion Thanks for the mention. So many good projects listed here that I didn't know about. For everyone else, I am one of the maintainers of the pyresample package which is very similar to some of these other libraries and focuses on resampling large satellite data arrays to new projections (used by the satpy library. Pyresample has a couple different geometry objects and some are subclasses of others so I'll just list the two most important and most used:
I should also point out a similar issue to this on the pyresample repository: pytroll/pyresample#57 For anyone who hasn't been linked to it already this type of information may be useful to the geoxarray project I'm starting (pydata/xarray#2288, https://github.com/geoxarray/geoxarray). I had been considering duplicating some of the work done by pyresample's AreaDefinition and SwathDefinition objects in geoxarray, but don't have anything implemented yet. So...what is the end goal of this issue? |
My goals for now is just to understand the extent of duplication of effort in developing grid models and to start a discussion on whether a more unified approach would be worthwhile. Some of the downsides of having many different implementations for very similar concepts are:
As a concrete example of the lack of interoperability, xgcm, my package, has its own representation for finite volume grid cells (cell center, cell face, etc.). @JiaweiZhuang's xesmf package can do conservative regridding, for which it needs to know about the cell extents. Lacking a universally accepted way to describe these grids, Jiawei invented a [completely reasonable] new convention for specifying the grid cell bounds. Consequently, despite ostensibly knowing about the same type of object, xgcm and xesmf are not interoperable. (I am not trying to pick on Jiawei--xesmf is amazing! Examples could have been drawn from any combination of two packages from the above list.) Reviewing the docs from the list, I think there are two basic categories of things called "grids":
Just as numeric and numarray decided to merge into numpy, it could be beneficial for our [much smaller] software ecosystem to merge some of the above efforts. |
p.s. Worth noting that definition 2 (finite volume grid) doesn't have to live on the Earth. This type of grid is useful also for models run in cartesian coordinates. |
Right, sorry for spaming this issue if you were just talking about category 2 in the first place. |
Whilst there may be two different 'world views' on grids (one from the modellers perspective and one from the earth observation perspective) I feel like some of the important scientific questions will be more easily answered if both didnt also succumb to the interoperability issue that @rabernat outlined (which I think is the main point of this issue)
This could be generalised even further to say that the variables may also be defined on grids, adding additional dimensions beyond space and time (i.e. wavelengths of light, surface wave direction and frequency energy bins etc etc), where a lot of the same properties relating the conservative up- and down-sampling still also apply So ideally the specification and implementation allows for a hierarchical definition of a grid from 1D through to nD? ...maybe too far down the rabbit hole and reminds me of some of the discussions I have read around xarray and the fundamental challenge dealing with nD-data resulting from the distinction between dimensions (array indicies) and coordinates |
@rabernat The xgcm versus xesmf gridded is interesting because looking at the two packages' documentation on grids I wouldn't immediately assume they were solving the same problem. Looking at the xesmf documentation the examples for gridding seem to be based on lon/lat grids. The xgcm seems like it handles a lot more cases but being unfamiliar with the package it isn't clear to me how to apply it to my remote sensing problems. I like the idea of describing the grids as DataArray objects, however, in my data cases I usually deal with 1000s x 1000s pixel grids (22k x 22k on the larger end) which isn't always represented well with xarray coordinates (not sure if this is considered large to the other things discussed here). In my satpy library (which uses pyresample) we use dask underneath xarray to help with the memory and performance issues that come up with processing these larger arrays. If we store things in With the "geoxarray" package I'm starting I was going to add an xarray accessor (http://xarray.pydata.org/en/stable/internals.html#extending-xarray) to provide a simple way to access the coordinate reference system (CRS). It sounds like anything that comes out of this issue could work well with "geoxarray" where you could have a grid description from some gridding package and add CRS information to it in a way that geoxarray could support. The brainstorm is starting... |
I would request that you also keep in mind astronomical World Coordinate Systems (WCSs), normally stored in the headers of FITS files. I don't know how similar the types of coordinate mappings might be; for astronomy FITS, it is often affine, but could be higher-order polynomials. |
This is kind of my point. I implemented the parts of a "grid" that I cared about immediately for my applications (focused on finite differencing and related operations on the native grid space), but have not yet implemented the more "geographical" aspects of grids. |
Oh we should also add all the ways that geoviews handles grids: cc @philippjfr |
@martindurant I won't exclude the idea, but I'm also not sure I know the differences needed by the astronomy community. I read a little bit of astropy's wcs documentation, but I'm not sure I have a great grasp on the concepts quite yet. @rabernat Most of that looks like 2D lon/lat grids with values for each pixel. I also wanted to point out that in the remote sensing field we typically deal with data as an "area" where one data point represents an area. The other common one is a single point. It is assumed that the coordinate for a pixel represents the center of the pixel/area/cell. |
The astro WCS is basically an object encoding an (arbitrary) function for transforming pixel coords to logical coords and usually also the inverse. In the simplest case, there will be a few keys in the file metadata like reference logical x/y, reference pix x/y, scale, but xarray doesn't need to know how things are actually stored. As was discussed at scipy, the problem comes when the coords can't be expressed as a simple grid of values at each pix location; I expect you are facing exactly the same problem. |
Was this discussed at a BOF? I missed one I wanted to go to the first day because I had to represent vispy at the visualization BOF. When you say "can't be expressed" do you mean the coordinates have to be specified for each pixel (non-uniform spacing)? |
This has been a very interesting discussion so far. The ESMF mesh/grid model can serve as a useful reference point having been built up over years of use cases. This is not to say it is the best user-facing grid model! I expect ESMPy will keep pretty close to ESMF internals with xESMF being the best way to interact with xarray packages. With ESMF, we are working to integrate MOAB as our mesh backend.
@djhoese Nice. Are you standardizing on CF convention with PROJ.4 or something? At the workshop, I talked with @dopplershift and @niallrobinson about creating packages similar to
@djhoese Currently xESMF supports a subset of the features available in ESMPy...
@pbranson I don't think so at least. Time-varying grid/meshes have applications in sea level rise modeling for example. It usually the edge cases that break conventions!
@rabernat & @JiaweiZhuang, is the xESMF/xGCM conservative regridding compatiblity issue related to how corners are handled? I expect xESMF has corners with a |
@bekozi It is probably best to skim through the related xarray issue here: pydata/xarray#2288 But yes, the idea would be that you could do I'm not sure how much PROJ.4 and CF projection works with astronomy WCS stuff (@martindurant). Now if only I had the time to write it... |
Not sure how xgcm represents corners internally, but The "incompatibility" is largely because the totally different purposes of the two packages. xgcm has a strong focus on finitely-difference calculations, which requires switching between staggering locations and the connectivity information between (cubed-sphere) panels. It does store corner information nicely, but most of other features are an overkill for xESMF. xESMF just needs a plain I want to keep the absolutely simple user interface for xESMF and minimize the dependency on advanced data structure (pure numpy arrays should always work). I got many users migrated from MATLAB/NCL/IDL and they just barely know xarray and even numpy. Asking them to learn other fancier packages like xgcm will blow their mind... Maybe the interoperability with other packages can be built on top of the current interface? |
Given the diversity of user needs and the huge cultural gap between different research communities, I feel that it is almost impossible to invent a one-for-all representation of "grid". The xgcm vs xESMF issue is a rather small cultural gap: Many xESMF users are from the environment & air quality modeling community (my field of study); they only care about conservatively regridding chemical concentration and emissions fields, and never worry about taking spatial derivatives (vorticity, divergence). So they will have little incentive to learn the additional features that xgcm provides. A much bigger cultural gap is between GIS and numerical modeling. pydata/xarray#2288 is indeed a great proposal from a GIS perspective, but the modeling people mostly just think a "grid" as simple 1D/2D arrays of latitude&longitude. GIS terms like "datum", "geodetic" and even simply "coordinate reference system" can feel quite exotic to atmospheric modeling people. The major reason is that atmospheric models always assume a perfect sphere, so people never worry about the choice of geodetic datum as in GIS (although sometimes they probably should; see Monaghan et al. 2013 and Cao et al. 2017). So, for the comment:
Exactly, because xESMF (recall the name Earth System Modeling Framework) is written more for the numerical modeling people. On the contrary, GIS people will probably find xESMF inconvenient, because the grid has to be expressed in true lat/lon values on a perfect sphere. |
@JiaweiZhuang -- thanks for your response. I guess it was not sufficiently clear from my message that I am not proposing to merge any existing packages tomorrow or force your users to adopt an unfamiliar new tool. xesmf is an amazing piece of software which has already developed a wide following. The last thing I would want would be to undermine this success in any way. I am mostly concerned here with the internals of these packages, i.e. the code that the developers have to write to represent a grid. So yes, while xgcm and xesmf currently have very different user-facing goals, internally, they both had to create very similar data structures related to the representation of finite volume grid cells. (And so did many of the other packages on my list.) I agree that distinction between "GIS grids" and "GCM grids" an important one. I think this is a useful outcome of this discussion. |
@rabernat Your clarification is clear and I think we are on the same page (Also, thanks very much for the kind words about xESMF!)
A useful thing would be identifying small, individual, commonly-used functionalities for grid operations, and adding them to xarray's core or a very small extension module. For example:
Right now it is easy to create an illegal coordinate with discontinuous/non-monotonic values: dr = xr.DataArray([1,1,1,1], coords={'x': [1,2,4,3]}, dims='x') It will only throw an error at plotting, not at creation. Seems like xgcm
Discontinuity in the grid coordinates once led to a bug in xESMF (JiaweiZhuang/xESMF#16). It would be nice to have something like
For a grid object passed to xESMF, its center coordinates are totally decoupled from its corner coordinates (they are just separate numpy arrays). A consistency checking ("is the mean of the cell corners equal to the cell centers?") is quite useful for preventing data mismatch and for inferring cell boundaries (JiaweiZhuang/xESMF#5 (comment)). Not sure how xgcm ensures data consistency. Actually, xarray already has Those functionalities are pretty general for all grid objects, regardless of actual data presentation. This doesn't address the issue of diverging data structures, but should avoid some duplicated efforts. |
I have to comment on this, because I think this is only partly true. Yes, atmospheric modeling people often do not know what a map projection is, or a "coordinate reference system". But this is causing them a lot of trouble: most (all?) limited area models like WRF do think of the earth as a sphere indeed, but they use a map projection to represent their grid, which is cartesian in eastings-northings (the map projection reference frame) and irregular in lon-lat. People non-aware with CRS will deal with the issue as following:
The much better way to deal with this would be to stay in the original map projection and regrid the other datasets you want to compare your model to (e.g. satellite products) into the original model projection. I was surprised to see that a blog post I wrote about WRF map projections is the most visited entry on my webpage. I think that many people in the python/xarray ecosystem would benefit from a unified handling of CRS in a tool which will hide all this complexity from them. Combined with a tool like pyresample or xesmf they would have all they need to work in cartesian grids instead of relying on the approximations of lon/lat grids. |
@fmaussion Thanks, I really enjoy your blog post. But I have a somewhat opposite opinion on this.
Whether to use true lat/lon or the projected coordinates largely depends on the problem a user wants to solve. For common questions like "what's the temperature at this location", using lat/lon is way more convenient/intuitive because the location of a city or a observation site is mostly expressed in lat/lon. More concretely, the Directly using projected coordinates seems quite rare in atmospheric modeling. The only use case I am aware of is deriving numerical schemes on curvilinear grids. For example, instead of using lat/lon to describe a cubed-sphere panel, you can use local, orthogonal variables ranging from -1 to 1, to make the math much cleaner (ref: Section 3.1. Gnomonic projection in Finite-volume transport on various cubed-sphere grids). But this notation is never exposed to users -- cubed-sphere models like FV3 write output data with true lat/lon coordinates. For conservative regridding, it is important to stay in the true lat/lon space. Most projections, except equal-area ones, do not preserve the cell area. Conformal projection preserves local shape but distorts the area ratio between cells. Doing conservative regridding in the projected space does not preserve the area-weighed sum on the true sphere.
This plotting issue is because
Instead of a "unified handling" between GIS and modeling, I'd prefer having a convenient converter between the two, while allowing each community to keep their existing culture. |
I see these as the same thing and that is at least the initial goal for geoxarray. You have lon/lat coordinates, great use them. You have projection information following the CF standard, go for it. You have a PROJ.4 string and X/Y coordinates, you can use that too. So the data is in the format the user is used to, but libraries can use a unified tool for handling the conversion and handling the data. |
cc @adcroft, who may be interested in the discussion |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Definitely not resolved StateBot! |
Hi all, I've been reading this thread with great interest, since I am currently in the process of refactoring some old kludgy code from my graduate school days that handles analysis and plotting of high-resolution cloud-resolving model output. The CRMs I work with all have the common thread that they are based on a staggered grid in the horizontal and vertical (specifically an Arakawa C-grid). But many of them are designed to run in "idealized" mode where the grid is not typically georeferenced, by which I mean it does not specify the horizontal coordinates in terms of lat/lon, but rather a more generic physical rectilinear grid. I personally run these and other models both in "idealized" mode and in the "real-data" geo-referenced mode discussed in this thread, and I know this is very common, especially in the high-resolution storm modeling community. So first, I just wanted to bring up this particular use case to add to the very good discussion above, since I think it was mentioned only in passing. Second, my code does handle locating and plotting the various variables on their native grid, whether it be the cell centers or faces, but it does so in a rather application-specific and kludgy way. I plan on putting it up on GitHub as soon as I can get through this current refactoring session, which is long overdue. Hopefully that won't take too terribly long. More to the point though, when I started the refactoring, I thought, "Surely someone has written a library somewhere that handles all the grid-related manipulations associated with staggered grids already, so I may not have to waste as much time cleaning my own implementation up". That's when I stumbled on @ChrisBarker-NOAA 's gridded package and later this thread. In my admittedly cursory look so far, it's not clear to me whether the "gridded" package can handle the aforementioned case where the grid is not represented in lat/lon coordinates. I could generate "fake" lat/lon coordinates using, say, a flat earth approximation, etc., but I was hoping there would be a more "native" way. Any insight and discussions would be greatly appreciated, and I'm happy to continue this sort of discussion and help out in the larger efforts being discussed here as much as I can. |
Gridded does currently assume lay-Lon coords. But other than the naming, it is actually treating them as orthogonal coordinates. That is, not handling projections or wrapping around the earth, etc. So it would be easy to adapt to other coordinates. In fact, I’d like to do that anyway to support projected coordinates anyway. |
Thanks for the reply! So if I'm understanding correctly, I could just pass in my coordinate arrays for the node_lat, node_lon, etc. keyword arguments and it'll just work? Perhaps this isn't the best place for this, but is there a way to use gridded to compute "corner" points of a staggered grid? I need this for using matplotlib pcolor. Would it be as simple as just passing in the appropriate coordinate arrays for the edges/centers when constructing a Variable object and selecting them appropriately in the call to pcolor? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date. |
I just learned about some new packages for geo-aware grid interpolation. |
Summarizing some offline discussion with @rabernat, @kwilcox, @hetland and @ocefpaf on efficient and accurate approaches to interpolation and the requisite identification of a point-in-a-cell in unstructured triangular, curvilinear quads, and mixed types (tris + quads):
|
Implementation in 2D here: https://github.com/NOAA-ORR-ERD/cell_tree2d (and on pip and conda-forge) Not all that well documented, but ask if you want to figure out how to use it. We are using it operationally for triangular and quad grids, via gridded. The tree itself is built on bounding boxes, so you use any shape cell, as long as you provide a way to do a final "point in cell" check. The paper does talk about it being amenable to multiprocessing, but it was not obvious to us how to do that, and we didn't try -- GPU or conventional. |
in reply to the above -- I think so -- but post a issue on the gridded repo to discuss. https://github.com/NOAA-ORR-ERD/gridded -CHB
|
I wrote some thoughts on interpolating from a quad grid to random points over on the discourse site. Many of these topics are already covered here and elsewhere, but I wanted to gather these ideas and put them in one place. Also, since it's just a discussion of algorithms, not specific code or packages, discourse seemed more appropriate. I'd be happy to move the discussion here if that's more appropriate. |
Hi. Sorry about this stupid question, I'm new to this but, is there any of the tool enumerated that can help me to divide the entire world map into regions/rectangles/circles with radius R? So that I can play with the R and give me different number of regions. In the end I would like to get the coordinates(lat, long) of this regions or the coordinates of the rectangles(top/left x bottom/right) as well as lat, long. Any help into a right direction will be really appreciated. Thx! |
On Mon, Dec 2, 2019 at 10:20 PM Chirica Gheorghe ***@***.***> wrote:
Sorry about this stupid question, I'm new to this but, is there any of the
tool enumerated that can help me to divide the entire world map into
regions/rectangles/circles with radius R?
well, the world is an ellipsoid, so you can't really divide it up into
rectangles, or circles ..
Triangles perhaps.
Or you could choose a projection, and then rectangles would be doable. But
they would suffer from whatever distortions that projection provides.
What is your use case here?
…-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
|
Thx @ChrisBarker-NOAA. I know that will not be perfect but this is not an issue, we can adjust some overlapping, I'm not looking to be perfect, an approximation works as well. Thx. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date. |
Interesting discussion, thanks! |
Came across this looking for anyone that might have done things on grid merging dealing with edge effects |
I'm not sure exactly what you mean. Can you provide an example? |
Things like Grid Merge in Intrepid - but open sourced version, not mathematically rolling your own. |
As discussed in the Pangeo meeting, there are currently many different models for a grid in python. Some of these are explicit, some are implicit in how packages work. I am talking about both structured and unstructured grids.
Let's try to put together a list of what these are and how they differ. For now, I will just throw out some links:
What else should be on this list?
cc @dopplershift, @bekozi, @rsignell-usgs
The text was updated successfully, but these errors were encountered: