Description
As it has been reported multiple times that the CF encoding of CRS (Coordinate Reference System) may pose issues, I propose starting a dedicated thread to gain a better understanding of these concerns.
Furthermore, I recently converted TerraSAR GeoTiff to Zarr, using rioxarray in conjunction with gdal and xarray, which resulted in the following CF grid mapping. I am somewhat surprised by the way GDAL encoded the CRS, which could be a result of the combination of gdal with xarray.
{
"GeoTransform": "10.50014820907392 0.00012070250150375875 -2.609127035245679e-05 53.85791404256921 -1.5387410410647694e-05 -7.246156895736279e-05",
"_ARRAY_DIMENSIONS": [],
"crs_wkt": "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]",
"geographic_crs_name": "WGS 84",
"grid_mapping_name": "latitude_longitude",
"inverse_flattening": 298.257223563,
"longitude_of_prime_meridian": 0.0,
"prime_meridian_name": "Greenwich",
"reference_ellipsoid_name": "WGS 84",
"semi_major_axis": 6378137.0,
"semi_minor_axis": 6356752.314245179,
"spatial_ref": "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]"
}
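As a small worked check of how the ellipsoid attributes in the listing above relate to each other, the semi_minor_axis can be derived from semi_major_axis and inverse_flattening with the standard relation b = a * (1 - 1/f_inv). The sketch below only uses the values shown above and the Python standard library; the variable names are illustrative.

```python
import math

# Values copied from the grid mapping attributes shown above.
semi_major_axis = 6378137.0
inverse_flattening = 298.257223563
stored_semi_minor_axis = 6356752.314245179

# Standard ellipsoid relation: b = a * (1 - 1 / inverse_flattening)
derived_semi_minor_axis = semi_major_axis * (1.0 - 1.0 / inverse_flattening)

# True when the stored value agrees with the derived one to ~9 significant digits.
consistent = math.isclose(
    derived_semi_minor_axis, stored_semi_minor_axis, rel_tol=1e-9
)
```

This suggests the three ellipsoid attributes are redundant with one another, which is one reason a reader may want to decide which of them is authoritative when they disagree.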
EDIT NOTE: A key topic of this discussion concerns the need to make grid_mapping mandatory in order to exclude the possibility of having an implicit (undefined) CRS. Furthermore, grid_mapping includes several formats, and it would be wise to recommend one of them (WKT, Proj4j, ...).
Activity
dblodgett-usgs commented on Apr 13, 2023
What are you surprised by?
Most of what's in there is as would be expected per:
https://cfconventions.org/cf-conventions/cf-conventions.html#use-of-the-crs-well-known-text-format
This page provides a fairly complete set of mappings:
https://github.com/cf-convention/cf-conventions/wiki/Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements
"spatial_ref" does not appear in the CF convention but has been around in the community for a while. Someone implemented it quite a few years ago and it's been adopted in a number of places.
"_ARRAY_DIMENSIONS" is presumably being added by xarray?
christophenoel commented on Apr 13, 2023
That's not my point. I'm just trying to understand why there is a concern (discussed many times during the meeting) with this CF way of encoding CRS while GDAL/Xarray seems to generate it without problem when converting a GeoTiff to Zarr.
Why do we want to change this ?
dblodgett-usgs commented on Apr 13, 2023
The rub is the need to implement a crosswalk between the CRS-WKT / EPSG representation of CRS and the grid_mapping representation of CRS. The crs_wkt is optional and all the particular grid_mapping attributes are required.
CF-NetCDF requires that you include parameters using the attribute specification from CF that duplicate what most geospatial software supports via PROJ.
I think the premise here is that we would be saying that crs_wkt is required and that the other grid_mapping attributes are encouraged to allow compatibility with software that does not support crs_wkt?
djhoese commented on Aug 2, 2023
Jumping in on this issue after this spec repository was pointed out to me. I'm the creator of the very very new geoxarray package which basically tries to take all the non-GDAL specific CRS and dimension/coordinate information functionality from rioxarray but without the GDAL/rasterio dependency. It is also meant to be more generic than rioxarray.
Anyway, I wanted to chime in with some information from my understanding, having worked on a similar issue in geoxarray. First, my understanding is that spatial_ref is for GDAL compatibility. Second, rioxarray and geoxarray both use the crs_wkt as a shortcut/short-circuit when reading CRS information to avoid having to convert/parse all the CF grid mapping attributes that might also exist. Both libraries depend on pyproj to do the CRS/WKT -> CF grid_mapping conversion.
There is also the case of projections that aren't supported by CF. In those cases pyproj (and therefore rioxarray and geoxarray) will only produce the crs_wkt/spatial_ref attributes with WKT since there are no corresponding CF attributes. I believe it has been brought up in various CF discussions that it is likely a bad idea to re-invent a way of representing CRS information, especially when it isn't all inclusive/supportive and the PROJ library already has a pretty good handle on it. I'm not sure what the end result of those discussions was though.
dblodgett-usgs commented on Aug 3, 2023
Great info. Thanks @djhoese -- UW Madison representing here, I love it! (I'm an '09 Civil Engineering grad)
christophenoel commented on Aug 18, 2023
I want to take this good news of a new expert participating to summarize the question for novices like me. Please correct me if I'm wrong.
CF conventions (v1.10) allow the CRS to be defined implicitly as the geographic coordinate system based on a spherical Earth (so not the WGS 84 ellipsoid) when declaring the axis or standard_name property. CF conventions also allow CRS definitions to be expressed via the grid_mapping attribute, which refers to a separate variable.
This grid mapping variable provides the description of the mapping via a collection of attached attributes. There are several ways to describe the CRS:
- a named grid mapping (lambert_conformal_conic, mercator, etc.) based on the associated properties.
- crs_wkt, which also allows multiple CRS properties to be expressed in WKT (used by GDAL/OGR) in the property crs_wkt.
Note: spatial_ref is the property commonly used by GDAL to define the CRS in WKT. I therefore believe that GDAL ignores crs_wkt in the grid_mapping variable.
My conclusion
From my understanding, CF advocates for using its standard properties to express any type of representation (WKT, PROJ.4, etc.). Since we want to base GeoZarr on CF, I would lean towards making the grid_mapping attributes mandatory (using the mapping table above for the PROJ.4, WKT representation) and encouraging crs_wkt to facilitate compatibility with tools that only support WKT (but GDAL will still have to implement support for the crs_wkt property).
christophenoel commented on Aug 18, 2023
Another useful piece of information reported by Roger Lott (OGC CRS SWG co-chair):
dwilson1988 commented on Aug 18, 2023
This solves a decent part of the original concern around the original issue (christophenoel#3) just to mention it here. I just want to make sure I'm understanding a few things:
1. If a Zarr dataset is created in a spatial reference that the CF conventions do not explicitly support, there would be no path to create a compliant GeoZarr dataset, correct?
2. If starting from an EPSG code or WKT/ProjJSON representation of a CRS, there is no straightforward way to map it to CF, if I'm understanding the table you provided.
3. Just specifying the crs_wkt attribute WOULD NOT be a compliant GeoZarr dataset.
The reason I bring this up specifically is we've been outputting Zarr from our geospatial system for the last few years and all of the CRS information is specified as EPSG Code, WKT, or ProjJSON. Unless there's a way to always (and reliably!) map from CRS-WKT to CF, supporting GeoZarr would be a nonstarter for us, but we'd VERY much like to adopt GeoZarr.
Any thoughts on this?
djhoese commented on Aug 18, 2023
Speaking from a user POV and not on the "standards"/specification design side of this: I would plan/hope to use the pyproj library in all of my Python code to do conversions between PROJ.4/WKT2/EPSG/CF. Not sure that is considered "straightforward" though.
I would hope that the issue in your point 1 above could be resolved in the future by suggesting to CF to adopt crs_wkt as an alternative to the CF grid mapping attributes (either the attributes or the crs_wkt or both with priority going to crs_wkt). I don't know how open they would be to that but the fact that not every projection is supported currently would make putting the responsibility on WKT more enticing. Similarly, add the functionality to GDAL to read crs_wkt in addition to the already supported spatial_ref.
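To make the "priority going to crs_wkt" idea concrete, here is a minimal sketch of a reader that picks its CRS source in that order. It is not the actual code of rioxarray, geoxarray, pyproj or GDAL; the function name `pick_crs_source` and the fallback order (crs_wkt, then spatial_ref, then CF attributes) are assumptions for illustration, and no WKT parsing is attempted.

```python
def pick_crs_source(attrs):
    """Return (source, value) describing where CRS info would be read from.

    Assumed priority: crs_wkt, then spatial_ref, then CF grid mapping attributes.
    """
    for key in ("crs_wkt", "spatial_ref"):
        wkt = attrs.get(key)
        if isinstance(wkt, str) and wkt.strip():
            return key, wkt
    if "grid_mapping_name" in attrs:
        # Only CF parameters are available; a real reader would hand these
        # to a CRS library for interpretation, which is not done here.
        cf_params = {k: v for k, v in attrs.items() if not k.startswith("_")}
        return "cf_attributes", cf_params
    return None, None


# Hypothetical attributes of a grid mapping variable.
example = {
    "grid_mapping_name": "latitude_longitude",
    "semi_major_axis": 6378137.0,
    "inverse_flattening": 298.257223563,
    "spatial_ref": 'GEOGCS["WGS 84"]',  # abbreviated placeholder string
}
source, value = pick_crs_source(example)
```

With this ordering, a dataset written by GDAL (spatial_ref only) and one written with CF's crs_wkt would both be read through the WKT path, and the CF attributes would only be consulted when no WKT is present.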
dwilson1988 commented on Aug 18, 2023
That's really the root of the rub here (correct me if I'm wrong): I can represent any CF grid mapping as WKT but not vice versa.
dblodgett-usgs commented on Aug 18, 2023
@christophenoel -- I don't think that this is quite accurate.
This closed issue cf-convention/cf-conventions#410 indicates that there "is no default shape for the Earth in CF (spherical or otherwise)."
dblodgett-usgs commented on Aug 18, 2023
I would be in full support of mandating grid mapping and allowing use of only a crs_wkt in cases that the CF grid mapping attributes do not support the projection of the data in question.
dwilson1988 commented on Aug 18, 2023
And to underscore @christophenoel's note above, RECOMMENDING that everyone also provide crs_wkt to ensure maximum interoperability.
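The rule sketched in the last two comments (grid mapping mandatory, crs_wkt alone acceptable only when CF has no matching grid mapping, crs_wkt recommended otherwise) could be checked roughly as follows. This is only one reading of the proposal, not an agreed GeoZarr rule; `check_grid_mapping` and the caller-supplied `cf_supported` flag are hypothetical.

```python
def check_grid_mapping(attrs, cf_supported=True):
    """Classify a grid mapping variable's attributes under the proposed rule.

    cf_supported: whether CF defines a grid mapping for this projection
                  (supplied by the caller; it is not inferred here).
    Returns a list of (level, message) tuples; an empty list means no findings.
    """
    findings = []
    has_name = "grid_mapping_name" in attrs
    has_wkt = bool(attrs.get("crs_wkt"))

    if not has_name and not has_wkt:
        findings.append(("error", "no CRS: neither grid_mapping_name nor crs_wkt"))
    elif not has_name and cf_supported:
        findings.append(
            ("error", "CF attributes missing although CF supports this projection")
        )
    if has_name and not has_wkt:
        findings.append(("warning", "crs_wkt is recommended for interoperability"))
    return findings


# Hypothetical case: CF attributes present, crs_wkt omitted.
report = check_grid_mapping({"grid_mapping_name": "mercator"})
```

Under this reading, a crs_wkt-only variable passes only when the caller states that CF cannot express the projection, which matches the exception described above.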
christophenoel commented on Aug 18, 2023
Does it have any impact (from a user's point of view), or is it only a question of definition "geographic coordinate system based on a Earth"?
dblodgett-usgs commented on Aug 18, 2023
The impact from a user's point of view is that when given no grid_mapping specifying the shape of the earth, CF is explicitly ambiguous (there is no default shape) rather than implicit with regard to the shape of the spheroid.
christophenoel commented on Apr 4, 2025
Hi @benbovy.
There are several points I would like to clarify:
benbovy commented on Apr 4, 2025
Thanks for the clarification @christophenoel
Got it, although this is a bit in contradiction with the title of this issue, the top comment including its edit note and many of the other comments that followed?
christophenoel commented on Apr 4, 2025
You're right. I have kept only one thread from multiple issues, but I think this created a mess... let's just discuss as you prefer, we'll see later how to split the outcomes :)
mdsumner commented on Apr 8, 2025
Until now I had thought "scalar" referred to a netcdf scalar, not "single band classic 2d y,x raster" 😃 clarified now thanks!!
christophenoel commented on Apr 8, 2025
@mdsumner: You probably wrote this because of the name I suggested for the draft profile "scalar-raster" (grid-based dataset where each cell contains a single numeric value representing a measured variable). However, this name is only a suggestion of my own.
A scalar coordinate variable is defined in CF section 5.7:
ethanrd commented on Apr 11, 2025
Hi all - I've been catching up with this conversation after just this morning opening a CF Discussion (#411) on affine transformations and CF Coordinate Subsampling. And I'm noticing that I missed some of the related discussion here over the last few weeks.
I'm not particularly well versed in affine transformations. So, first, I wanted to ask for some clarification. @benbovy, you mention that affine transformations don't depend on tie points. I was under the impression that two of the parameters for an affine transformation are the coordinates for one of the corner points. Is that not the case? Or is that what you are hinting at when you mention scalar lat/lon coordinates?
I gave a 1-D lat/lon example in the CF discussion (so an affine transformation with just translation and scale) that uses tie points and calculates the scale from the tie points. I included that because it seems to fit into the current CF coordinate subsampling, though I'm still trying to fully understand the details.
I have started working on (but didn't post yet) a 2-D lat/lon full affine transformation example that includes one tie point. I also include the six affine parameters, but the first and fourth ones match the tie point lat/lon values. I wanted to include both and see what others in the CF community think. After catching up on this issue, I'd also like to hear if my assumption that two parameters match a corner's coordinate values is correct.
Thanks,
Ethan
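The 1-D case described above (translation plus scale, with the scale computed from tie points) can be sketched as plain linear interpolation between two tie points. This does not reproduce the CF coordinate subsampling attribute syntax; the function names and the sample values are hypothetical.

```python
def scale_from_tie_points(i0, x0, i1, x1):
    """Scale (coordinate units per index step) implied by two tie points."""
    if i1 == i0:
        raise ValueError("tie points must sit at different indices")
    return (x1 - x0) / (i1 - i0)


def reconstruct(i0, x0, i1, x1, size):
    """Linearly interpolate the full 1-D coordinate from two tie points."""
    scale = scale_from_tie_points(i0, x0, i1, x1)
    return [x0 + (i - i0) * scale for i in range(size)]


# Hypothetical example: 5 longitudes, tie points at the first and last index.
lon = reconstruct(0, 10.0, 4, 12.0, size=5)
```

Here the translation is simply the coordinate of the tie point at index 0 and the scale is 0.5 degrees per index step, so the two affine parameters of the 1-D case are fully determined by the two tie points.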
mdsumner commented on Apr 11, 2025
It is in GDAL, most commonly top left corner with negative dy.
But note in a world file (a 6-value affine in a sidecar text file) that offset is the centre point. There would be various implementations across different formats and GDAL converts those to its model. You can also have two GCPs for a very basic stand-in for an affine, but those (and RPCs and geolocation arrays generally) won't take effect unless the output is through the warper api (you can input GCPs to gdal_translate for example but they don't change the interpretation at that point).
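The corner-versus-centre difference mentioned above can be illustrated with a small conversion from world file values to a GDAL-style geotransform. This is a sketch under the usual conventions (world file lines in the order A, D, B, E, C, F with C, F at the centre of the first pixel; GDAL offsets at its outer corner), not GDAL's own implementation; the function name and sample values are hypothetical.

```python
def worldfile_to_geotransform(a, d, b, e, c, f):
    """Convert world file values (file line order A, D, B, E, C, F)
    to a GDAL-style geotransform [x0, dx, rx, y0, ry, dy].

    The world file places the centre of the first pixel at (C, F); the
    geotransform places the pixel's outer corner, hence the half-pixel shift.
    """
    x_corner = c - 0.5 * a - 0.5 * b
    y_corner = f - 0.5 * d - 0.5 * e
    return [x_corner, a, b, y_corner, d, e]


# Hypothetical 10 m north-up raster whose first pixel centre is (500005, 4199995).
gt = worldfile_to_geotransform(10.0, 0.0, 0.0, -10.0, 500005.0, 4199995.0)
```

Skipping the half-pixel shift would leave the six numbers looking plausible while moving every pixel by half a cell, which is why the convention needs to be stated explicitly in any encoding.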
mdsumner commented on Apr 11, 2025
I thought I'd put a small illustration up, probably raises more questions than settles things but I can make a better effort with a more understandable source image sometime (it's a bit of a struggle for me to use python and rasterio but I think this shows the things I mentioned in the last response)
https://gist.github.com/mdsumner/12ee320594bbf613d4f939c523b61c43
benbovy commented on Apr 11, 2025
Thanks @ethanrd for opening the CF discussion! It would be great if CF conventions could fully support affine transformations (and possibly other transformations used for geospatial raster data).
These last days I've been familiarizing myself with the GeoTIFF specs, which define the ModelTiepointTag and ModelPixelScaleTag tags that IIUC can be used together to define a simple affine transformation with no rotation or shearing. I can indeed see how this would translate to CF Coordinate Subsampling (tie points), which I'm also slowly familiarizing myself with. As @mdsumner illustrates for GDAL there are multiple ways of defining simple raster coordinate transformations but I think that if CF conventions could support one of them that would already be great?
For 2D full affine transformations, there are likely people following this thread who are more experienced than me on this topic and who may better (in)validate your assumption.
Besides affine transformation, GDAL's raster data model supports Ground Control Points (GCPs) given at arbitrary locations on the raster (array indices). From GDAL's documentation:
Perhaps CF Coordinate Subsampling could be reused and extended to support that as well? I.e., add polynomial and/or explicitly undefined interpolation methods to the CF conventions.
There's one possible complication, though. From my understanding of CF 8.3 it seems that the tie points must be distributed on a structured grid, whereas in this case the distribution of the GCPs is unstructured.
I think that the scalar lat/lon coordinate variables that we mentioned here above are more placeholders to store useful metadata (standard_name, axis, etc.).
When reusing CF Coordinate Subsampling, one thing I'm not sure to fully understand is whether it is possible (and if yes, how) to unambiguously reconstruct the full lat/lon coordinates with their interpolated dimensions. I.e., how do we know which coordinate corresponds to which axis of the transformation? (cf. #20 (comment) and the next few comments)
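As an illustration of the GeoTIFF case mentioned earlier in this comment, one tie point plus a pixel scale is enough to build a rotation-free affine. The sketch below assumes a single tie point (I, J, K, X, Y, Z), a pixel scale (ScaleX, ScaleY, ScaleZ) with positive horizontal scales, a north-up image and raster positions measured from the outer corner of the first pixel; the function name and the sample values are hypothetical.

```python
def geotiff_tags_to_geotransform(tiepoint, pixel_scale):
    """Build a GDAL-style geotransform from one tie point and a pixel scale.

    tiepoint:    (I, J, K, X, Y, Z) raster position and its model coordinates
    pixel_scale: (ScaleX, ScaleY, ScaleZ), horizontal scales positive
    Assumes a north-up image (Y decreases as the row index grows).
    """
    i, j, _k, x, y, _z = tiepoint
    sx, sy, _sz = pixel_scale
    return [x - i * sx, sx, 0.0, y + j * sy, 0.0, -sy]


# Hypothetical tags: raster position (0, 0) maps to (500000, 4200000), 10 m pixels.
gt = geotiff_tags_to_geotransform(
    (0, 0, 0, 500000.0, 4200000.0, 0.0), (10.0, 10.0, 0.0)
)
```

Only four numbers are free here (two offsets, two scales), which is the same 4-degrees-of-freedom case that a tie-point-based encoding would need to cover first.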
mdsumner commented on Apr 11, 2025
When the points are structured these are called geolocation arrays, not GCPs
https://gdal.org/en/stable/development/rfc/rfc4_geolocate.html
this doesn't seem as well documented as the GCP and RPC approach is in the raster data model, but these are pretty standard as 2D or 1D arrays (if the 1D arrays in particular are broken or inaccurate these are treated as rectilinear geolocation arrays rather than being collapsed to a transform; GHRSST/MURSST is a famous example, stored as Float32 which is not enough at the resolution used).
There's of course also the case when 2D "geolocation arrays" are entirely degenerate, i.e. they are unprojected unnecessarily from a regular grid and stored as data in lon,lat arrays, this should not be modelled at all and just be fixed by assigning correct crs and transform (although the warper api can of course use this case to resolve to any regular grid, and it can also write to grid defined by geolocation arrays, something I wasn't aware of until recentish-ly).
Kirill888 commented on Jun 8, 2025
@ethanrd Affine mapping between pixel and world coordinates is fairly straightforward:
https://odc-geo.readthedocs.io/en/latest/intro-geobox.html
The main concern is about precisely defining where point 0,0 lies in the image plane: most use the top left corner of the top left pixel of the image, but some place it in the center of the top left pixel. Mismatching the two leads to a half pixel translation that is rather hard to detect.
Kirill888 commented on Jun 9, 2025
Coming over from pangeo forums (thanks @benbovy). Some notes in no particular order:
CF 8.3 (coordinate compression)
- linear can encode the most common case of 4 degrees of freedom Affine (no rotation or shear, but non-square pixels are allowed, GeoTransform: Tx Sx 0 Ty 0 Sy)
- [...] read (not that it matters for geozarr case)
- [...] hidden dimensions and 2 more hidden variables
Axis order in CRS
I think this should be ignored, i.e. pyproj.Transformer.from_crs(..., always_xy=True). Certainly don't tie the order of the affine parameters to the properties of CRS, nor to the order of Y, X dimensions in the data variable. Coordinates already have standard_name, so one can already tell which dimensions of the data variable form a spatial plane, and which one of them is x vs y. For best interop, make sure to use [prefix_dims,]* y, x [, postfix_dims]* coordinate order.
Affine/GeoTransform parameters
To me the proposed representation is confusing:
as it's using parameter order from GDAL's GeoTransform, but calls it affine. To me "Affine" is a 3x3 matrix in the form:
Above will be encoded in GDAL as GeoTransform=[c, a, b, f, d, e]. I'd say either use GDAL order and call it geo_transform, or call it affine and use row major order of the first two rows of the affine matrix:
BTW this is the same order as used by STAC projection extension.
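The two parameter orders contrasted above can be written down explicitly. The sketch assumes the matrix layout implied by GeoTransform=[c, a, b, f, d, e] together with GDAL's documented formula (x = GT[0] + col*GT[1] + row*GT[2], y = GT[3] + col*GT[4] + row*GT[5]), that is, rows (a, b, c), (d, e, f), (0, 0, 1). That layout is inferred here rather than copied from the comment, and the helper names and sample values are hypothetical.

```python
def gdal_to_affine(gt):
    """[c, a, b, f, d, e] (GDAL order) -> [a, b, c, d, e, f] (row-major)."""
    c, a, b, f, d, e = gt
    return [a, b, c, d, e, f]


def affine_to_gdal(coeffs):
    """[a, b, c, d, e, f] (row-major) -> [c, a, b, f, d, e] (GDAL order)."""
    a, b, c, d, e, f = coeffs
    return [c, a, b, f, d, e]


def apply_affine(coeffs, col, row):
    """Multiply [[a, b, c], [d, e, f], [0, 0, 1]] by the column (col, row, 1)."""
    a, b, c, d, e, f = coeffs
    return a * col + b * row + c, d * col + e * row + f


gdal_order = [500000.0, 10.0, 0.0, 4200000.0, 0.0, -10.0]  # hypothetical values
row_major = gdal_to_affine(gdal_order)
world_xy = apply_affine(row_major, 1, 2)
```

Because both orders hold the same six numbers, a reader cannot tell them apart from the values alone, which supports naming the attribute after the order it actually uses.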
benbovy commented on Jun 10, 2025
This seems simpler to me as well. As far as I understand, trying to reuse existing concepts as much as possible is a core part of CF's philosophy, so the suggestion in https://github.com/orgs/cf-convention/discussions/411 of reusing CF tie point coordinates makes sense. However, I also think that using two extra variables and dimensions may be a bit convoluted for representing simple affine transforms. And although it is still not clear to me how the same CF 8.3 concepts can be reused / extended for complex transforms (with rotation and/or shear), I guess it will probably look even more convoluted?
That said, CF 8.3 could have some great potential if it was possible to reuse (most of) it as a common basis for encoding all the different (compact) ways of mapping pixel to world coordinates of raster / imagery data (i.e., simple and complex affine transforms, ground control points GCPs, rational polynomial coefficients RPCs, etc.), at the expense of some more metadata complexity / verbosity.