Search code, repositories, users, issues, pull requests...

opened

on Apr 13, 2023

· edited by christophenoel

As it has been reported multiple times that the CF encoding of CRS (Coordinate Reference System) may pose issues, I propose starting a dedicated thread to gain a better understanding of these concerns.

Furthermore, I recently converted TerraSAR GeoTiff to Zarr, using rioxarray in conjunction with gdal and xarray, which resulted in the following CF grid mapping. I am somewhat surprised by the way GDAL encoded the CRS, which could be a result of the combination of gdal with xarray.

{
    "GeoTransform": "10.50014820907392 0.00012070250150375875 -2.609127035245679e-05 53.85791404256921 -1.5387410410647694e-05 -7.246156895736279e-05",
    "_ARRAY_DIMENSIONS": [],
    "crs_wkt": "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\"
,0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]",
    "geographic_crs_name": "WGS 84",
    "grid_mapping_name": "latitude_longitude",
    "inverse_flattening": 298.257223563,
    "longitude_of_prime_meridian": 0.0,
    "prime_meridian_name": "Greenwich",
    "reference_ellipsoid_name": "WGS 84",
    "semi_major_axis": 6378137.0,
    "semi_minor_axis": 6356752.314245179,
    "spatial_ref": "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0],UNIT[\"degr
ee\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]"
}

EDIT NOTE: A key topic of this discussion concerns the need to make grid_mapping mandatory in order to exclude the possibility of having an implicit (undefined) CRS. Furthermore, grid_mapping includes several formats, and it would be wise to recommend one of them (WKT, Proj4j, ...).

What are you surprised by?

Most of what's in there is as would be expected per:
https://cfconventions.org/cf-conventions/cf-conventions.html#use-of-the-crs-well-known-text-format

This page provides a fairly complete set of mappings:
https://github.com/cf-convention/cf-conventions/wiki/Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements

"spatial_ref" does not appear in the CF convention but has been around in the community for a while. Someone implemented it quite a few years ago and it's been adopted in a number of places.

"_ARRAY_DIMENSIONS" is presumably being added by xarray?

Author

That's not my point. I'm just trying to understand why there is a concern (discussed many times during the meeting) with this CF way of encoding CRS while GDAL/Xarray seems to generate it without problem when converting a GeoTiff to Zarr.

Why do we want to change this ?

The rub is the need to implement a cross walk between the CRS-WKT / EPSG representation of CRS and the grid_mapping representation of CRS. The crs_wkt is optional and all the particular grid_mapping attributes are required.

CF-NetCDF requires that you include parameters using the attribute specification from cf that duplicate what most geospatial software supports via PROJ.

I think the premise here is that we would be saying that crs_wkt is required and that the other grid_mapping attributes are encouraged to allow compatibility with software that does not support crs_wkt?

djhoese

Jumping in on this issue after this spec repository was pointed out to me. I'm the creator of the very very new geoxarray package which basically tries to take all the non-GDAL specific CRS and dimension/coordinate information functionality from rioxarray but without the GDAL/rasterio dependency. It is also meant to be more generic than rioxarray.

Anyway, I wanted to chime in with some information from my understanding having working on a similar issue in geoxarray. First, my understanding is that spatial_ref is for GDAL compatibility. Second, rioxarray and geoxarray both use the crs_wkt as a shortcut/short-circuit when reading CRS information to avoid having to convert/parse all the CF grid mapping attributes that might also exist. Both libraries depend on pyproj to do the CRS/WKT -> CF grid_mapping conversion.

There is also the case of projections that aren't supported by CF. In those cases Pyproj (and therefore rioxarray and geoxarray) will only produce the crs_wkt/spatial_ref attributes with WKT since there are no corresponding CF attributes. I believe it has been brought up on various CF discussions that it is likely a bad idea to re-invent a way of representing CRS information especially when it isn't all inclusive/supportive and the PROJ library already has a pretty good handle on it. I'm not sure what the end result of those discussions was though.

Great info. Thanks @djhoese -- UW Madison representing here, I love it! (I'm an '09 Civil Engineering grad)

Author

I want to take this good news of a new expert partipating to summarize the question for novices like me. Please correct me if I'm wrong.

CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth (so not WGS 84 ellipsoid) when declaring the axis or standard_name property. CF conventions also allow to express CRS definitions via the grid_mapping attribute which refers to a separate variable.

This grid mapping variable provides the description of the mapping via a collection of attached attributes. There are several ways to describe the CRS:

The attribute grid_mapping_name allows to set a standard name (listed in CF annex F) which map to a known CRS (e.g. lambert_conformal_conic, mercator, etc.) based on the associated properties.
The attribute crs_wkt also to express multiple CRS properties in WKT (used by GDAL/OGR) in the property crs_wkt
Since v1.7, the Grid Mapping attributes have been extended to support various other representations (such as OGC WKT, EPSG and PROJ.4). The mappings are provided on Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements

Note: spatial_ref is the property commonly used by GDAL to define the CRS in WKT. I believe therefore that GDAL ignores crs_wkt in grid_mapping variable.

My conclusion

From my understanding, CF advocates for using its standard properties to express any type of representation (WKT, PROJ.4, etc.). Since we want to base GeoZarr on CF, I would lean towards making the grid_mapping attributes as mandatory (using the mapping table above for PROJ.4, WKT representation) and encouraging crs_wkt to facilitate compatibility with tools that only support WKT (but GDAL will still have to implement the support of crs_wkt property)

Author

Another useful information reported by Roger Lott (OGC CRS SWG co-chair):

Please be aware that the OGC CRS SWG has been developing an exchange format for gridded geodetic data (GGXF). At the outset of this project we assumed that the netCDF-CF conventions would be used as a starting point for GGXF. We quickly found that they were not suitable for the rigorous geodetic requirements for CRS definitions, and have developed a separate set of GGXF Conventions.

On superficial look it does not seem that these address anything in your github issue, but if you are interested in the GGXF conventions they are in Annex B of OGC 22-051r2. r3 of this doc will be posted shortly but the changes are unlikely to be significant to you (mainly editorial following the OGC public comment review). A not-quite-complete draft of r3 is in the GGXF Github issue #59 (use the 2023-07-24 version towards the end of the issue chain).

A couple of other comments that may be more relevant to the thread:
• CF netCDF conventions requires the CRS-WKT to adhere to what is sometimes called "WKT2". (See CF conventions 5.6.1 for "The crs_wkt attribute should comprise a text string that conforms to the WKT syntax as specified in reference [OGC_WKT-CRS]"). The CF conventions quote adherence to 12-063r3 (=WKT2 v1) but that has been superseded by 18-010r11 (=WKT2 v2). The difference in the two versions of WKT2 is an upgrade for dynamic CRSs (coordinates that change with time). In the the snippet that you give in the Github issue, neither crs_wkt nor spatial_ref is WKT2 of any variety. They use one of the many flavours of WKT1 (that supported by Proj/GDAL). You can tell whether the string is WKT1 or WKT2 by examination of the first keyword. If it ends in CS (e.g. GEOGCS) it is WKT1, if it ends in CRS (e.g. GEOGCRS) it is WKT2. The snippet therefore is not compliant with the CF conventions.
• GGXF, like netCDF-CF, uses WKT2 for describing CRSs and transformations.
• There has been criticism of WKT2 by some. My understanding is that this is due to the need to escape certain characters, in particular the degree symbol (°) and the double quote character ("), when WKT2 strings are embedded in other languages. It is my conjecture, but perhaps this may be the concern that the Github issue is trying to capture.

dwilson1988

My conclusion

From my understanding, CF advocates for using its standard properties to express any type of representation (WKT, PROJ.4, etc.). Since we want to base GeoZarr on CF, I would lean towards making the grid_mapping attributes as mandatory (using the mapping table above for PROJ.4, WKT representation) and encouraging crs_wkt to facilitate compatibility with tools that only support WKT (but GDAL will still have to implement the support of crs_wkt property)

This solves a decent part of the original concern around the original issue (christophenoel#3) just to mention it here. I just want to make sure I'm understanding a few things:

If a Zarr dataset is created in a spatial reference that the CF conventions does not explicitly support, there would be no path to create a compliant GeoZarr dataset, correct?
If starting from an EPSG code or WKT/ProjJSON representation of a CRS, there is no straightforward way to map it to CF if I'm understanding the table you provided
Just specifying the crs_wkt attribute WOULD NOT be a compliant GeoZarr dataset

The reason I bring this up specifically is we've been outputting Zarr from our geospatial system for the last few years and all of the CRS information is specified as EPSG Code, WKT, or ProjJSON. Unless there's a way to always (and reliably!) map from CRS-WKT to CF, supporting GeoZarr would be a nonstarter for us, but we'd VERY much like to adopt GeoZarr.

Any thoughts on this?

djhoese

Speaking from a user POV and not on the "standards"/specification design side of this: I would plan/hope to use the pyproj library in all of my Python code to do conversions between PROJ.4/WKT2/EPSG/CF. Not sure that is considered "straightforward" though.

I would hope that the issue in your point 1 above could be resolved in the future by suggesting to CF to adopt crs_wkt as an alternative to the CF grid mapping attributes (either the attributes or the crs_wkt or both with priority going to crs_wkt). I don't know how open they would be to that but the fact that not every projection is supported currently would make putting the responsibility on WKT more enticing. Similarly, add the functionality to GDAL to read crs_wkt in addition to the already supported spatial_ref.

dwilson1988

That's really the root of the rub here (correct me if I'm wrong): I can represent any CF grid mapping as WKT but not vice versa.

@christophenoel -- I don't think that this is quite accurate.

CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth

This closed issue cf-convention/cf-conventions#410 indicates that there "is no default shape for the Earth in CF (spherical or otherwise)."

I would be in full support of mandating grid mapping and allowing use of only a crs_wkt in cases that the CF grid mapping attributes do not support the projection of the data in question.

dwilson1988

And to underscore @christophenoel's note above, RECOMMENDING that everyone also provide crs_wkt to ensure maximum interoperability.

Author

@christophenoel -- I don't think that this is quite accurate.

CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth

This closed issue cf-convention/cf-conventions#410 indicates that there "is no default shape for the Earth in CF (spherical or otherwise)."

Does it has any impact (on a user point of view) of is it only a question of definition "geographic coordinate system based on a Earth" ?

The impact from a user's point of view is that when given no grid_mapping specifying the shape of the earth, CF is explicitly ambiguous (there is no default shape) rather than implicit with regard to the shape of the spheroid.

44 remaining items

Author

Hi @benbovy .

There are several points I would like to clarify:

The GeoZarr data model aims (in its first version) to build on the CDM/CF model, extended with useful constructs from GDAL, STAC, etc..
However, the GeoZarr data model should be probably permissive, meaning no CF conventions, including grid_mapping, are mandatory.
The thread here is not about CRS, but intended to provide an CF friendly equivalent representation to GDAL GeoTransform, i.e., coordinates that are not arrays of values but map to an affine transformation, which seems not supported in CF.
If I am not mistaken, CF does not support ProjJSON. It would indeed be useful to address two additional topics (perhaps in separate tickets): the possibility of extending CF with ProjJSON, or extending GeoZarr with ProjJSON.

Thanks for the clarification @christophenoel

The thread here is not about CRS, but intended to provide an CF friendly equivalent representation to GDAL GeoTransform

Got it, although this is a bit in contradiction with the title of this issue, the top comment including its edit note and many of the other comments that followed?

Author

You're right. I have kept only one thread from multiple issues, but think this created a mess... let's just discuss as you prefer, we'll see later how to split the outcomes :)

maxrjones

mentioned this

on Apr 7, 2025

Support flexible index usage with Zarr and Xarray NASA-IMPACT/veda-odd#125

mentioned this

on Apr 8, 2025

plan for CNG 2025 xarray-contrib/xproj#22

Not sure I fully understand the comment, to be honest.

To stay on track about the topic, it feels useful to look at the different proposed solutions properly and compare them over the next month, so we can make a solid assessment.

Until now I had thought "scalar" referred to a netcdf scalar, not "single band classic 2d y,x raster" 😃 clarified now thanks!!

Author

@mdsumner : You probably write this because of the name I suggested for the draft profile "scalar-raster" (grid-based dataset where each cell contains a single numeric value representing a measured variable). However, this name is only a suggestion on my own.

A scalar coordinate variable, is defined in CF section 5.7:

When a variable has an associated coordinate which is single-valued, that coordinate may be represented as a scalar variable (i.e. a data variable which has no netCDF dimensions) ...
The use of scalar coordinate variables is a convenience feature which avoids adding size one dimensions to variables

ethanrd

Hi all - I've been catching up with this conversation after just this morning opening a CF Discussion (#411) on affine transformations and CF Coordinate Subsampling. And I'm noticing that I missed some of the related discussion here over the last few weeks.

I'm not particularly well versed in affine transformations. So, first, I wanted to ask for some clarification. @benbovy, you mention that affine transformations don't depend on tie points. I was under the impression that two of the parameters for an affine transformation are the coordinates for one of the corner points. Is that not the case? Or is that what you are hinting at when you mention scalar lat/lon coordinates?

I gave a 1-D lat/lon example in the CF discussion (so an affine translation with just translation and scale) that uses tie points and calculates the scale from the tie points. I included that because it seems to fit into the current CF coordinate subsampling, though I'm still trying to fully understand the details.

I have started working on (but didn't post yet) a 2-D lat/lon full affine translation example and include one tie point. I also include the six affine parameters but the first and fourth ones match the tie point lat/lon values. I wanted to include both and see what others in the CF community think. After catching up on this issue, I'd also like to hear if my assumption that two parameters match a corner coordinate values is correct.
Thanks,
Ethan

It is in GDAL, most commonly top left corner with negative dy.

But note in a world file (an 6-value affine in a sidecar text file) that offset is the centre point. There would be various implementations across different formats and GDAL converts those to its model. You can also have two GCPs for a very basic stand-in for an affine, but those (and RPCs and geoocation arrays generally) won't be effected unless the output is through the warper api (you can input GCPs to gdal_translate for example but they don't change the interpretation at that point).

https://gist.github.com/mdsumner/12ee320594bbf613d4f939c523b61c43

I thought I'd put a small illustration up, probably raises more questions than settles things but I can make a better effort with a more understandable source image sometime (it's a bit of a struggle for me to use python and rasterio but I think this shows the things I mentioned in the last response)

Thanks @ethanrd for opening the CF discussion! It would be great if CF conventions could fully support affine transformations (and possibly other transformations used for geospatial raster data).

These last days I've been familiarizing with GeoTIFF specs, which define the ModelTiepointTag and ModelPixelScaleTag tags that IIUC can be used together to define a simple affine transformations with no rotation or shearing. I can indeed see how this would translate to CF Coordinate Subsampling (tie points), which I'm also slowly familiarizing with. As @mdsumner illustrates for GDAL there are multiple ways of defining simple raster coordinate transformations but I think that if CF conventions could support one of them that would already be great?

For 2D full affine transformations, there are likely people following this thread who are more experienced than me on this topic and who may better (in)validate your assumption.

Besides affine transformation, GDAL's raster data model supports Ground Control Points (GCPs) given at arbitrary locations on the raster (array indices). From GDAL's documentation:

The GDAL data model does not imply a transformation mechanism that must be generated from the GCPs ... this is left to the application. However 1st to 5th order polynomials are common.

Perhaps CF Coordinate Subsampling could be reused and extended to support that as well? I.e., add polynomial and/or explicitly undefined interpolation methods to the CF conventions.

There's one possible complication, though. From my understanding of CF 8.3 it seems that the tie points must be distributed on a structured grid, whereas in this case the distribution of the GCPs is unstructured.

I was under the impression that two of the parameters for an affine transformation are the coordinates for one of the corner points. Is that not the case? Or is that what you are hinting at when you mention scalar lat/lon coordinates?

I think that the scalar lat/lon coordinate variables that we mentioned here above are more placeholders to store useful metadata (standard_name, axis, etc.).

When reusing CF Coordinate Subsampling, one thing I'm not sure to fully understand is whether it is possible (and if yes, how) to unambiguously reconstruct the full lat/lon coordinates with their interpolated dimensions. I.e., how does we know which coordinate corresponds to which axis of the transformation? (cf. #20 (comment) and the next few comments)

https://gdal.org/en/stable/development/rfc/rfc4_geolocate.html

When the points are structured these are called geolocation arrays, not GCPs

this doesn't seem as well documented as the GCP and RPC approach is in the raster data model, but these are pretty standard as 2D or 1D arrays (if the 1D arrays in particular are broken or inaccurate these are treated as rectlinear geolocation arrays rather than being collapsed to a transform (GHRSST/MURSST a famous example, stored as Float32 which is not enough at the resolution used).

There's of course also the case when 2D "geolocation arrays" are entirely degenerate, i.e. they are unprojected unnecessarily from a regular grid and stored as data in lon,lat arrays, this should not be modelled at all and just be fixed by assigning correct crs and transform (although the warper api can of course use this case to resolve to any regular grid, and it can also write to grid defined by geolocation arrays, something I wasn't aware of until recentish-ly).

Kirill888

@ethanrd Affine mapping between pixel and world coordinates is fairly straighforward:

https://odc-geo.readthedocs.io/en/latest/intro-geobox.html

main concern is about precisely defining where point 0,0 lies in the image plane, most use top left corner of the top left pixel of the image, but some place it in the center of the top left pixel, mismatching the two leads to half pixel translation that is rather hard to detect

Kirill888

Coming over from pangeo forums (thanks @benbovy). Some notes in no particular order:

CF 8.3 (coordinate compression)

linear can encode the most common case of 4 degrees of freedom Affine (no rotation or shear, but non-square pixels are allowed, GeoTransform: Tx Sx 0 Ty 0 Sy)
How well is this supported in the current hdf/netcdf software stack, for reading and for writing? (i.e. is there reference implementation to test spec understanding against)
It's domain specific "compression" for coordinate arrays, and not part of the data model, so might be hard to access those parameters, I assume libs just expand those on read (not that it matters for geozarr case)
No clear indication if compression is lossy or exact
Requires non-trivial conversion back to Affine matrix even in 4 DoF case, possibly leading to floating point precision issues.
Requires 2 more hidden dimensions and 2 more hidden variables
Overall I'd prefer explicitly storing 6 affine parameters on a special variable referenced from the data variable, or possibly from coordinate variables or both

Axis order in CRS

I think this should be ignored, i.e. pyproj.Transformer.from_crs(..., always_xy=True). Certainly don't tie the order of the affine parameters to the properties of CRS, nor to the order of Y, X dimensions in the data variable. Coordinates already have standard_name, so one can already tell which dimensions of the data variable form a spatial plane, and which one of them is x vs y. For best interop, make sure to use [prefix_dims,]* y, x [, postfix_dims]* coordinate order.

Affine/GeoTransform parameters

To me proposed representation is confusing:

coordinate_transformation:affine_transform= "500000 30 0 0 0 -30" ;

as it's using parameter order from GDAL's GeoTransform, but calls it affine. To me "Affine" is a 3x3 matrix in the form:

$$ \begin{equation} \begin{pmatrix}x_w \\ y_w \\ 1\end{pmatrix} = \begin{pmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix}x_p \\ y_p \\ 1\end{pmatrix} \end{equation} $$

Above will be encoded in GDAL as GeoTransform=[c, a, b, f, d, e]. I'd say either use GDAL order and call it geo_transform, or call it affine and use row major order of the first two rows of the affine matrix:

coordinate_transformation:affine_transform= "30 0 500000 0 -30 0" ;

BTW this is the same order as used by STAC projection extension.