Writing GDAL ZARR _CRS attribute not possible #6448
Comments
I think the core problem here is that Zarr itself supports arbitrary JSON data structures as attributes, but netCDF does not. The Zarr serialization in Xarray is designed to emulate netCDF, but we could make that optional, for example, with a flag to bypass attribute encoding / decoding and just pass the Python data directly through to Zarr. However, my concern would be that the netCDF4 C library would not be able to read those files (nczarr). What happens if you try to open up a GDAL-created Zarr with netCDF4?
FWIW, the new GeoZarr Spec by @christophenoel does not use the GDAL convention for CRS. Instead, it recommends using CF conventions for encoding CRS. This is more compatible with NetCDF, but won't be parsed correctly by GDAL.
I am a little discouraged that we have not managed to align better across projects so far (e.g. having this conversation before the GDAL Zarr CRS convention was implemented). 😞 For example, either of these two GDAL PRs:
However, it is not too late! Let's try to reach for a standard way of encoding CRS in Zarr that can be used across languages and implementations of Zarr. My own preference would be to try to get GDAL to support the GeoZarr Spec and thus the CF-convention CRS attribute, rather than trying to get Xarray to be able to write the GDAL CRS convention. |
Very interesting topic. I assume that GDAL proposed something because the Zarr specification has no provision for spatial reference system encoding. From my point of view, by staying close to NetCDF (which is so widely used in Earth Observation missions) and providing supporting libraries, xarray drove the success of Zarr in the EO world. Indeed, my dream would be for GDAL to align with the GeoZarr spec, which is mostly aligned with NetCDF/xarray. |
@christophenoel - I share your perspective. But there is a huge swath of the geospatial world who basically hate NetCDF and avoid it like the plague. These communities prefer to use geotiff and GDAL. We need to reach for interoperability. |
Some people will prefer Cloud-Optimised GeoTiff, others (geo)Zarr. By the way, I use rasterio (based on GDAL), xarray, GDAL and all tools without problems with CF conventions. The impacts of GDAL using a CRS attribute are only:
|
One of the main motivations behind the rioxarray extension is GDAL compatibility. It looks like @snowman2 and @TomAugspurger have discussed saving many geotiffs loaded into xarray as GDAL-compatible Zarr, for example in corteva/rioxarray#433 (comment). While it seems that the ultimate solution is agreeing on a format standard, here is another small example using the rioxarray extension where format conversion doesn't currently work as you might expect:
import xarray as xr
import rioxarray  # importing rioxarray registers the `.rio` accessor

# https://github.com/pydata/xarray-data
ds = xr.open_dataset('xarray-data/air_temperature.nc', engine='rasterio')

# TooManyDimensions: Only 2D and 3D data arrays supported.
ds.rio.to_raster('test.zarr', driver='ZARR')

# Does not error, but output not equivalent to `gdal_translate -of ZARR xarray-data/air_temperature.nc gdal_air_temp.zarr`
# for example, `gdalinfo xarray-tutorial-airtemp.zarr` gives
# Warning 1: Too many samples along the > 2D dimensions of /air.
ds.to_zarr('xarray-tutorial-airtemp.zarr') |
GDAL does support the CF conventions for storing the CRS. https://gdal.org/drivers/raster/netcdf.html#georeference "The driver first tries to follow the CF-1 Convention from UNIDATA looking for the Metadata named “grid_mapping”." |
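To make the CF approach concrete, here is a minimal sketch of how a grid mapping is laid out, using plain dictionaries in place of real variables. The variable names `air` and `spatial_ref` and the WKT string are assumptions chosen for illustration; only the attribute names `grid_mapping`, `grid_mapping_name`, `crs_wkt`, `semi_major_axis` and `inverse_flattening` come from the CF conventions.

```python
import numbers

# Illustrative WKT for WGS 84 (an assumption; any valid CRS WKT would do).
WKT = (
    'GEOGCS["WGS 84",DATUM["WGS_1984",'
    'SPHEROID["WGS 84",6378137,298.257223563]],'
    'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]]'
)

# CF style: the data variable only names the grid-mapping variable,
# and the CRS itself lives in that variable's attributes.
cf_attrs = {
    "air": {"grid_mapping": "spatial_ref"},
    "spatial_ref": {
        "grid_mapping_name": "latitude_longitude",
        "semi_major_axis": 6378137.0,
        "inverse_flattening": 298.257223563,
        "crs_wkt": WKT,
    },
}


def is_flat(value):
    """True for scalar strings and numbers, the kind of value netCDF attributes hold."""
    return isinstance(value, (str, numbers.Number))


# Every attribute value is a plain string or number, so nothing here
# needs a nested structure.
all_flat = all(
    is_flat(value) for attrs in cf_attrs.values() for value in attrs.values()
)
print(all_flat)
```

Because every value is flat, this layout can be written to both netCDF and Zarr without any special handling of attribute types.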
This document is also a useful reference for storing CRS in xarray: https://corteva.github.io/rioxarray/stable/getting_started/crs_management.html |
Top reasons for using CF over a CRS attribute:
|
so just to make it clear... GDAL doesn't actually need the CRS stored with the |
GDAL may need to update the ZARR driver. The NetCDF driver does support CF. It would be good for them to be consistent. |
I am guilty of sidetracking this issue into the politics of CRS encoding. That discussion is important. But in the meantime, @wankoelias's original issue reveals a narrower technical issue with Xarray's Zarr writer: Xarray won't let you serialize a dictionary attribute to zarr, even though zarr has no problem with this. That is a problem we can fix pretty easily. The Lines 133 to 135 in 586992e
We could refactor this function to be more flexible to account for zarr's broader range of allowed attribute types (as we have evidently already done for h5netcdf). Or we could just bypass it completely in the @wankoelias - you seem to understand the issue pretty well. Would you be game for making a PR? We would be glad to support you along the way. |
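As a rough illustration of the first option, the sketch below shows what a more flexible attribute check could look like. It is not Xarray's actual code: `check_attr`, `try_check`, the `allow_json` flag and the `NETCDF_STYLE_TYPES` tuple are all hypothetical names, and the type list is a stdlib-only simplification that leaves out NumPy types.

```python
import json
import numbers

# Simplified stand-in for the netCDF-style attribute types (hypothetical).
NETCDF_STYLE_TYPES = (str, numbers.Number, list, tuple)


def check_attr(name, value, allow_json=False):
    """Raise TypeError if `value` cannot be stored as an attribute.

    With allow_json=True, anything the json module can serialize is
    accepted as well, which covers nested dictionaries.
    """
    if isinstance(value, NETCDF_STYLE_TYPES):
        return
    if allow_json:
        try:
            json.dumps(value)
            return
        except (TypeError, ValueError):
            pass  # fall through to the error below
    raise TypeError(f"Invalid value for attr {name!r}: {value!r}")


def try_check(name, value, allow_json):
    """Return True if the attribute passes the check, False otherwise."""
    try:
        check_attr(name, value, allow_json=allow_json)
    except TypeError:
        return False
    return True


crs = {"wkt": 'GEOGCS["WGS 84"]'}  # placeholder value, not a complete WKT
strict_ok = try_check("_CRS", crs, allow_json=False)
relaxed_ok = try_check("_CRS", crs, allow_json=True)
print(strict_ok, relaxed_ok)
```

Under these assumptions the strict mode rejects the dictionary and the relaxed mode accepts it, while values that are not JSON-serializable (a Python set, for instance) are still rejected in both modes.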
just seeing this conversation. I wasn't aware of GeoZarr when I implemented the _CRS attribute in the GDAL Zarr driver. There is a practical difficulty in reusing the CF reading & writing code from the GDAL netCDF driver as it is tied to that driver, but I'd guess with sufficient effort it could be made agnostic of the carrier. No opposition from me if someone wants to tackle this. |
What is your issue?
Related to #6374
Writing a ZARR which is compatible with GDAL conventions using xarray.Dataset.to_zarr requires all the data variables to have a _CRS attribute which contains the Spatial Reference System (SRS) encoding. This _CRS attribute itself is a dict in which the SRS is encoded in at least one of these keys: wkt, url, projjson.
Because attribute values can't be dictionaries during serialization, it does not seem possible to write GDAL-compatible Zarrs using xarray.
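For reference, a minimal sketch of what such an attribute set looks like when expressed as JSON, which is how Zarr stores attributes. The dimension names and the WKT string are assumptions for illustration; only the `_CRS` key and its `wkt` / `url` / `projjson` sub-keys come from the description above, and `_ARRAY_DIMENSIONS` is the attribute Xarray uses to record dimension names in Zarr.

```python
import json

# Illustrative WKT for WGS 84 (an assumption; any valid CRS WKT would do).
WKT = (
    'GEOGCS["WGS 84",DATUM["WGS_1984",'
    'SPHEROID["WGS 84",6378137,298.257223563]],'
    'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]]'
)

# Attributes of one data variable: `_CRS` is itself a dictionary.
band_attrs = {
    "_ARRAY_DIMENSIONS": ["lat", "lon"],  # hypothetical dimension names
    "_CRS": {"wkt": WKT},
}

# JSON handles the nested dictionary without any trouble.
zattrs_text = json.dumps(band_attrs, indent=2)
restored = json.loads(zattrs_text)

# The nested value is what a netCDF-style attribute check trips over.
has_nested = any(isinstance(value, dict) for value in band_attrs.values())
print(zattrs_text)
```

The round trip through JSON is lossless here, which suggests the limitation lies in the attribute validation step rather than in the storage format.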
Example:
Let's assume we have a Dataset ds like this:
Let's also assume we want to encode the _CRS as wkt like so:
(Encoding the _CRS in either of the other two formats results in the same problem in the end.)
Setting the attributes of each data variable:
No problem so far; ds.Band1.attrs results in:
The problem now occurs when writing the dataset using: