-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
open_rasterio fails when NetCDF subdataset contains more than three dimensions #174
Comments
This is currently not supported, but not against rioxarray adding support for it. The reason it currently does not exist is that it adds a bit of complexity when you have more than 3 dimensions: https://gdal.org/drivers/raster/netcdf.html#dimension
Until support is added, I recommend opening your file with |
I think this would be simplified if rasterio added support for multidimensional arrays: https://gdal.org/drivers/raster/netcdf.html#multidimensional-api-support, however support for this hasn't been added yet and it doesn't appear to be on the roadmap. |
Thanks for your informative replies. I'll think about next steps - I'd forgotten the nuances of GDAL in only supporting a subset of netCDF functionality.
The background here is that I want to use rioxarray's clip() functionality on this and other netCDF files. But perhaps rioxarray is not the best way to achieve this. |
There is some good news and some bad news here.
|
Looks like your x and y coordinates are in some kind of projected coordinate system: >>> xds.X10_153
<xarray.DataArray 'X10_153' (X10_153: 144)>
array([-760., -750., -740., -730., -720., -710., -700., -690., -680., -670.,
-660., -650., -640., -630., -620., -610., -600., -590., -580., -570.,
-560., -550., -540., -530., -520., -510., -500., -490., -480., -470.,
-460., -450., -440., -430., -420., -410., -400., -390., -380., -370.,
-360., -350., -340., -330., -320., -310., -300., -290., -280., -270.,
-260., -250., -240., -230., -220., -210., -200., -190., -180., -170.,
-160., -150., -140., -130., -120., -110., -100., -90., -80., -70.,
-60., -50., -40., -30., -20., -10., 0., 10., 20., 30.,
40., 50., 60., 70., 80., 90., 100., 110., 120., 130.,
140., 150., 160., 170., 180., 190., 200., 210., 220., 230.,
240., 250., 260., 270., 280., 290., 300., 310., 320., 330.,
340., 350., 360., 370., 380., 390., 400., 410., 420., 430.,
440., 450., 460., 470., 480., 490., 500., 510., 520., 530.,
540., 550., 560., 570., 580., 590., 600., 610., 620., 630.,
640., 650., 660., 670.], dtype=float32)
Coordinates:
* X10_153 (X10_153) float32 -760.0 -750.0 -740.0 -730.0 ... 650.0 660.0 670.0
Attributes:
units: km
long_name: x
point_spacing: even
axis: X In this scenario, if you know what the CRS is of the projected data, you could use |
Excellent. After some more digging, I've also seen your Stack Exchange answer which shows the use of write_crs in more detail. I can see about adding a page to the rasterio documentation which makes it clear that existing xarray Datasets can have a CRS added post-hoc. |
The model outputs that I've linked to in this issue do indeed have some peculiar properties, amongst which are insufficient attributes for most software to understand their georeferencing; units in kms; and X and Y names set based on clipping of the model domain in post-processing. So it's probably unreasonable to expect such files to ever be read in automatically by rioxarray. |
Related: rasterio/rasterio#1759 |
I think this is related, but in this case these "4D arrays" are degenerate in z and t, they are really 2D grids with a time stamp. There's a different error for the remote file vs local: ## remote
import rioxarray
dsn = "NETCDF:/vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202202/oisst-avhrr-v02r01.20220218.nc:sst"
rioxarray.open_rasterio(dsn)
KeyError: 'NETCDF_DIM_time_VALUES'
## local
import rioxarray
dsn = "NETCDF:oisst-avhrr-v02r01.20220218.nc:sst"
rioxarray.open_rasterio(dsn)
ValueError: coordinate zlev has dimensions ('zlev',), but these are not a subset of the DataArray dimensions ('time', 'y', 'x') I'm certainly interested to have this case work in rioxarray and will explore it. Would it be acceptable to allow these? I've been down this path before (in commercial and open software) where the GDAL band metadata is used to re-store the dimensionality, but it's a mess and I don't think it's worth it given the xarray facilities and GDAL multidim that exist now. I'm not interested in the multidimensional possibilities, I just want that obvious case where sequential bands or sequential files are lined up as "time", or perhaps "z". (If rioxarray won't work with them I'll probably make a copy as COGs and use that). To show that GDAL sees these degenerate dims: dataset = "/vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202202/oisst-avhrr-v02r01.20220218.nc"
from osgeo import gdal
gdal.Open(dataset).GetMetadata("SUBDATASETS")
gdal.Open(dataset).GetMetadata("SUBDATASETS")["SUBDATASET_1_DESC"]
'[1x1x720x1440] sst (16-bit integer)' rasterio works with them fine from rasterio.windows import Window
import rasterio
dsn = "NETCDF:oisst-avhrr-v02r01.20220218.nc:sst"
ds = rasterio.open(dsn)
ds.read(window = Window(0, 0, 2, 5))
dsn = "NETCDF:/vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202202/oisst-avhrr-v02r01.20220218.nc:sst"
ds = rasterio.open(dsn)
ds.read(window = Window(0, 0, 2, 5)) |
sure enough, so at line 1234 of rioxarray/rioxarray/_io.py I see (1, 720, 1440)
dict_keys(['time', 'zlev', 'x', 'y'])
<xarray.IndexVariable 'zlev' (zlev: 1)>
array([0])
<xarray.IndexVariable 'time' (time: 1)>
array([16119])``` so, I think in this case it could unproblematically drop the zlev, but I can't see yet how to identify that it's the dim being left out. Will keep pursuing, I'm almost entirely unexperienced with python debugging but happy with progress so far 🙏 |
candidate fix here, at least for my issue: https://github.com/mdsumner/rioxarray/tree/fix-degen-dims it works for the file locally, but there's still that error for the remote dsn which I'll continue with. if reasonable I'll PR it - I think the linting is right, it will take me a while longer to get tests going. Thanks! note that since GDAL 3.7 we can add this config to get the crs assumed under reasonable situations: import rioxarray
import rasterio
dsn = "NETCDF:oisst-avhrr-v02r01.20220218.nc:sst"
with rasterio.Env(GDAL_NETCDF_ASSUME_LONGLAT="YES"):
x = rioxarray.open_rasterio(dsn)
x.spatial_ref
<xarray.DataArray 'spatial_ref' ()>
array(0)
Coordinates:
spatial_ref int64 0
Attributes:
crs_wkt: GEOGCS["WGS 84 (CRS84)",DATUM["WGS_1984",SP...
... @atedstone is it possible to get your file listed above still? I'd like to explore |
@mdsumner The file I linked to above is no longer available, but this one is broadly equivalent:
|
This specific use case should be simple to support as it removed the complexity mentioned here: #174 (comment) However, it would need to throw errors if the extra dimensions have a size greater than 1 until the reading capability is updated to support that. |
I also encountered this issue. Since I have a very small dataset with 1D coordinates (but 4 dimensions), this simple reproducible example might help. import rioxarray
import xarray as xr
file_nc = "cmems_so_2022-11-02.nc"
file_nc_reduced = file_nc.replace(".nc","_reduced.nc")
# create a dataset with the depth dimension reduced (only time/latitude/longitude remain)
ds = xr.open_dataset(file_nc)
ds_reduced = ds.max(dim="depth")
ds_reduced.to_netcdf(file_nc_reduced)
# this reduced dataset works
raster_red = rioxarray.open_rasterio(file_nc_reduced)
# the original dataset raises "ValueError: conflicting sizes for dimension 'time': length 50 on the data but length 1 on coordinate 'time'"
raster_full = rioxarray.open_rasterio(file_nc) What might at least be useful is to improve the ValueError, since to me it was not clear that there was one dimension too much. It looked more like a mismatch in dimension orders. After trying transposing and swap_dims, I got the idea to reduce the depth dimension (with max, which is arbitrary in this case). The
The
Example dataset (extension changed to enable uploading): |
Code Sample
Example file: ftp://ftp.climato.be/fettweis/MARv3.11/Greenland/ERA_1958-2019-10km/daily_10km/MARv3.11-20km-daily-ERA-10km-2016.nc
Problem description
open_rasterio
fails to load subdatasets (DataArrays?) from NetCDFs with more than 1 additional dimension to X, Y. Expressed alternatively, the maximum number of dimensions is 3 (e.g. TIME, y, x). This failure halts all further loading of the NetCDF file, in effect rendering the NetCDF unopenable byopen_rasterio
.Expected Output
With pure xarray:
Environment Information
Installation method
conda
Conda environment information (if you installed with conda):
Environment (
conda list
):Details about
conda
and system (conda info
):The text was updated successfully, but these errors were encountered: