Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set a spatial_ref value to 0 instead of EPSG code for better compatibility with rioxarray #164

Open
simonreise opened this issue Jun 20, 2024 · 3 comments

Comments

@simonreise
Copy link

Short description

I use odc-geo to reproject and match Landsat 8 QA mask raster to the resolution of the panchromatic band. All the other operations with data are performed using rioxarray.

The issue is that rioxarray sets spatial_ref value to 0 by default, but then odc-geo re-sets that value to the EPSG code of the projection, and, when I mask the original panchromatic band (that was loaded by rioxarray) with QA band (that was reprojected by odc-geo) using xarray.where, spatial_ref coordinate is being dropped.

Libraries import
import xarray
import rioxarray
import odc.geo.xr
Loading the panchromatic band with rioxarray
pan = r"path/to/file/LC09_L1TP_142020_20230825_20230825_02_T1/LC09_L1TP_142020_20230825_20230825_02_T1_B8.TIF"
with rioxarray.open_rasterio(pan, chunks=True, lock=True) as img:
    pan = img
Loading the QA raster with rioxarray
mask = r"path/to/file/LC09_L1TP_142020_20230825_20230825_02_T1/LC09_L1TP_142020_20230825_20230825_02_T1_QA_PIXEL.TIF"
with rioxarray.open_rasterio(mask, chunks=True, lock=True) as img:
    mask = img
Checking the coordinates of the panchromatic band
pan.coords

Coordinates:

  • band (band) int32 4B 1
  • x (x) float64 128kB 4.689e+05 4.689e+05 ... 7.08e+05 7.08e+05
  • y (y) float64 129kB 6.473e+06 6.473e+06 ... 6.232e+06 6.232e+06
    spatial_ref int32 4B 0
More detailed look into spatial_ref

The value is 0, and all the data is stored in the attrs.

pan.spatial_ref

array(0)
Coordinates:
spatial_ref () int32 0
Indexes: (0)
Attributes: (18)

Checking the coordinates of the QA band
mask.coords

Coordinates:

  • band (band) int32 4B 1
  • x (x) float64 64kB 4.689e+05 4.689e+05 ... 7.08e+05 7.08e+05
  • y (y) float64 64kB 6.473e+06 6.473e+06 ... 6.232e+06 6.232e+06
    spatial_ref int32 4B 0
More detailed look into spatial_ref

The value is also 0, and all the data is stored in the attrs.

mask.spatial_ref

array(0)
Coordinates:
spatial_ref () int32 0
Indexes: (0)
Attributes: (18)

Reprojecting the QA band

We reproject the QA band to match the geobox of the panchromatic band.

mask_rep = mask.odc.reproject(pan.odc.geobox)
Checking the coordinates of the reprojected QA band
mask_rep.coords

Coordinates:

  • band (band) int32 4B 1
  • y (y) float64 129kB 6.473e+06 6.473e+06 ... 6.232e+06 6.232e+06
  • x (x) float64 128kB 4.689e+05 4.689e+05 ... 7.08e+05 7.08e+05
    spatial_ref int32 4B 32646
More detailed look into spatial_ref

The value is 32646 - the EPSG code of the projection. More detailed data is still stored in the attrs.

mask_rep.spatial_ref

array(32646)
Coordinates:
spatial_ref () int32 32646
Indexes: (0)
Attributes: (18)

Masking the panchromatic band

Then we use our reprojected mask (mask_rep) to mask the panchromatic band. IRL the function is more complicated, but this minimum example still shows the problem.

pan_masked = pan.where(mask_rep == 0, 0)
Checking the coordinates

The spatial_ref is gone. It must be because the coordinates from two arrays had different values.

pan_masked.coords

Coordinates:

  • band (band) int32 4B 1
  • x (x) float64 128kB 4.689e+05 4.689e+05 ... 7.08e+05 7.08e+05
  • y (y) float64 129kB 6.473e+06 6.473e+06 ... 6.232e+06 6.232e+06

If we use two xarrays loaded with rioxarray, the problem does not appear. If we just simply reassign the coord to 0, (pan_w = pan.where(mask_rep.assign_coords(spatial_ref=0) == 0, 0)) the problem does not appear.

To sum up

Is it really necessary to have a EPSG code of a CRS as a default value for spatial_ref? Maybe just use 0 as a default value like rioxarray does, as the actual CRS data anyway is mostly stored in the attributes, not in the coordinate value itself? Or convince rioxarray devs to also use EPSG code as a default value? I think there should be a common convention for both libraries.

I can still manually assign the spatial_ref, but it looks like a really dirty fix.

Env

  • Windows 11 22631.3737
  • Conda==24.5.0
  • python==3.11.9
  • odc-geo==0.4.6
  • rioxarray==0.15.5
  • xarray==2024.6.0
  • rasterio== 1.3.9
@Kirill888
Copy link
Member

@simonreise When using xr.where with mask data computed from pixel values of the original image one should prefer to use "raw numpy arrays" and avoid all the coordinate mismatch nonsense, it's not only spatial_ref issue it can also be slight variations in coordinate locations (+/- 1e-14 difference kinda thing).

In your case just change:

- pan_masked = pan.where(mask_rep == 0, 0)
+ pan_masked = pan.where(mask_rep.data == 0, 0)

But sure, the value stored in the spatial_ref can be changed to always be 0 to match what rioxarray does, we don't really check for that, and in fact rioxarray loaded sources still support .odc.geobox. Not sure this would fix that issue though. xrarray.where() works best when you provide raw numpy/dask array of the same shape as the xarray data as a mask and not an xarray array.

When mask is an xarray array all the extra work of figuring out "valid overlap", common coords, common dims, common attributes needs to be done by xarray often causing not only slow downs but "unexpected behaviours" like the one you are reporting.

@Kirill888
Copy link
Member

@simonreise can you please check if forcing 0 into spatial_ref fixes the issue in your case or not:

return xarray.DataArray(
numpy.asarray(epsg, "int32"),
name=name,
dims=(),
attrs={"spatial_ref": crs_wkt, **cf},
)

replace epsg with 0 on line 217 referenced above

@simonreise
Copy link
Author

simonreise commented Jun 20, 2024

Both of your suggestions worked: adding .data to xarray.where or forcing 0 into spatial_ref. Thank you.
Upd. forcing 0 fixes the issue even if you do not add .data and adding .data works if you do not force 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants