Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with cropped xarray data causing xarray's .where() to behave differently #115

Closed
robbibt opened this issue Dec 18, 2023 · 5 comments
Closed
Labels
bug Something isn't working

Comments

@robbibt
Copy link
Contributor

robbibt commented Dec 18, 2023

While testing #114, I've been running into a really confusing issue that seems to cause xarray's .where to change the output size of my arrays.

Usually, a.where(b) simply masks out values from a that are False in b, replacing them with NaN. This normally doesn't have any effect on the shape of a (assuming the default drop=False) - you get back an array of the orginal shape after applying the mask.

For example, I might want to rasterize a polygon, then apply this to my original data as a mask. This works fine - my masked data comes out the same size as my original data:

import odc.geo.xr
from odc.geo.geobox import GeoBox
from odc.geo import geom

# Create xarray
geobox = GeoBox.from_bbox((-10, -2, 5, 4), "EPSG:4326", resolution=0.2)
xx = odc.geo.xr.xr_zeros(geobox)
print(f"Original shape is: {xx.odc.geobox.shape}")

# Create polygon overlapping xarray
poly = geom.Geometry(
    {"type": "Polygon", "coordinates": (((-5, 3), (0, 2), (-2.5, -1)),)}, "EPSG:4326"
)

# Rasterize polygon
rasterized = odc.geo.xr.rasterize(
    poly=poly.to_crs(xx.odc.geobox.crs), how=xx.odc.geobox
)
print(f"Rasterized shape is: {rasterized.odc.geobox.shape}")

# Apply mask
xx_masked = xx.where(rasterized)
print(f"Masked shape is: {xx_masked.odc.geobox.shape}")

# Verify that original data and cropped data have same geobox
assert xx.odc.geobox == xx_masked.odc.geobox

image

Problem

However, if I modify the shape of my array using .isel() or similar, .where() starts to behave differently. This example is the same, except I select a subset of my array using .isel() before rasterizing my polygon into its new geobox.

The rasterization works fine - my rasterized polygon and clipped array have the same geobox and shape. However, when I attempt to combine them with xx_cropped.where(rasterized), I get out an array with a different geobox, including different resolution and shape:

import odc.geo.xr
from odc.geo.geobox import GeoBox
from odc.geo import geom

# Create xarray
geobox = GeoBox.from_bbox((-10, -2, 5, 4), "EPSG:4326", resolution=0.2)
xx = odc.geo.xr.xr_zeros(geobox)
print(f"Original shape is: {xx.odc.geobox.shape}")

# Create polygon overlapping xarray
poly = geom.Geometry(
    {"type": "Polygon", "coordinates": (((-5, 3), (0, 2), (-2.5, -1)),)}, "EPSG:4326"
)

# Select subset
xx_cropped = xx.isel(latitude=slice(10, 20), longitude=slice(20, 50))
print(f"Cropped shape is: {xx_cropped.odc.geobox.shape}")

# Rasterize polygon
rasterized = odc.geo.xr.rasterize(
    poly=poly.to_crs(xx_cropped.odc.geobox.crs), how=xx_cropped.odc.geobox
)
print(f"Rasterized shape is: {rasterized.odc.geobox.shape}")

# Verify that cropped and rasterized data have the same geobox
assert rasterized.odc.geobox == xx_cropped.odc.geobox

# Apply mask
xx_masked = xx_cropped.where(rasterized)
print(f"Masked shape is: {xx_masked.odc.geobox.shape}")

# Verify that masked data and cropped data have same geobox
assert xx_cropped.odc.geobox == xx_masked.odc.geobox

image

Am I doing something silly here? Or is there some interaction going on between xarray and odc-geo that is causing .where to behave differently on data that has previously subsetted with .isel()?

The confusing thing is that xx_cropped and rasterized seem to have completely identical pixel grids, i.e. same geobox and even the same spatial dim coordinates...

@robbibt robbibt added the bug Something isn't working label Dec 18, 2023
@robbibt
Copy link
Contributor Author

robbibt commented Dec 18, 2023

Comparison of the arrays... somehow xx_cropped and rasterized combine to make xx_masked...
image

@Kirill888
Copy link
Member

Kirill888 commented Dec 18, 2023

that would be due to floating point precision most likely.

In theory we should have an invariant like this

xr_coords(gbox)['y'][3:7].data == xr_coord(gbox[3:7, :])['y'].data

but this might not be the case due to numeric precision.

@Kirill888
Copy link
Member

changing

- xx_masked = xx_cropped.where(rasterized)
+ xx_masked = xx_cropped.where(rasterized.data)

gives expected result as it ignores tiny differences in lon/lat:

rasterized.latitude.data - xx_cropped.latitude.data
>>> array([ 0.00000000e+00,  2.22044605e-16,  2.22044605e-16,  0.00000000e+00,
        0.00000000e+00, -2.22044605e-16,  0.00000000e+00,  0.00000000e+00,
       -2.22044605e-16,  0.00000000e+00])

@Kirill888
Copy link
Member

Kirill888 commented Dec 18, 2023

in general, when using xarray.where() use plain numpy array that matches data dimensions to be sure to bypass coordinate matching logic.

@robbibt
Copy link
Contributor Author

robbibt commented Jan 4, 2024

Thanks Kirill!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants