incorrect xarray size when downsampling? #94
print(f'expected size: {y // factor} x {x // 2}') That
Yes, extra padding like that can happen. If you need an exact pixel grid, you can specify that with
On an unrelated note: a chunk size of 256 pixels is way too small. I'd suggest you start with 2048, especially when working with high-res data, and only go down from there when you run into issues with RAM or not enough concurrency.
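To put the chunk-size remark in numbers, here is a rough sketch that counts how many chunks are needed to tile one band at one time step. It assumes a full 10 m Sentinel-2 tile of 10980 x 10980 pixels, which is larger than the bbox loaded in this thread; `chunk_count` is a hypothetical helper, not part of odc-stac or dask.

```python
import math


def chunk_count(ny, nx, chunk):
    """Number of square chunks needed to tile an ny x nx raster."""
    return math.ceil(ny / chunk) * math.ceil(nx / chunk)


# Assumption: a full 10 m Sentinel-2 tile, 10980 x 10980 pixels.
ny = nx = 10980
for chunk in (256, 2048):
    n = chunk_count(ny, nx, chunk)
    print(f"chunk={chunk}: {n} chunks per band and time step")
```

Fewer, larger chunks mean less scheduling overhead per task, at the cost of more memory per task, which is the trade-off described above.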
Writing up pixel grid construction rules is a long-standing documentation task that doesn't get done... Best I have is this here: https://odc-geo.readthedocs.io/en/latest/intro-geobox.html
Usually the translation part of the linear mapping is rounded to pixel size in projection units, most of the time such that
Sorry @Kirill888 for the faulty example and thanks for still taking the time to answer. Here's a correct one:
import odc.stac as odc_stac  # the odc-stac package is imported as odc.stac
import planetary_computer
import pystac_client

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
time_range = "2020-12-01/2020-12-31"
bbox = [-122.2751, 47.5469, -121.9613, 47.7458]
search = catalog.search(collections=["sentinel-2-l2a"], bbox=bbox, datetime=time_range)
items = search.get_all_items()

# Load one band at the default resolution to get the reference size
out = odc_stac.load(
    items,
    bands=['B02'],
    chunks={'x': 256, 'y': 256}
)
x = out.sizes['x']
y = out.sizes['y']

# Ground sample distance of the band, taken from the first item's asset metadata
res = items[0].to_dict()['assets']['B02']['gsd']

# Reload at coarser resolutions and compare against the naive expectation
for factor in [4, 5]:
    print('factor:', factor)
    out = odc_stac.load(
        items,
        bands=['B02'],
        resolution=res * factor,
        chunks={'x': 256, 'y': 256}
    )
    print(f'expected size: {y // factor} x {x // factor}')
    print(f'actual size: {out.sizes["y"]} x {out.sizes["x"]}')
    print('-' * 30)
I am not using the Planetary Computer in my own work but am observing the same thing. Sometimes both dimensions get a padding of an extra pixel, sometimes only one. Interestingly, I first stumbled across this when downsampling to 60 m (which fits into the MGRS tile, so the bounding box does not change), where the y-dimension got a padding. I cannot observe this here though.
Thanks for the hint about the chunk size. However, in the work that I am currently doing, a chunk extends over about 40 time steps, and with 256x256 I have observed the best performance w.r.t. compute time and memory consumption on a SLURM cluster.
@johntruckenbrodt ok, similar story here. S2 has 10m pixels aligned such that
One has to remember that we are not loading one single image but a bunch. There is no guarantee that all source pixel grids across different images perfectly overlap, so there is no "true native grid" in the general case. So we pick a grid that snaps pixels such that
But that's the thing: we are NOT choosing the extent based only on min/max of pixel coordinates, we ALSO perform pixel snapping so that all outputs for the same resolution have compatible grids regardless of where one starts. Choosing the load area is like this:
Example of 2 60m pixels becoming 3:
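A numeric version of that case, as a minimal sketch. It assumes the snapping rule is "grow the extent outward until both edges are integer multiples of the pixel size"; `snap_extent` is a hypothetical helper written for illustration and is not odc-geo's actual implementation, which may apply tolerances and other anchoring options.

```python
import math


def snap_extent(lo, hi, res):
    """Grow [lo, hi] outward so both edges are integer multiples of res."""
    lo_snapped = math.floor(lo / res) * res
    hi_snapped = math.ceil(hi / res) * res
    n_pixels = round((hi_snapped - lo_snapped) / res)
    return lo_snapped, hi_snapped, n_pixels


# Hypothetical 120 m extent whose edges are offset 40 m from the 60 m grid:
# it covers two pixel widths but straddles three grid cells.
print(snap_extent(40, 160, 60))  # (0, 180, 3)

# The same length already sitting on the grid stays at two pixels.
print(snap_extent(60, 180, 60))  # (60, 180, 2)
```

Because every snapped edge is a multiple of the pixel size, any two outputs at the same resolution share pixel boundaries wherever they overlap.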
This is just a reasonable default that works well when combining different data sources. If you want to precisely control the output grid, you have the option of specifying the exact pixel grid and image size using
xarray coordinates are for the center of the pixel; that's what the whole xarray ecosystem assumes. So you need to subtract half a pixel width from the min to get the left edge and add half a pixel to the max to get the right edge. With that in mind, 10m is exactly as in the original. For the 60m case we get:
res = 60
y1, y2 = 9590190.0, 9699990.0
y1-res/2, y2 + res/2
(9590160, 9700020)
So we get 9590160 instead of the original 9590200: we extend the top edge (inverted axis) by 40m (2/3 of a pixel), because 9590200 is not an integer multiple of the pixel size and we want it to be, to make sure that all 60m images we load result in perfectly overlapping grids.
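The same arithmetic as a small self-contained sketch, using only the coordinate values quoted above. `edges_from_centers` is a hypothetical helper name; the modulo checks show which edges sit on the 60 m grid.

```python
def edges_from_centers(c_min, c_max, res):
    """Outer edges of a pixel axis given its first and last center coordinates."""
    return c_min - res / 2, c_max + res / 2


res = 60
y1, y2 = 9590190.0, 9699990.0  # pixel-center coordinates quoted above

lo, hi = edges_from_centers(y1, y2, res)
n_pixels = round((hi - lo) / res)

print(lo, hi, n_pixels)   # 9590160.0 9700020.0 1831
print(lo % res, hi % res)  # 0.0 0.0 -> both edges are multiples of 60
print(9590200 % res)       # 40 -> the original edge is 40 m off the 60 m grid
```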
If you need to faithfully load an existing raster without harmonizing across possibly many different images, you can use rioxarray or rasterio, or just specify the exact pixel grid to odc-stac via GeoBox, defaults in
Example of two images that have aligned 10m pixels (
@johntruckenbrodt closing this as this is expected behavior, but see #95, in particular try
Thanks a billion @Kirill888. I have finally understood the point and setting I was wrong in my conception of how the Sentinel-2 MGRS grid is structured. I thought the overlap between tiles was always 9780 m. If that were the case, we wouldn't be having this conversation: every coordinate would be a multiple of 10, 20 and 60. However, the Sentinel-2 grid needs to make a compromise between the actual MGRS grid and the pixel sizes of the individual bands (which are 10, 20 and 60 m).
Hi there. Thanks a lot for this awesome tool!
I am working with some Sentinel-2 data which I am downsampling for testing purposes.
I wonder whether the following behaviour is expected...
Let's get some STAC items (with a reproducible example from here):
...get the size of one band at full resolution:
..and then load the same band with different resolutions:
This gives me: