Port xr_reproject from odc.algo #26

Closed
Kirill888 opened this issue Mar 29, 2022 · 2 comments · Fixed by #88

Comments

@Kirill888
Member

Most of the low-level utilities needed for Dask-backed reprojection are already in odc-geo, mainly `GeoboxTiles` and also `ReprojectInfo`.
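As a rough illustration of what a `GeoboxTiles`-style helper does, here is a pure-Python sketch that splits a destination pixel grid of shape `(ny, nx)` into fixed-size tiles and maps each tile index to the pixel slices it covers. The function names are hypothetical and this is not the odc-geo API; the real class works with `GeoBox` objects rather than bare shapes.

```python
from typing import Iterator, Tuple

Shape2 = Tuple[int, int]


def tile_grid_shape(shape: Shape2, tile: Shape2) -> Shape2:
    """Number of tiles along (y, x); edge tiles may be smaller (ceiling division)."""
    (ny, nx), (ty, tx) = shape, tile
    return (-(-ny // ty), -(-nx // tx))


def tile_slices(shape: Shape2, tile: Shape2, idx: Shape2) -> Tuple[slice, slice]:
    """Pixel slices (y, x) covered by tile ``idx``, clipped to the image edge."""
    (ny, nx), (ty, tx), (iy, ix) = shape, tile, idx
    gy, gx = tile_grid_shape(shape, tile)
    if not (0 <= iy < gy and 0 <= ix < gx):
        raise IndexError(f"tile {idx} outside grid {(gy, gx)}")
    return (slice(iy * ty, min((iy + 1) * ty, ny)),
            slice(ix * tx, min((ix + 1) * tx, nx)))


def iter_tiles(shape: Shape2, tile: Shape2) -> Iterator[Tuple[Shape2, Tuple[slice, slice]]]:
    """Yield (tile index, pixel slices) for every tile, row-major order."""
    gy, gx = tile_grid_shape(shape, tile)
    for iy in range(gy):
        for ix in range(gx):
            yield (iy, ix), tile_slices(shape, tile, (iy, ix))
```

For a 5000×3000 destination with 2048×2048 chunks this gives a 3×2 tile grid, where the bottom-right tile covers rows 4096:5000 and columns 2048:3000.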

Expected interface:

```python
import datacube
import odc.geo.xr  # registers the .odc accessor on xarray objects
from odc.geo.geobox import GeoBox

dc = datacube.Datacube()
xx = dc.load(.., dask_chunks={})  # or any other supported load backend

# automatically choose resolution and bounding box, align pixel edges to 0
# automatically choose chunk size
yy = xx.odc.to_crs("epsg:3857")

# fully defined destination pixel plane
# configurable destination chunking
yy = xx.odc.reproject(GeoBox.from_bbox(..), chunks={'x': 2048, 'y': 4096})
```

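One way to read "align pixel edges to 0" is that every output pixel edge should sit on an integer multiple of the resolution, so the automatically chosen bounding box is snapped outward to that lattice. A minimal sketch, assuming square pixels and a bounding box already expressed in the destination CRS; `snap_bbox` and its `tol` parameter are hypothetical, not part of odc-geo.

```python
import math
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (left, bottom, right, top)


def snap_bbox(bbox: BBox, res: float, tol: float = 1e-6) -> Tuple[BBox, Tuple[int, int]]:
    """Expand ``bbox`` outward so every edge is an integer multiple of ``res``.

    Returns the snapped bbox and the resulting pixel shape (ny, nx).
    ``tol`` (in pixels) stops tiny floating-point overshoots from adding
    a whole extra row or column of pixels.
    """
    left, bottom, right, top = bbox
    x0 = math.floor(left / res + tol) * res
    y0 = math.floor(bottom / res + tol) * res
    x1 = math.ceil(right / res - tol) * res
    y1 = math.ceil(top / res - tol) * res
    nx = round((x1 - x0) / res)
    ny = round((y1 - y0) / res)
    return (x0, y0, x1, y1), (ny, nx)
```

For example, `snap_bbox((101.3, 47.0, 389.9, 212.2), 30)` expands the box to `(90, 30, 390, 240)`, i.e. a 7×10 pixel grid whose edges all fall on multiples of 30.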
  • Support Dask and non-Dask inputs
  • Support xarray and dask.array inputs
  • Support xarray Datasets as well as DataArray
  • Support reasonable automatic determination of resolution, bounding box and chunking
  • Support any number of leading dimensions as well as an optional interleaved band dimension: ..., y, x[, band]
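The dimension-order requirement in the last bullet can be handled generically by folding all leading dimensions (and a trailing band axis, if present) into one batch axis, processing each 2-D `(y, x)` plane, and unfolding again. A NumPy sketch, where `apply_per_plane` and its `band_last` flag are hypothetical names and `fn` stands in for a per-plane reprojection:

```python
from typing import Callable

import numpy as np


def apply_per_plane(arr: np.ndarray, fn: Callable[[np.ndarray], np.ndarray],
                    band_last: bool = False) -> np.ndarray:
    """Apply a 2-D function ``fn`` to every (y, x) plane of ``arr``.

    Layout is ``..., y, x`` or, with ``band_last=True``, ``..., y, x, band``.
    ``fn`` must map every (ny, nx) plane to an output of the same (my, mx) shape.
    """
    if band_last:
        # Move band in front of y so the last two axes are always (y, x).
        arr = np.moveaxis(arr, -1, -3)
    lead = arr.shape[:-2]
    planes = arr.reshape((-1,) + arr.shape[-2:])  # (N, ny, nx)
    out = np.stack([fn(p) for p in planes])       # (N, my, mx)
    out = out.reshape(lead + out.shape[-2:])      # restore leading dims
    if band_last:
        # Put the band axis back at the end: ..., y, x, band
        out = np.moveaxis(out, -3, -1)
    return out
```

Because only the last two axes are ever touched by `fn`, the same code path serves `y, x`, `time, y, x` and `time, y, x, band` inputs.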
@robbibt
Contributor

robbibt commented Sep 6, 2022

Hey @Kirill888, a question: with a Dask-enabled .reproject method, will it be possible to apply reprojection to a dataset with multiple timesteps and take advantage of Dask to process all timesteps in parallel? At the moment, .reproject is applied to each timestep in series, which makes it pretty slow for large datasets.

(That said, some option for non-Dask parallelism would also be useful: my current application uses data that originates in memory, so Dask-only parallelism will be a little tricky there.)
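For the in-memory, non-Dask case, one common pattern is to hand each timestep to a thread pool; this only helps if the per-slice work releases the GIL, which is often the case for C-level warping code. A minimal sketch using the standard library, where `map_timesteps` and `reproject_slice` are hypothetical names and `reproject_slice` is a placeholder for the real per-timestep reprojection:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_timesteps(slices: Sequence[T], fn: Callable[[T], R],
                  max_workers: Optional[int] = None) -> List[R]:
    """Apply ``fn`` to every timestep concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in submission order, not completion order
        return list(pool.map(fn, slices))


def reproject_slice(frame):
    # Hypothetical placeholder: a real version would warp one 2-D timestep
    # onto the destination grid. Here it just doubles every value.
    return [[v * 2 for v in row] for row in frame]
```

Since results come back in input order, they can be stacked along the time axis directly.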

@Kirill888
Member Author

Kirill888 commented Sep 6, 2022

The Dask version will certainly have concurrency across timestamps as well as spatial chunks. In odc-stac there is a fair amount of data-loading logic that partitions the output space into spatial chunks and then loads data concurrently, with or without Dask. I feel this shares a lot with Dask-enabled reproject and should probably move into odc-geo in a somewhat generalized form; for example, it should support temporal chunking (opendatacube/odc-stac#81).
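The partitioning described above, extended with temporal chunking, amounts to enumerating one work unit per (time block, spatial tile) and running those units serially, through a thread pool, or as Dask tasks. A pure-Python sketch of the enumeration plus a pluggable runner; `plan_work`, `run` and the `executor` argument are hypothetical names, not the odc-stac or odc-geo API:

```python
from concurrent.futures import Executor
from itertools import product
from typing import Callable, Iterator, List, Optional, Tuple

Block = Tuple[slice, slice, slice]  # (time, y, x)


def _chunks(n: int, size: int) -> List[slice]:
    """Split range(n) into consecutive slices of at most ``size`` elements."""
    return [slice(i, min(i + size, n)) for i in range(0, n, size)]


def plan_work(shape: Tuple[int, int, int], chunks: Tuple[int, int, int]) -> Iterator[Block]:
    """Yield (t, y, x) slices that cover a (nt, ny, nx) output exactly once."""
    nt, ny, nx = shape
    ct, cy, cx = chunks
    yield from product(_chunks(nt, ct), _chunks(ny, cy), _chunks(nx, cx))


def run(blocks: List[Block], load_block: Callable[[Block], object],
        executor: Optional[Executor] = None) -> List[object]:
    """Run ``load_block`` over all blocks, serially or via a concurrent executor."""
    if executor is None:
        return [load_block(b) for b in blocks]
    return list(executor.map(load_block, blocks))
```

Passing a `ThreadPoolExecutor` gives non-Dask concurrency over the same plan; a Dask backend would instead wrap each block in a task, so the planning step can be shared between both paths.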
