GeoZarr to Support 2D RGB and Multi-Dimensional EO Data #51

Open
christophenoel opened this issue Oct 9, 2024 · 2 comments

Comments

@christophenoel

Context

Consider an Earth Observation (EO) scene raster made available as a COG or via a Map Tiling Service:

  • COG Encoding: COGs typically encode 2D raster data with a single band or RGB, using image layers for each band (note: there is no standard way to interpret multi-band GeoTIFFs beyond RGB). The COG URL provides access, and the scene footprint can be used to subset the scene from a larger COG coverage.
  • Map Tiling Services: Similarly, Map Tiling Services allow straightforward scene extraction from a URL by using layer endpoints and bbox subsetting.
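As an illustration of the footprint-based subsetting mentioned above, here is a minimal sketch that converts a bounding box into a pixel window of a larger raster. It assumes a north-up raster (no rotation terms in the geotransform) and a bbox already expressed in the raster's CRS; `GeoTransform` and `bbox_to_window` are hypothetical names, not part of any COG or GDAL API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTransform:
    """Hypothetical north-up transform: x = origin_x + col * pixel_width,
    y = origin_y - row * pixel_height (origin is the upper-left corner)."""
    origin_x: float
    origin_y: float
    pixel_width: float
    pixel_height: float  # positive value; row index grows as y decreases


def bbox_to_window(gt, bbox, width, height):
    """Return (col_off, row_off, n_cols, n_rows) covering bbox = (minx, miny, maxx, maxy),
    clipped to a raster of width x height pixels. The bbox must be in the raster's CRS."""
    minx, miny, maxx, maxy = bbox
    # Expand outward to whole pixels so the window fully covers the footprint
    col_start = math.floor((minx - gt.origin_x) / gt.pixel_width)
    col_stop = math.ceil((maxx - gt.origin_x) / gt.pixel_width)
    row_start = math.floor((gt.origin_y - maxy) / gt.pixel_height)
    row_stop = math.ceil((gt.origin_y - miny) / gt.pixel_height)
    # Clip to the raster extent
    col_start, row_start = max(col_start, 0), max(row_start, 0)
    col_stop, row_stop = min(col_stop, width), min(row_stop, height)
    if col_start >= col_stop or row_start >= row_stop:
        raise ValueError("footprint does not intersect the raster")
    return col_start, row_start, col_stop - col_start, row_stop - row_start
```

For a 10 m raster whose upper-left corner is at (500000, 4000000), the bbox (500100, 3999000, 500300, 3999800) maps to the window (10, 20, 20, 80): 20 columns by 80 rows starting at column 10, row 20.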

Concretely, OpenLayers can easily display an RGB thumbnail/visualisation on a map.

Considerations for GeoZarr

  1. RGB Display in GeoZarr:
    • GeoZarr should enable encoding 2D rasters in RGB format in a standardized way. The idea is that coordinates (latitude, longitude, and bands such as R, G, B) are explicitly named to facilitate their interpretation, so that a client would not need to parse all metadata in depth to understand these dimensions. It could also be useful to state explicitly, in a main attribute, that the data is of type 2D RGB, so that users can easily identify it.
  2. Multi-Dimensional Data Support (3D/4D+):
    • GeoZarr already allows encoding 3D and 4D+ rasters, with dimensions such as time, wavelength, and altitude. There should be a syntax convention (usable within GeoZarr or in any metadata format such as STAC) to express a GeoZarr subset without requiring clients to parse all metadata (e.g. [time=1],[altitude=2]).
    • For 3D time series, there should be a convention for the time dimension, and probably a type or requirement class advertised for such GeoZarr datasets.
    • For other 3D+ data, there should be a convention that lets a client application express a visualisation/preview by identifying the subsets used for R, G, and B.
  3. Variable Identification for Multi-Layer Data:
    • As in formats like NetCDF, GeoZarr may contain multiple data variables at the same level. A standard method for identifying these variables within GeoZarr is needed, especially in STAC Items, to specify how to retrieve and display the data effectively.
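To make the subset syntax idea concrete, the sketch below parses selectors written like the `[time=1],[altitude=2]` example into positional indexers, and reuses the same selectors to pick the R, G and B layers for a preview. The grammar (integer positions only, comma-separated terms), the dimension names and the function names are assumptions for illustration, not an adopted GeoZarr convention.

```python
import re

# Hypothetical selector grammar: "[dim=index],[dim=index]" with integer positions
_TERM = re.compile(r"\[\s*([A-Za-z_][\w-]*)\s*=\s*(-?\d+)\s*\]")


def parse_selector(text):
    """Parse '[time=1],[altitude=2]' into {'time': 1, 'altitude': 2}."""
    selection = {}
    for term in (t.strip() for t in text.split(",")):
        if not term:
            continue
        match = _TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"malformed selector term: {term!r}")
        name, index = match.group(1), int(match.group(2))
        if name in selection:
            raise ValueError(f"dimension {name!r} selected twice")
        selection[name] = index
    return selection


def to_indexer(dims, selection):
    """Turn a selection into a positional indexer for an array whose dimensions
    are named `dims`; unselected dimensions are kept whole."""
    unknown = set(selection) - set(dims)
    if unknown:
        raise KeyError(f"unknown dimensions: {sorted(unknown)}")
    return tuple(selection.get(d, slice(None)) for d in dims)


def rgb_indexers(dims, spatial_dims, rgb):
    """Map each channel ('R', 'G', 'B') to an indexer. Each selector must leave
    exactly the spatial dimensions unselected, so every channel is a 2D layer."""
    indexers = {}
    for channel, text in rgb.items():
        selection = parse_selector(text)
        remaining = tuple(d for d in dims if d not in selection)
        if remaining != tuple(spatial_dims):
            raise ValueError(f"{channel}: selector leaves {remaining}, expected {tuple(spatial_dims)}")
        indexers[channel] = to_indexer(dims, selection)
    return indexers
```

The returned tuples contain only integers and `slice(None)`, so they can index a NumPy array, or a Zarr array through basic indexing, whose dimension order matches `dims`.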

These are only very basic initial thoughts, but I think the OGC GeoDataCube SWG may be working on similar challenges.

@rbavery

rbavery commented Oct 21, 2024

Glad to see an equivalence between GeoZarr and STAC Catalogs being proposed. It's common for STAC Collections to contain STAC Items with different CRSs. Do those working on the spec consider this within scope? So far I have only found examples online that resample every raster to a common CRS before saving to Zarr, like https://earthmover.io/blog/serverless-datacube-pipeline/.

This section of the spec makes me think it might be within scope:

If multiple Array Variables share heterogeneous dimensions or coordinates, a primary homogeneous set of variables MUST be located at root level, and the other sets declared in children datasets.
https://github.com/zarr-developers/geozarr-spec/blob/main/geozarr-spec.md#geozarr-dataset
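One possible reading of the quoted rule, sketched under assumptions: variables are grouped by their dimension names plus a label for the coordinate grid they use, the largest group becomes the primary set at the root, and the remaining groups go into child groups. The quoted text does not say how the primary set is chosen or how children are named; `plan_hierarchy`, the grid labels and the `/set_N` paths are hypothetical.

```python
from collections import defaultdict


def plan_hierarchy(variables):
    """variables: mapping of variable name -> (dims tuple, grid label).
    Returns a mapping of group path -> sorted variable names."""
    # Variables sharing dims and grid form one homogeneous set
    sets = defaultdict(list)
    for name, (dims, grid) in variables.items():
        sets[(tuple(dims), grid)].append(name)
    # Assumption: the largest set is the primary one; ties are broken by key order
    ordered = sorted(sets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    layout = {}
    for i, (_, names) in enumerate(ordered):
        path = "/" if i == 0 else f"/set_{i}"  # hypothetical child naming
        layout[path] = sorted(names)
    return layout
```

With four bands on a 10 m grid and two on a 20 m grid, the 10 m bands land at the root and the 20 m bands in `/set_1`.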

I would love to be able to turn STAC Collections or Catalogs into GeoZarrs with two geospatial index levels: one common CRS for all rasters, indexing each raster by its extent in that common CRS, and another index level specific to each group of rasters that share a CRS.

My use case is running inference on rasters and georeferencing the results. I might want to load a GeoZarr, filter the rasters that intersect an area of interest using the top-level CRS, run model inference on individual rasters, and then use each raster's own CRS to georeference the model results. I'd like to avoid reprojecting all raster pixels to a global projection during this process, since it is an expensive operation and can compromise equal-area properties.
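The two-level index and the inference workflow could look roughly like this. The sketch assumes each raster's extent has already been computed in the shared index CRS (for example a WGS 84 bbox, like the one STAC Items carry) and that no extent crosses the antimeridian; `RasterEntry` and `select_for_inference` are hypothetical names. Only extents are compared in the common CRS; pixels stay in each raster's native CRS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RasterEntry:
    name: str
    native_crs: str        # e.g. "EPSG:32631"; used later to georeference results
    extent_common: tuple   # (minx, miny, maxx, maxy) in the shared index CRS


def intersects(a, b):
    """Strict bbox overlap test: boxes that only touch at an edge do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_for_inference(entries, aoi_common):
    """Level 1: keep rasters whose common-CRS extent overlaps the AOI.
    Level 2: group the survivors by native CRS, so each group can be processed
    and georeferenced in its own CRS without reprojecting any pixels."""
    groups = {}
    for entry in entries:
        if intersects(entry.extent_common, aoi_common):
            groups.setdefault(entry.native_crs, []).append(entry.name)
    return groups
```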

I hope there is a way to handle the above with GeoZarr while also getting the benefits of Zarr v3 sharding for performant data loading and cloud storage.

@christophenoel
Author

It's common for STAC Collections to contain STAC Items with different CRSs.
I expect the same with GeoZarr when simply converting a "product" (scene) to that format. Resampling would be applied to build aggregations (datacubes) or analysis-ready data, or to generate Level 3+ data.

Your idea seems very interesting but quite challenging. As a first step, I would simply aim to expose and access some typical product types in STAC, taking into account bands and extra dimensions.
