How to return dynamic sub-catalogs #4

Open
philvarner opened this issue Feb 11, 2023 · 2 comments
@philvarner

Moved from:

@philvarner

Some text from stac-api-spec:

If sub-catalogs are used, it is recommended that these use the endpoint /catalogs/{catalogId} to avoid conflicting
with other endpoints from the root.

| Endpoint              | Media Type       | Returns | Description          |
| --------------------- | ---------------- | ------- | -------------------- |
| /catalogs/{catalogId} | application/json | Catalog | child Catalog object |

Structuring Catalog Hierarchies

A STAC API is more useful when it presents a complete Catalog representation of all the data contained in the
API, such that all Item objects can be reached by traversing child and item link relations from
the root. Being able to reach all Items in this way is formalized in the
Browseable conformance class, but any Catalog can be structured for hierarchical traversal.
Implementers who have search as their primary use case should consider also implementing this
alternate view over the data by presenting it as a directed graph of catalogs, where the child link relations typically
form a tree, and where each catalog can be retrieved with a single request (e.g., each Catalog JSON is small enough that
it does not require pagination).

For example, child links to sub-catalogs may be structured as in this diagram:

graph LR
    A[Root] -->|child| B(sentinel-2-l2a)
    B --> |child| C(10SDG)
    B --> |child| D(10SDH)
    B --> |child| E(10SDJ)
    B --> |child| BB(...)

    C --> |child| F(2018)
    C --> |child| G(2019)
    C --> |child| CC(...)

    D --> |child| H(2018)
    D --> |child| DD(...)
    E --> |child| I(2018)
    E --> |child| EE(...)

    F --> |item| J(12.31.0)
    F --> |item| K(01.09.0)
    F --> |item| L(01.09.1)
    F --> |item| FF(...)
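
As a rough illustration of what "reachable by traversing child and item link relations from the root" means, here is a minimal sketch of a breadth-first walk over such a tree. The in-memory DOCS mapping, the hrefs under /catalogs/sentinel-2-l2a/..., and the item IDs item-a and item-b are all hypothetical stand-ins for real HTTP responses, not anything defined by the spec.

```python
from collections import deque

# Hypothetical in-memory stand-in for an API: href -> JSON document.
# Hrefs and item IDs are illustrative only.
DOCS = {
    "/": {"links": [{"rel": "child", "href": "/catalogs/sentinel-2-l2a"}]},
    "/catalogs/sentinel-2-l2a": {
        "links": [{"rel": "child", "href": "/catalogs/sentinel-2-l2a/10SDG"}]
    },
    "/catalogs/sentinel-2-l2a/10SDG": {
        "links": [{"rel": "child", "href": "/catalogs/sentinel-2-l2a/10SDG/2018"}]
    },
    "/catalogs/sentinel-2-l2a/10SDG/2018": {
        "links": [
            {"rel": "item", "href": "/collections/sentinel-2-l2a/items/item-a"},
            {"rel": "item", "href": "/collections/sentinel-2-l2a/items/item-b"},
        ]
    },
}


def reachable_items(fetch, root_href="/"):
    """Breadth-first walk over child links, collecting item link hrefs."""
    seen, items = set(), []
    queue = deque([root_href])
    while queue:
        href = queue.popleft()
        if href in seen:  # the graph may offer several paths to one catalog
            continue
        seen.add(href)
        for link in fetch(href).get("links", []):
            if link["rel"] == "child":
                queue.append(link["href"])
            elif link["rel"] == "item" and link["href"] not in items:
                items.append(link["href"])
    return items


items = reachable_items(DOCS.__getitem__)
```

Passing the fetch function in as a parameter keeps the walk independent of how documents are retrieved, so the same loop could be pointed at a real HTTP client.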

STAC API does not define which endpoint or endpoints should return these catalogs, but one approach would be
to return them from an endpoint like /catalogs/{catalogId}.
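
One possible shape for the body returned by such an endpoint is sketched below. This is only an assumption about how a server might assemble a sub-catalog on the fly: the build_catalog helper, the https://example.com base URL, and the generated description are hypothetical. The top-level fields (type, stac_version, id, description, links) are the ones a STAC Catalog requires.

```python
def build_catalog(catalog_id, child_ids, base_url="https://example.com"):
    """Assemble a Catalog document for /catalogs/{catalogId} (hypothetical helper)."""
    links = [
        {"rel": "self", "href": f"{base_url}/catalogs/{catalog_id}",
         "type": "application/json"},
        {"rel": "root", "href": f"{base_url}/", "type": "application/json"},
    ]
    # One child link per sub-catalog, each pointing back at the same endpoint pattern.
    links += [
        {"rel": "child", "href": f"{base_url}/catalogs/{child_id}",
         "type": "application/json"}
        for child_id in child_ids
    ]
    return {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": catalog_id,
        "description": f"Sub-catalog {catalog_id}",  # placeholder text
        "links": links,
    }


catalog = build_catalog("sentinel", ["sentinel_2", "sentinel_5p"])
child_hrefs = [l["href"] for l in catalog["links"] if l["rel"] == "child"]
```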

While OAFeat requires that all Items must be part of a Collection, this does not mean that the Collection needs to be
part of the browseable tree. If they are part of the tree, it is recommended that there only be one Collection in a
path through the tree, and that a collection never contain child collections.

These are the two standard ways of structuring a browseable tree of catalogs, the only difference being
whether the Collection is used as part of the tree or not:

  • Catalog (root) -> Catalog* -> Item (recommended)
  • Catalog (root) -> Collection -> Catalog* -> Item

All Items must be part of a Collection, but the Collection itself does not need to be part of the browseable graph.

How you structure your graph of Catalogs can allow you to both group Collections together and create sub-groups
of items within a Collection.
For example, your collections may be grouped so that each represents a data product. This might mean
you have a collection for each of Landsat 8 Collection 1, Landsat 8 Surface Reflectance, Sentinel-2 L1C, Sentinel-2
L2A, Sentinel-5P UV Aerosol Index, Sentinel-5P Cloud, MODIS MCD43A4, MODIS MOD11A1, and MODIS MYD11A1. You can also
present each of these as a catalog, and create parent catalogs for them that allow you to group together all Landsat, Sentinel, and MODIS catalogs.

  • / root catalog
    • child -> /catalogs/landsat
      • child -> /catalogs/landsat_7
      • child -> /catalogs/landsat_8
        • child -> /catalogs/landsat_8_c1
        • child -> /catalogs/landsat_8_sr
    • child -> /catalogs/sentinel
      • child -> /catalogs/sentinel_2
        • child -> /catalogs/sentinel_2_l1c
        • child -> /catalogs/sentinel_2_l2a
      • child -> /catalogs/sentinel_5p
        • child -> /catalogs/sentinel_5p_uvai
        • child -> /catalogs/sentinel_5p_cloud
    • child -> /catalogs/modis
      • child -> /catalogs/modis_mcd43a4
      • child -> /catalogs/modis_mod11a1
      • child -> /catalogs/modis_myd11a1

Each of these catalog endpoints could in turn be its own STAC API root, allowing an interface where users can
search over arbitrary groups of collections without needing to explicitly know and name every collection in the
search collections query parameter. These catalogs-of-catalogs can be organized in multiple ways, e.g.,
per provider (e.g., Sentinel-2), per domain (e.g., cloud data), or per form of data (electro-optical, LIDAR, SAR).
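
To make the grouping idea concrete, the sketch below flattens a parent catalog into the leaf IDs beneath it and joins them into a comma-separated collections value for a search. It assumes, purely for illustration, that each leaf catalog ID matches a collection ID; the CHILDREN mapping and the leaf_ids helper are hypothetical.

```python
# Hypothetical parent -> children mapping mirroring the listing above.
CHILDREN = {
    "landsat": ["landsat_7", "landsat_8"],
    "landsat_8": ["landsat_8_c1", "landsat_8_sr"],
    "sentinel": ["sentinel_2", "sentinel_5p"],
    "sentinel_2": ["sentinel_2_l1c", "sentinel_2_l2a"],
    "sentinel_5p": ["sentinel_5p_uvai", "sentinel_5p_cloud"],
    "modis": ["modis_mcd43a4", "modis_mod11a1", "modis_myd11a1"],
}


def leaf_ids(catalog_id, children=CHILDREN):
    """Return the IDs of all leaf catalogs below catalog_id, depth first."""
    kids = children.get(catalog_id)
    if not kids:  # no children recorded: treat this catalog as a leaf
        return [catalog_id]
    leaves = []
    for kid in kids:
        leaves.extend(leaf_ids(kid, children))
    return leaves


# Assumption: leaf catalog IDs double as collection IDs for search.
collections_param = ",".join(leaf_ids("sentinel"))
```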

Going the other direction, collections can be sub-grouped into smaller catalogs. The following example
groups a catalog of Landsat 8 Collection 1 items by path, row, and date (the path/row system is used by this
product for gridding).

  • / (root)
    • /catalogs/landsat_8_c1
      • /catalogs/landsat_8_c1/139
        • /catalogs/landsat_8_c1/139_045
          • /catalogs/landsat_8_c1/139_045_20170304
            • /collections/landsat_8_c1/items/LC08_L1TP_139045_20170304_20170316_01_T1
          • /catalogs/landsat_8_c1/139_045_20170305
            • /collections/landsat_8_c1/items/LC08_L1TP_139045_20170305_20170317_01_T1
        • /catalogs/landsat_8_c1/139_046
          • /catalogs/landsat_8_c1/139_046_20170304
            • /collections/landsat_8_c1/items/LC08_L1TP_139046_20170304_20170316_01_T1
          • /catalogs/landsat_8_c1/139_046_20170305
            • /collections/landsat_8_c1/items/LC08_L1TP_139046_20170305_20170317_01_T1

If done in a consistent manner, these can also provide "templated" URIs, such that a user could directly request a
specific path, row, and date simply by replacing the values in /catalogs/landsat_8_c1/{path}_{row}_{date}.
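
A client-side helper for filling in that template might look like the sketch below. The function name catalog_href is hypothetical; the zero-padded three-digit path and row and the YYYYMMDD date follow the example hrefs in the listing above.

```python
from datetime import date


def catalog_href(path: int, row: int, day: date) -> str:
    """Fill in /catalogs/landsat_8_c1/{path}_{row}_{date} (hypothetical helper)."""
    # Path and row are zero-padded to three digits, as in 139_045.
    return f"/catalogs/landsat_8_c1/{path:03d}_{row:03d}_{day:%Y%m%d}"


href = catalog_href(139, 45, date(2017, 3, 4))
```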

Similarly, a MODIS product using sinusoidal gridding could use paths of the form
/{horizontal_grid}/{vertical_grid}/{date}. Since only around 300 scenes are produced every day for a MODIS product
and there is a 20-year history of production, these could fit in a graph with path length 3 from the root
Catalog to each leaf Item.

  • / (root)
    • /catalogs/mcd43a4 (~7,000 child relation links, one to each date)
      • /catalogs/mcd43a4/{date} (~300 item relation links to each Item)
        • /collections/mcd43a4/items/{itemId}
        • ...
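
The link counts in this layout follow from simple arithmetic, sketched here with the round figures quoted above (20 years, about 300 scenes per day). Leap days are ignored, so the results are approximate.

```python
YEARS_OF_PRODUCTION = 20
SCENES_PER_DAY = 300  # approximate figure from the text

# One date catalog per day of production (leap days ignored).
date_catalogs = YEARS_OF_PRODUCTION * 365

# Each date catalog links to roughly one day's worth of Items.
total_items = date_catalogs * SCENES_PER_DAY

print(date_catalogs)  # 7300, in line with the ~7,000 child links above
print(total_items)    # 2190000
```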

Catalogs can also group related products. For example, here we group together synthetic aperture radar (SAR) products
(Sentinel-1 and AfriSAR) and electro-optical (EO) bottom of atmosphere (BOA) products.

  • / root catalog
    • child -> /catalogs/sar
      • child -> /catalogs/sentinel_1_l2a
      • child -> /catalogs/afrisar
    • child -> /catalogs/eo_boa
      • child -> /catalogs/landsat_8_sr
      • child -> /catalogs/sentinel_2_l2a

The catalogs structure is a directed graph that allows
you to provide numerous different Catalog and Collection graphs to reach leaf Items. For example, for a Landsat 8 data
product, you may want to allow browsing both by date then path then row, or by path then row then date:

  1. Catalog -> Catalog (product) -> Catalog (date) -> Catalog (path) -> Catalog (row)
  2. Catalog -> Catalog (product) -> Catalog (path) -> Catalog (row) -> Catalog (date)

When more than one path to an Item is allowed, it is recommended that the final item link relation reference a
consistent, canonical URL for each item, instead of a URL that is specific to the path of Catalogs that was followed
to reach it.
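
A minimal sketch of that recommendation: both browse orderings end in leaf catalogs whose item links are built only from the collection ID and item ID, never from the catalog path that was followed. The helper names and the date-first catalog ID are hypothetical.

```python
def item_link(collection_id, item_id):
    """Canonical item link: depends only on the collection and item IDs."""
    return {
        "rel": "item",
        "href": f"/collections/{collection_id}/items/{item_id}",
        "type": "application/geo+json",
    }


def leaf_catalog(catalog_id, collection_id, item_ids):
    """Leaf catalog whose links ignore the browse path encoded in catalog_id."""
    return {
        "id": catalog_id,
        "links": [item_link(collection_id, item_id) for item_id in item_ids],
    }


ITEM_ID = "LC08_L1TP_139045_20170304_20170316_01_T1"

# Two different browse paths to the same Item (catalog IDs are illustrative).
by_path_row_date = leaf_catalog("landsat_8_c1/139_045_20170304", "landsat_8_c1", [ITEM_ID])
by_date_path_row = leaf_catalog("landsat_8_c1/20170304_139_045", "landsat_8_c1", [ITEM_ID])
```

Because the href never includes the browse path, clients that reach the same Item through different catalogs can de-duplicate on the link alone.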

There are many options for how to structure these catalog graphs, so it will take some analysis work to figure out
which one or ones best match the structure of your data and the needs of your consumers.

@chiarch84

Dear @philvarner, I read your proposal in detail, but it is not clear to me why you propose the following paths:

  • Catalog (root) -> Catalog* -> Item (recommended)
  • Catalog (root) -> Collection -> Catalog* -> Item

Rather than:

  • Catalog (root) -> Catalog* -> Collection -> Item (recommended)
  • Catalog (root) -> Collection -> Item

From my point of view, the following tree structure should work well. Do you see anything in it that clashes with the specs?

  • /
    • child -> /catalogs/sentinel
      • child -> /catalogs/sentinel_2
        • child -> /collections/sentinel_2_l1c
          • items -> /collections/sentinel_2_l1c/items
        • child -> /collections/sentinel_2_l2a
          • items -> /collections/sentinel_2_l2a/items
      • child -> /catalogs/sentinel_1
        • child -> /collections/sentinel_1_l1a
          • items -> /collections/sentinel_1_l1a/items
        • child -> /collections/sentinel_1_l1c
          • items -> /collections/sentinel_1_l1c/items
