How to return dynamic sub-catalogs #329

Closed
chiarch84 opened this issue Sep 12, 2022 · 20 comments

@chiarch84

I was reading the best practices for dynamic catalogs at https://github.com/radiantearth/stac-spec/blob/master/best-practices.md#dynamic-catalog-layout and I wanted to follow the suggestion of offering different ways of browsing the catalog through sub-catalogs, but I didn't understand how this was supposed to be implemented.
Where am I supposed to return the sub-catalogs?
Should I return sub-catalogs just as if they were collections? What API method should I use to return collections belonging to a sub-catalog?

[image: screenshot of the methods available in the STAC API]

The image above shows the methods that can be called in a STAC API, but I can't find anything concerning sub-catalogs.
In our case, for example, the root catalog contains one sub-catalog for each INSPIRE theme, but in this list I can't find a way to return all the collections available in a specific sub-catalog.

I would look for something like
/{subcatalog_id}/collections
But I don't see it.

Could you please give me a hint on this?

@geospatial-jeff

> Should I return sub-catalogs just as if they were collections? What API method should I use to return collections belonging to a sub-catalog?

Sub catalogs would be exposed as /collections/{catalog_id}, and the /children endpoint from the children conformance class is used to list sub catalogs. They are exposed in the same endpoint because a collection is a type of catalog.

> but I didn't understand how this was supposed to be implemented.

From my perspective there are two ways to implement sub-catalogs in a STAC API:

  1. An API may implement the children conformance class, allowing users to expose sub-catalogs as materialized views into other collections (e.g. one sub-catalog for each Landsat path/row). I don't believe there are any STAC APIs which implement this conformance class yet, but this will likely become the canonical way of implementing sub-catalogs once APIs do add support for it.
  2. Collections and sub-catalogs are accessed from the same endpoint (/collections/{catalog_id}), so users can represent a sub-catalog using a collection (keeping in mind that a collection is a catalog). This isn't the most efficient approach because it may require storing data on disk multiple times, but if you really want to implement sub-catalogs right now this is one way to do it until the children conformance class has better adoption.

@chiarch84
Author

chiarch84 commented Sep 16, 2022

Thanks for your answer @geospatial-jeff
I still have some doubts, though. Where do I find the specification for the "children" method of the conformance class?

Let's create a simple example just to be sure of what I should implement.
I have the following catalog and I want to return it through the API (so no static catalog):
Root
--- sub-catalog A
------- Collection A1
-------------- Item A1.1
-------------- Item A1.2
-------------- Item ...
------- Collection A2
--- sub-catalog B
------- Collection B1
------- Collection B2

What should I return as links of the landing page? Is the following correct?

"links": [
      {
        "rel": "self",
        "href": "{baseurl}/catalog",
        "type": "application/json"
      },
      {
        "rel": "root",
        "href": "{baseurl}/catalog",
        "type": "application/json"
      },
      {
        "rel": "child",
        "href": "{baseurl}/A.catalog",
        "type": "application/json"
      },
      {
        "rel": "child",
        "href": "{baseurl}/B.catalog",
        "type": "application/json"
      }
    ]

And then what should I return for the links of sub-catalog A?

"links": [
      {
        "rel": "self",
        "href": "{baseurl}/A.catalog",
        "type": "application/json"
      },
      {
        "rel": "parent",
        "href": "{baseurl}/catalog",
        "type": "application/json"
      },
      {
        "rel": "root",
        "href": "{baseurl}/catalog",
        "type": "application/json"
      },
      {
        "rel": "collections",
        "href": "{baseurl}/A.catalog/collections",
        "type": "application/json"
      }
    ]

And then what about the link of each single collection of sub-catalog A? Would it be {baseurl}/A.catalog/collections/{collectionid} and its items would be returned by {baseurl}/A.catalog/collections/{collectionid}/items and each item by {baseurl}/A.catalog/collections/{collectionid}/items/{itemid} ?

Thanks for helping me.

@chiarch84
Author

Can anybody help me with this topic? I really don't know where I can find a concrete example of the children conformance class.

@geospatial-jeff

The children conformance class is here (https://github.com/radiantearth/stac-api-spec/tree/main/children); it has an example landing page and /children response. I don't think that example is very applicable to the one you present above, because your catalog structure has collections nested within catalogs.

The landing page is just a catalog that links to resources beneath it, so I would recommend that each of the sub catalogs in your example structure is a landing page where the root landing page uses the children conformance class to link to each of those landing pages. The Root would look like:

"links": [
    {
        "rel": "self",
        "href": "{baseurl}",
        "type": "application/json"
    },
    {
        "rel": "root",
        "href": "{baseurl}",
        "type": "application/json"
    },
    {
        "rel": "children",
        "href": "{baseurl}/children",
        "type": "application/json"
    },
    {
        "rel": "child",
        "href": "{baseurl}/catalogs/A.catalog",
        "type": "application/json"
    },
    {
        "rel": "child",
        "href": "{baseurl}/catalogs/B.catalog",
        "type": "application/json"
    }
]

The response from the children link returned by the root would list your two catalogs and look like:

{
    "children": [
        {
            "id": "A.catalog",
            "links": [
                {
                    "rel": "root",
                    "type": "application/json",
                    "href": "{baseurl}"
                },
                {
                    "rel": "parent",
                    "type": "application/json",
                    "href": "{baseurl}"
                },
                {
                    "rel": "self",
                    "type": "application/json",
                    "href": "{baseurl}/catalogs/A.catalog"
                }
            ]
        },
        {
            "id": "B.catalog",
            "links": [
                {
                    "rel": "root",
                    "type": "application/json",
                    "href": "{baseurl}"
                },
                {
                    "rel": "parent",
                    "type": "application/json",
                    "href": "{baseurl}"
                },
                {
                    "rel": "self",
                    "type": "application/json",
                    "href": "{baseurl}/catalogs/B.catalog"
                }
            ]
        }
    ]
}
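As a rough illustration of how an API might assemble the /children body shown above, here is a minimal Python sketch. The function names `catalog_links` and `children_response` are hypothetical, the `/catalogs/{id}` path layout is the one proposed in this comment rather than something mandated by the spec, and the sketch assumes every listed catalog is a direct child of the root.

```python
from typing import Dict, List


def catalog_links(base_url: str, catalog_id: str) -> List[Dict[str, str]]:
    """Build the root, parent and self links for a direct child of the root."""
    return [
        {"rel": "root", "type": "application/json", "href": base_url},
        {"rel": "parent", "type": "application/json", "href": base_url},
        {
            "rel": "self",
            "type": "application/json",
            # Assumed path layout, taken from the example in this thread.
            "href": f"{base_url}/catalogs/{catalog_id}",
        },
    ]


def children_response(base_url: str, catalog_ids: List[str]) -> Dict[str, list]:
    """Assemble a body for GET {base_url}/children from a list of catalog IDs."""
    return {
        "children": [
            {"id": cid, "links": catalog_links(base_url, cid)}
            for cid in catalog_ids
        ]
    }
```

Calling `children_response(base_url, ["A.catalog", "B.catalog"])` yields a dictionary with the same shape as the JSON above, which can then be serialized with `json.dumps`.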

The link to A.catalog would then be its own landing page (keep in mind a landing page is just a catalog) that links to the collections beneath it. Its links may look something like this, where the search link would search just the items within this catalog (A1.1, A1.2, etc.):

"links": [
    {
        "rel": "parent",
        "href": "{baseurl}",
        "type": "application/json"
    },
    {
        "rel": "root",
        "href": "{baseurl}",
        "type": "application/json"
    },
    {
        "rel": "self",
        "href": "{baseurl}/catalogs/A.catalog",
        "type": "application/json"
    },
    {
        "rel": "collections",
        "href": "{baseurl}/catalogs/A.catalog/collections",
        "type": "application/json"
    },
    {
        "rel": "search",
        "href": "{baseurl}/catalogs/A.catalog/search",
        "type": "application/json"
    }
]

This isn't too clear in the API spec right now, but I think it's important to remember that catalogs and landing pages are the same thing, so whenever you run into a catalog underneath the root you should treat it like a landing page that links to the resources under it: collections through /collections, other catalogs through /children.
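The idea that a sub-catalog is just another landing page could be sketched with a single link builder that serves both the root and any sub-catalog. This is only an illustration under assumptions: `link` and `landing_page_links` are hypothetical names, the hierarchy is assumed to be two levels deep (so the parent is always the root), and the link to /collections uses the `data` relation type that the STAC API spec defines for that endpoint rather than `collections`.

```python
from typing import Dict, List, Optional


def link(rel: str, href: str) -> Dict[str, str]:
    """Create a single JSON link object."""
    return {"rel": rel, "href": href, "type": "application/json"}


def landing_page_links(
    base_url: str,
    catalog_id: Optional[str] = None,
    child_ids: Optional[List[str]] = None,
) -> List[Dict[str, str]]:
    """Links for the root (catalog_id=None) or for a sub-catalog landing page."""
    self_href = base_url if catalog_id is None else f"{base_url}/catalogs/{catalog_id}"
    links = [link("self", self_href), link("root", base_url)]
    if catalog_id is not None:
        # Assumption: two-level hierarchy, so the parent is always the root.
        links.append(link("parent", base_url))
    # Resources scoped to this catalog, following the layout in this thread.
    links.append(link("data", f"{self_href}/collections"))
    links.append(link("search", f"{self_href}/search"))
    if child_ids:
        links.append(link("children", f"{self_href}/children"))
        links.extend(link("child", f"{base_url}/catalogs/{cid}") for cid in child_ids)
    return links
```

The same function produces the root landing page when called with `child_ids=["A.catalog", "B.catalog"]` and the landing page of A.catalog when called with `catalog_id="A.catalog"`.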

@chiarch84
Author

chiarch84 commented Sep 28, 2022

Thanks @geospatial-jeff for taking the time to propose some implementation for my case.
From what you write I understand that my APIs will have to implement the following methods in order to fully implement my catalog made of sub-catalogs:

  • /catalog/{catalogid}
  • /catalog/{catalogid}/collections
  • /catalog/{catalogid}/collections/{collectionid}
  • /catalog/{catalogid}/collections/{collectionid}/items
  • /catalog/{catalogid}/collections/{collectionid}/items/{itemid}
  • /catalog/{catalogid}/search
  • /catalog/{catalogid}/children

In this situation it would be as if I had 3 different STAC API endpoints:
{baseurl}/catalogs/root.catalog
{baseurl}/catalogs/A.catalog
{baseurl}/catalogs/B.catalog

With one single implementation I would then be able to get either all the collections (using the root endpoint) or only the collections in one of the sub-catalogs. The same would apply to /search (and to all other methods), which, depending on the endpoint where it is called, would search in all items or only in those from sub-catalog A or B.
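A minimal sketch of this scoping idea, assuming a hypothetical in-memory mapping from sub-catalog to collections that mirrors the example tree earlier in this thread (a real implementation would query the database instead). The names `CATALOG_COLLECTIONS`, `collections_in_scope` and `search` are illustrative only.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory hierarchy mirroring the example tree in this thread.
CATALOG_COLLECTIONS: Dict[str, List[str]] = {
    "A.catalog": ["A1", "A2"],
    "B.catalog": ["B1", "B2"],
}


def collections_in_scope(catalog_id: Optional[str] = None) -> List[str]:
    """Return collection IDs visible from the root (None) or from one sub-catalog."""
    if catalog_id is None:
        # Root endpoint: every collection of every sub-catalog.
        return [c for cols in CATALOG_COLLECTIONS.values() for c in cols]
    return list(CATALOG_COLLECTIONS[catalog_id])


def search(items: List[dict], catalog_id: Optional[str] = None) -> List[dict]:
    """Keep only the items whose collection is in scope for the called endpoint."""
    scope = set(collections_in_scope(catalog_id))
    return [item for item in items if item["collection"] in scope]
```

With this layout, the handler for /catalog/{catalogid}/search would pass the catalog ID from the path, while the root /search handler would pass nothing and see every item.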

Do you think this implementation would comply with the STAC API standard (even if my calls have /catalog/{catalogid} in front)?
I think that this way it would be possible to create any catalog tree structure, as envisaged by the STAC standard. So I would also be able to have a sub-catalog containing both collections and sub-sub-catalogs at the same time.

Also pinging @m-mohr to understand whether the STAC Browser would be able to read such an API and correctly render the content with the catalog and its sub-catalogs (take as an example exactly what @geospatial-jeff presented above).

@m-mohr
Collaborator

m-mohr commented Sep 28, 2022

Coming back to your original question: If you want to have a browsable structure, just make it as if it is static. You can use child links and then have sub-catalogs etc. I don't necessarily see a need for the children conformance class. All main collections can then still be exposed via the /collections endpoint.

The structure proposed above looks overly complex to me and I'm not so sure on the reasoning for it. It doesn't necessarily look right to me. Something that could be discussed in one of our Monday STAC meetings much better, I think.

With regard to STAC Browser: The children conformance class is not supported yet. Everything else should work if the correct relation types are used (e.g. "collections" is not a valid relation type, that should be "data", I think).

@chiarch84
Author

chiarch84 commented Sep 28, 2022

Thanks @m-mohr for answering, I would be more than happy to participate in one of these Monday STAC meetings.

From my point of view the proposed catalog is very simple (probably because I know it very well), and much more browsable and viewable than having all the collections at root level (about 400). In our case, for example, we subdivide collections by INSPIRE theme, which makes the catalog much more readable. Until now we did this through a static catalog, but since the number of items keeps growing, we prefer to switch to a DB (Elastic) + API endpoint and connect the STAC Browser directly to the API endpoint.

Here is a quick view of our root catalog as it is now (it is static). As you can see, it has many sub-catalogs so that all collections can be explored logically and easily. I think it is more user friendly than having all collections at root level. I was just trying to replicate the exact same thing through the API.
[image: screenshot of the static root catalog and its sub-catalogs]

Or maybe I'm getting something wrong on what I should do with the APIs.

@m-mohr
Collaborator

m-mohr commented Sep 28, 2022

Ah, okay. This sounds like it could be a use case for the browsable conformance class. It feels like people are requesting this kind of grouping more often. So I think this is really something we could discuss in a Monday meeting; it is at 17:00 CEST and the next one is on Oct 10. @matthewhanson manages the invites. You can e-mail him or try it via the STAC Gitter: https://gitter.im/SpatioTemporal-Asset-Catalog/Lobby

@m-mohr
Collaborator

m-mohr commented Sep 28, 2022

Anyway, the structure above would work with STAC Browser if you implement it. It needs a bit of care to correctly set the root links (you already found #239) etc., but it is then just a mixture of "static" and API, and STAC Browser could handle it, I think. It doesn't necessarily need the children or browsable conformance classes.

@geospatial-jeff

@m-mohr I'm not sure how an API would implement children/browsable without this nested structure; but maybe I'm misunderstanding. Children is implemented on a catalog (starting with the landing page), and itself returns catalogs. Which means by definition that any of the original children catalogs can have children themselves. So you may have a path through the catalog/graph that looks something like root catalog (landing page) -> child catalog -> grandchild catalog.

If the grandchild decides to implement the /search endpoint, where searches are scoped to all items contained by the grandchild catalog, I think we'd need the full hierarchy in the HTTP path for the API to know what to search on (e.g. /catalogs/{childId}/catalogs/{grandchildId}/search).

What trips me up is that catalogs aren't necessarily unique. Imagine the Landsat path/row example where path catalogs are the children and row catalogs are the grandchildren. Without knowing the full catalog hierarchy an API wouldn't be able to distinguish between 30/40, 31/40, 32/40 etc.

EDIT: Now that I think about it more, an API could probably expose all of the sub-catalogs at the root level (ex. /catalogs/{grandchildId}/search) but the API would have to walk backwards through the catalog hierarchy on each request to figure out which catalogs to search.
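To make the point about carrying the full hierarchy in the HTTP path concrete, here is a small Python sketch that splits such a path into its chain of catalog IDs and the trailing endpoint. The function name `parse_catalog_path` is hypothetical and the nested `/catalogs/{id}` layout is only the one floated in this comment, not a pattern defined by the spec.

```python
from typing import List, Tuple


def parse_catalog_path(path: str) -> Tuple[List[str], str]:
    """Split '/catalogs/{id}/catalogs/{id}/.../{endpoint}' into (ids, endpoint)."""
    parts = [p for p in path.split("/") if p]
    ids: List[str] = []
    i = 0
    # Consume consecutive 'catalogs/{id}' pairs from the front of the path.
    while i + 1 < len(parts) and parts[i] == "catalogs":
        ids.append(parts[i + 1])
        i += 2
    # Whatever remains (e.g. 'search' or 'children') is the endpoint.
    endpoint = "/".join(parts[i:])
    return ids, endpoint
```

With the full chain available, row 40 under path 30 and row 40 under path 31 resolve to different ID lists, so the API does not have to walk backwards through the hierarchy to work out which catalog was meant.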

@m-mohr
Collaborator

m-mohr commented Sep 28, 2022

I think we are looking from different perspectives here. Anyway, there are so many details that it's likely easier to discuss in a STAC meeting. And I don't really have the time right now to write up all my thoughts in this issue. ;-)

@chiarch84
Author

If this is also meant for me, please let me know whether I can participate in this STAC meeting.

@m-mohr
Collaborator

m-mohr commented Sep 28, 2022

Yes, you can @chiarch84 :

> So I think this is really something we could discuss in a Monday meeting, it is 17:00 CEST and the next one is Oct 10. @matthewhanson manages the invites. You can e-mail him or try it via the STAC Gitter: https://gitter.im/SpatioTemporal-Asset-Catalog/Lobby

@philvarner
Collaborator

Created a placeholder for this discussion here:

@m-mohr
Collaborator

m-mohr commented Feb 17, 2023

The proposal above to return Catalogs in /collections/{collection_id} seems against the API spec. The endpoint requires fields that are not part of Catalogs. Also, #397 would and should disallow it explicitly to avoid issues in clients.

@chiarch84 I don't read here that catalogs should be added to /collections though. It just mentions /collections/{collection_id} as far as I see.

@chiarch84
Author

Point 2 of this comment.

@m-mohr
Collaborator

m-mohr commented Feb 17, 2023

> (keeping in mind that a collection is a catalog)

This is not correct. The addition of the type field in STAC 1.0.0 release-candidate made it so that a Collection is NOT a catalog.
Also, Point 2 doesn't seem to speak about /collections. I think this needs a broader discussion and might be hard to discuss via GH issues.
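As a rough sketch of what the type field allows a client or server to do, the following Python separates Catalogs from Collections by inspecting `type`. It relies on the STAC 1.0.0 values `"Catalog"` and `"Collection"`; the function names are hypothetical, and the sketch simply ignores objects without a recognized type.

```python
from typing import Any, Dict, List, Tuple


def is_collection(obj: Dict[str, Any]) -> bool:
    """True if the STAC object declares itself a Collection via its type field."""
    return obj.get("type") == "Collection"


def is_catalog(obj: Dict[str, Any]) -> bool:
    """True if the STAC object declares itself a Catalog via its type field."""
    return obj.get("type") == "Catalog"


def split_by_type(
    objects: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Partition STAC objects into (catalogs, collections) by their type field."""
    catalogs = [o for o in objects if is_catalog(o)]
    collections = [o for o in objects if is_collection(o)]
    return catalogs, collections
```

An API could use a check like `is_collection` to make sure that only Collections are returned from /collections/{collection_id}, leaving Catalogs to be exposed elsewhere.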

@chiarch84
Author

Good, just let me know when the next meeting is and I'll be happy to participate.

@m-mohr
Collaborator

m-mohr commented Feb 17, 2023

Feb 27, 17:00 CET at https://meet.google.com/bfn-dssc-mjd
It happens every two weeks.

@chiarch84
Author

I don't think I'll make it on the 27th, but maybe @ilion could.
