support STAC specification #221

Closed
tomkralidis opened this issue Sep 2, 2019 · 30 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), STAC (SpatioTemporal Asset Catalog), stale (Issue marked stale by stale-bot)

Comments

@tomkralidis
Member

Implement the STAC API specification to support search/discovery of geospatial assets. Notes for implementation based on initial discussions with @matthewhanson :

  • add front end routes (/stac/search)
  • update pygeoapi.api to address request handling
  • implement STAC provider backend via plugin mechanism which will interrogate backend and return results as a Python dictionary for marshalling to JSON proper to the client
  • add stubs for transactional capability
  • populating a STAC backend could be via workflow beyond pygeoapi (i.e. implement a CLI within the backend, which is not hooked into pygeoapi tooling proper but can be run offline just the same)
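
To illustrate the provider-plugin bullet above, here is a minimal sketch (not pygeoapi's actual provider API) of a backend that filters STAC Items by bounding box and returns a plain Python dict ready for JSON marshalling. The class name `InMemorySTACProvider`, the `query()` signature and the `make_item()` helper are hypothetical and exist only for this example.

```python
import json


def _bbox_intersects(a, b):
    """True if two [minx, miny, maxx, maxy] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def make_item(item_id, bbox):
    """Build a bare-bones STAC-like Item (hypothetical helper)."""
    minx, miny, maxx, maxy = bbox
    ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]
    return {
        "type": "Feature",
        "id": item_id,
        "bbox": list(bbox),
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"datetime": "2019-09-02T00:00:00Z"},
        "assets": {},
    }


class InMemorySTACProvider:
    """Hypothetical provider plugin; a real one would query a backend."""

    def __init__(self, items):
        # items: list of Item dicts, each carrying a "bbox"
        self.items = items

    def query(self, bbox=None, limit=10):
        """Return matching items as a GeoJSON FeatureCollection dict."""
        matched = [
            item for item in self.items
            if bbox is None or _bbox_intersects(item["bbox"], bbox)
        ]
        returned = matched[:limit]
        return {
            "type": "FeatureCollection",
            "numberMatched": len(matched),
            "numberReturned": len(returned),
            "features": returned,
        }


def to_json(result):
    """Marshal the provider's dict to a JSON string for the client."""
    return json.dumps(result)
```

Because the provider only ever hands back dicts, lists, strings and numbers, the API layer can serialize the result without any STAC-specific marshalling code.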
@tomkralidis added the enhancement label Sep 2, 2019
@tomkralidis self-assigned this Sep 2, 2019
@pvgenuchten
Contributor

I'm curious to see how STAC, ogcapi-coverage and ogcapi-records operate together on a single endpoint: which aspects they can share and where the challenges are. This would be good input for the upcoming sprint.

@francbartoli
Contributor

> Implement the STAC API specification to support search/discovery of geospatial assets. Notes for implementation based on initial discussions with @matthewhanson :
>
>   • add front end routes (/stac/search)
>   • update pygeoapi.api to address request handling
>   • implement STAC provider backend via plugin mechanism which will interrogate backend and return results as a Python dictionary for marshalling to JSON proper to the client

Are you considering external libraries for marshalling? Or do we need to implement our own?

>   • add stubs for transactional capability
>   • populating a STAC backend could be via workflow beyond pygeoapi (i.e. implement a CLI within the backend, which is not hooked into pygeoapi tooling proper but can be run offline just the same)

@tomkralidis
Member Author

> Are you considering external libraries for marshalling? Or do we need to implement our own?

Good point. I'm guessing a STAC backend could be provided via one of the sat-utils tools (for example), and a STAC backend's mission would be to provide Python dicts of JSON objects back to pygeoapi.api, but this remains to be seen/needs to be further tested.

@francbartoli
Contributor

> > Are you considering external libraries for marshalling? Or do we need to implement our own?
>
> Good point. I'm guessing a STAC backend could be provided via one of the sat-utils tools (for example), and a STAC backend's mission would be to provide Python dicts of JSON objects back to pygeoapi.api, but this remains to be seen/needs to be further tested.

Ok that makes sense, thanks @tomkralidis.

@francbartoli
Contributor

@tomkralidis Should the stac/search endpoint be optional in the configuration? I would say yes.

@tomkralidis
Member Author

Would this depend on how we describe it in configuration? Like, is STAC a dataset in config? Other options?

@pvgenuchten
Contributor

Any guidance from the STAC team? Is STAC intended to run alongside the OGC APIs in a single OGC API endpoint, or does it require its own endpoint? In the latter case, maybe deploy a second instance of pygeoapi in a 'stac' mode?

@francbartoli
Contributor

@pvgenuchten I would consider its own endpoint as suggested in the bullet above from @tomkralidis (cc @matthewhanson)

@francbartoli
Contributor

Configuration could be something like:

catalogs:
    sat-api:
        provider:
            name: STAC
            data: https://sat-api-dev.developmentseed.org/stac
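
On the question of whether the search endpoint should be optional: one possibility, sketched here under the assumption that the YAML above has already been parsed into a Python dict, is to expose the route only when at least one catalog with a STAC provider is configured. The function names `stac_catalogs` and `search_enabled` are hypothetical, not part of pygeoapi.

```python
def stac_catalogs(config):
    """Return the names of configured catalogs whose provider is STAC."""
    catalogs = config.get("catalogs") or {}
    return [
        name for name, definition in catalogs.items()
        if definition.get("provider", {}).get("name") == "STAC"
    ]


def search_enabled(config):
    """Expose the search route only if a STAC catalog is configured."""
    return bool(stac_catalogs(config))


# Dict equivalent of the YAML configuration proposed above
config = {
    "catalogs": {
        "sat-api": {
            "provider": {
                "name": "STAC",
                "data": "https://sat-api-dev.developmentseed.org/stac",
            }
        }
    }
}
```

With this approach a deployment without a `catalogs` section simply never registers the route, so existing configurations keep working unchanged.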

@lossyrob

lossyrob commented Nov 6, 2019

@francbartoli is the thought that STAC catalog providers would be their own provider architecture (i.e. separate from dataproviders), or that STAC would be a quality of existing data providers? If an elasticsearch backend, for instance, was loaded with STAC Items (perhaps marked in the dataset configuration), then some STAC-specific capabilities could be enabled.

To comment on the above comment:

> Is stac intended to run alongside OGC APIs in a single ogc-api endpoint, or does it require its own endpoint, in that case maybe deploy a second instance of pygeoapi in a 'stac' mode?

My understanding (which is a bit weaker, since I mostly work with static STACs) is that STAC API contains some additional endpoints:

  • /stac - Simply gets the root catalog.
  • /stac/search - Implemented so that STAC can do more advanced queries via extensions than what OAF currently supports

The idea would be that eventually, with the convergence of the Query/Filter extensions into OAF, the second endpoint would go away.

@matthewhanson could provide more info as I'm basically summarizing what I heard from him yesterday at the STAC sprint.

@matthewhanson

There is currently a PR up to change those endpoints: radiantearth/stac-spec#632

The /stac endpoint would go away because it's redundant with the root endpoint / - it just returns a STAC catalog, which is the same thing that the root OAF endpoint returns with some additional fields.

The /stac/search endpoint is proposed to be renamed to /items and proposed to OAF as a general cross-collection search endpoint. However, this wouldn't go in until OAF 1.1.

@francbartoli
Contributor

Thanks @matthewhanson. So in the meantime we could adopt /items, but that might confuse users as long as it is not yet part of the OAF spec, and we don't know when it will land there.

@francbartoli self-assigned this Nov 6, 2019
@matthewhanson

Right, not sure when it will land, but now it's agreed it's going to be /search not /items

@francbartoli
Contributor

> @francbartoli is the thought that STAC catalog providers would be their own provider architecture (i.e. separate from dataproviders), or that STAC would be a quality of existing data providers? If an elasticsearch backend, for instance, was loaded with STAC Items (perhaps marked in the dataset configuration), then some STAC-specific capabilities could be enabled.

@lossyrob do you mean something like this below (looking at earth-search)?

datasets:
    cbers4-awfi:
        title: CBERS 4 AWFI Imagery
        description: CBERS 4 AWFI Imagery
        keywords:
            - stac
            - stac-api
            - assets
        links:
                -   type: application/json
                    rel: collection
                    title: information
                    href: https://earth-search.aws.element84.com/collections/cbers4-awfi
                    hreflang: en-US
        extents:
            spatial:
                bbox: [-180,-90,180,90]
                crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
            temporal:
                begin: null
                end: null  # or empty (either means open ended)
        provider:
            name: STAC
            data: # borrow data architecture from OGR provider
                source_type: ES
                source: ES:http://localhost:9200/cbers4-awfi

We are then implicitly saying that cbers4-awfi is a collection, but we lose the knowledge that it is specifically a STAC one, at least from an OAPIF perspective.

On the other hand, we could have a dedicated architecture like:

catalogues:
    hello-catalogue:
        type: OAPIC (CAT4)???

    sat-api:
        type: STAC
        provider:
            name: STAC
            datasets:
                cbers4-awfi:
                    title: CBERS 4 AWFI Imagery
                    description: CBERS 4 AWFI Imagery
                    keywords:
                        - stac
                        - stac-api
                        - assets
                    links:
                        -   type: application/json
                            rel: collection
                            title: information
                            href: https://earth-search.aws.element84.com/collections/cbers4-awfi
                            hreflang: en-US
                    extents:
                        spatial:
                            bbox: [-180,-90,180,90]
                            crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
                        temporal:
                            begin: null
                            end: null  # or empty (either means open ended)
                    provider:
                        name: STAC
                        data: # borrow data architecture from OGR provider
                            source_type: ES
                            source: ES:http://localhost:9200/cbers4-awfi

Here the concept of collection is nested in the specific provider type. Other options @tomkralidis @pvgenuchten ?

@tomkralidis
Member Author

tomkralidis commented Nov 6, 2019

Perhaps /search as the cross collection search reuses the provider plugin approach and is specified like:

catalogues:
    landsat8-aws:
        type: STAC
        title: Landsat 8 AWS catalog
        description: Landsat 8 AWS catalog
        keywords:
            - landsat
        links:
            - type: text/html
              rel: canonical
              title: information
              href: https://registry.opendata.aws/landsat-8/
              hreflang: en-US
        extents:
            spatial:
                bbox: [-180,-90,180,90]
                crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
            temporal:
                begin: 2013-03-18
                end: null  # or empty (either means open ended)
        provider:
            name: Elasticsearch
            data: http://localhost:9200/landsat-aws/FeatureCollection
            id_field: ID

and then /search can be routed to reuse pygeoapi.get_collection_items. In the /search case, collections is a query parameter. So we can either search every endpoint listed under catalogues in the config, or have a single catalogue with a required collection property that can be queried against. The former would be tricky, since it is unclear how to return multi-collection results in a single FeatureCollection.
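
One way the first option could look, as a rough sketch only: fan the query out to each selected catalogue and tag every returned feature with the name of the collection it came from, so that the merged FeatureCollection stays unambiguous. The `search()` function and the idea of representing each catalogue as a callable are assumptions for this example, standing in for real provider queries.

```python
def search(catalogues, collections=None, limit=10):
    """Query the selected catalogues and merge into one FeatureCollection.

    catalogues: dict mapping a catalogue name to a callable(limit) that
    returns a list of GeoJSON Features (hypothetical provider stand-in).
    collections: optional list of catalogue names; None means all of them.
    """
    selected = collections or list(catalogues)
    unknown = [name for name in selected if name not in catalogues]
    if unknown:
        raise KeyError(f"unknown collections: {unknown}")

    features = []
    for name in selected:
        for feature in catalogues[name](limit):
            # Copy the feature and record where it came from, so clients
            # can tell results from different collections apart.
            features.append({**feature, "collection": name})

    # Naive truncation: earlier catalogues win when the limit is reached.
    features = features[:limit]
    return {
        "type": "FeatureCollection",
        "numberReturned": len(features),
        "features": features,
    }
```

Note that this sidesteps rather than solves the hard parts: paging and relevance ordering across backends would still need a shared strategy.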

Thoughts?

@pvgenuchten
Contributor

pvgenuchten commented Nov 6, 2019

Considering radiantearth/stac-spec#632 (comment), I imagine this method will search/browse through a server in a Google-like way, returning for example a list of 3 datasets, 5 catalogue records and 2 grids. I like it. From the current discussion I get the feeling that the STAC team actually wants STAC to be made available embedded in the OAPI endpoint (and not separately).

Sorry for my unfamiliarity with STAC. Am I understanding correctly that STAC exposes a queryable series of metadata records of sensor observations (imagery) at a given time/location, and that a client can then extract the relevant portion of a cloud-optimised GeoTIFF (or alternative source)? To me these cases seem quite similar to what others are designing in OAPI-records, SensorThings and/or OAPI-coverage, so they are either very likely to collide (separate endpoint +1) or, on the other hand, this could be an opportunity to engage with those teams and design a shared model (embedded +1).

Looking forward to hearing your thoughts/ideas.

@tomkralidis
Member Author

tomkralidis commented Nov 7, 2019

WIP in https://github.com/geopython/pygeoapi/tree/stac . Notes:

  • code basically re-uses /collections/items logic along with a filter JSON payload (currently does nothing), and detects /search in order to query catalogues objects / backends in config
  • the concept of a default or cross-collection search is still to be determined. Specifying collections works, albeit against a single collection at the moment. If we have 1..n catalogues objects defined in pygeoapi, how would a cross-collection search work? If we assume, for example, that all catalogues are backed by something like ES, then one can do cross-index searching. Otherwise, we could define a single catalogue in a pygeoapi instance in which all documents to be searched are in that single index, which would work but is not very pragmatic
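
As a sketch of the cross-index idea, assuming every catalogue is backed by an Elasticsearch index: Elasticsearch accepts a comma-separated list of index names in the search path, so one request body can cover several catalogues. The helper `build_search()` is hypothetical, and the field names `geometry` (mapped as `geo_shape`) and `properties.datetime` are assumptions about how the documents were indexed.

```python
def build_search(indices, bbox=None, datetime_range=None, limit=10):
    """Build the path and JSON body for a multi-index Elasticsearch search."""
    filters = []

    if bbox is not None:
        minx, miny, maxx, maxy = bbox
        filters.append({
            "geo_shape": {
                "geometry": {
                    "shape": {
                        "type": "envelope",
                        # Envelope order is upper-left, then lower-right
                        "coordinates": [[minx, maxy], [maxx, miny]],
                    },
                    "relation": "intersects",
                }
            }
        })

    if datetime_range is not None:
        start, end = datetime_range
        bounds = {}
        if start:
            bounds["gte"] = start
        if end:
            bounds["lte"] = end  # a missing end means open ended
        filters.append({"range": {"properties.datetime": bounds}})

    query = {"bool": {"filter": filters}} if filters else {"match_all": {}}
    path = "/" + ",".join(indices) + "/_search"
    return path, {"size": limit, "query": query}
```

The returned path and body can be sent with any HTTP client; nothing here talks to a server, which keeps the query construction easy to test on its own.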

Note the STAC example here is based on Landsat 8 AWS (tooling hacked together at https://gist.github.com/tomkralidis/3b6263ec9fbd84e6b50d79527dda149f to set up a basic ES index).

@pvgenuchten
Contributor

In geonetwork we deploy a specific instance of elastic search for this use case; metadata records, as well as content from WFS's is indexed in that instance to facilitate cross CSW/WFS search. An administrator indicates which WFS's to crawl.

This approach could also be relevant for pygeoapi. In the case of CSV/shapefiles, pygeoapi could operate against the index for many operations, which would benefit performance.

If an index like Elasticsearch were to become such an essential component, it would be good to provide an abstraction layer, so that users could select their favourite index (or database) to provide this functionality (SOLR, Noise, PostGIS).
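
Such an abstraction layer might look roughly like the following sketch: an abstract interface that each index backend implements, plus a registry keyed by a name taken from configuration. Everything here (`SearchIndex`, `InMemoryIndex`, `BACKENDS`, `load_index`) is hypothetical; the in-memory backend only stands in for a real Elasticsearch, SOLR or PostGIS implementation.

```python
from abc import ABC, abstractmethod


class SearchIndex(ABC):
    """Interface every index backend would implement (hypothetical)."""

    @abstractmethod
    def index(self, collection, feature):
        """Add one GeoJSON Feature from the given collection."""

    @abstractmethod
    def search(self, text, limit=10):
        """Return Features whose properties contain the text."""


class InMemoryIndex(SearchIndex):
    """Toy backend used here in place of a real index or database."""

    def __init__(self):
        self._docs = []

    def index(self, collection, feature):
        self._docs.append((collection, feature))

    def search(self, text, limit=10):
        needle = text.lower()
        hits = [
            {**feature, "collection": collection}
            for collection, feature in self._docs
            if needle in str(feature.get("properties", {})).lower()
        ]
        return hits[:limit]


# Registry mapping a configured backend name to its implementation
BACKENDS = {"memory": InMemoryIndex}


def load_index(name):
    """Instantiate the backend selected in configuration."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown index backend: {name!r}") from None
```

Supporting another backend would then mean adding one class and one registry entry, without touching the code that calls `index()` and `search()`.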

@lossyrob

lossyrob commented Nov 7, 2019

@francbartoli I'm a bit unclear what the best path is on the configuration side, but I think that's due to my lack of familiarity with pygeoapi. Tom's WIP branch looks like it's on the right track though!

@tomkralidis added the help wanted label Jan 13, 2020
@tomkralidis
Member Author

Update: current work in https://github.com/geopython/pygeoapi/tree/stac

@tomkralidis modified the milestones: 0.8.0, 1.0.0 Apr 12, 2020
@tomkralidis
Member Author

FYI functionality merged in #389. Keeping open for STAC API implementation.

@ricardogsilva
Member

@tomkralidis
Any news on the implementation of the /search endpoint?

The stac branch doesn't seem to exist anymore, but apparently there was some WIP toward adding this functionality.

@tomkralidis
Member Author

@ricardogsilva in the stac branch there was a basic Elasticsearch provider which became dated. With OGC API - Records evolving, we decided to wait on implementing STAC API until it becomes clearer how OARec will relate to STAC /search.

@gnosys-tmiller

Hi All,

I hate to dig up an old post -- but has any /search feature been added (i.e. like https://stacspec.org/STAC-api.html#operation/getSearchSTAC)? We have just set up pygeoapi, and it still does not seem to be available.

Thanks!

@tomkralidis
Member Author

@gnosys-tmiller I have a pending branch/PR to implement STAC API, which should be completed in the next 2 weeks or so. cc @cholmes.

@bkanuka

bkanuka commented May 12, 2022

Hey @tomkralidis, is your WIP allowing an existing STAC API to be browsed from within pygeoapi, or for pygeoapi itself to act as a STAC API?

Also, this is labeled "help wanted" - what can be done to help? 😉

@jlaura

jlaura commented Oct 11, 2022

I too am interested in the intersection of STAC and pygeoapi - any links to what support can be offered? Happy to dig in and help.

@dblodgett-usgs

Any progress here @tomkralidis -- can we lend a hand getting this over the finish line?


As per RFC4, this Issue has been inactive for 90 days. In order to manage maintenance burden, it will be automatically closed in 7 days.

@github-actions bot added the stale label Mar 10, 2024

As per RFC4, this Issue has been closed due to there being no activity for more than 90 days.

@github-actions bot closed this as not planned Mar 24, 2024