Update catalog.json to new 'links' style #127

Closed
cholmes opened this issue Jul 11, 2018 · 7 comments

@cholmes
Contributor

cholmes commented Jul 11, 2018

Update catalog.json in the examples and the schema validation so that links are structured as a keyed JSON object, the same way we do it in 0.5.0. For example:

{
  "name": "LC08/01/107/061",
  "description": "Landsat 8 Collection 1 Path 107 Row 61",
  "links": {
    "self": {
      "href": "https://storage.cloud.google.com/gcp-public-data-landsat/LC08/01/107/061/catalog.json",
      "rel": "self"
    },
    "parent": {
      "href": "https://storage.cloud.google.com/gcp-public-data-landsat/LC08/01/107/061/catalog.json",
      "rel": "parent"
    },
    "root": {
      "href": "https://storage.cloud.google.com/gcp-public-data-landsat//catalog.json",
      "rel": "root"
    },
    "collection": {
      "href": "https://storage.cloud.google.com/gcp-public-data-landsat/LC08/01/catalog.json",
      "rel": "collection"
    },
    "items": [
      {
        "rel": "item",
        "href": "https://storage.cloud.google.com/gcp-public-data-landsat/LC08/01/107/061/LC08_L1GT_107061_20130816_20170503_01_T2.json"
      },
      {
        "rel": "item",
        "href": "https://storage.cloud.google.com/gcp-public-data-landsat/LC08/01/107/061/LC08_L1GT_107061_20150721_20170406_01_T2.json"
      },
...

@mojodna - I think this will need a change to STAC browser. Maybe we should make this a 0.6.0 release? Though it feels like a relatively minor change, just for the static browser, and one we should have made in 0.5.0.

I suppose we could also just keep it the same, and have the catalog links not be keyed. cc @matthewhanson

@mojodna
Collaborator

mojodna commented Jul 12, 2018

+1 for consistency across all "lists" of things.

However, @metasim's question about how to model STAC as Postgres DDL (I was interpreting it as "static STAC") got me thinking about this a bit more... In order to model it, a join table would need to be introduced that contains the "key", which complicates things a bit.
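A minimal sketch of the join table described above, using Python's built-in sqlite3 module as a stand-in for Postgres so that it runs without a server. The table and column names (catalog, link, catalog_link, key) and the example.com URL are assumptions made for illustration; nothing here comes from an actual STAC schema.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE catalog (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE link (
        id   INTEGER PRIMARY KEY,
        rel  TEXT NOT NULL,
        href TEXT NOT NULL
    );
    -- The join table carries the object key, unique within one catalog.
    CREATE TABLE catalog_link (
        catalog_id INTEGER NOT NULL REFERENCES catalog(id),
        link_id    INTEGER NOT NULL REFERENCES link(id),
        key        TEXT NOT NULL,
        PRIMARY KEY (catalog_id, key)
    );
    """
)

# Hypothetical rows: one catalog with a single keyed link.
conn.execute("INSERT INTO catalog (id, name) VALUES (1, 'LC08/01/107/061')")
conn.execute(
    "INSERT INTO link (id, rel, href) VALUES (1, 'self', 'https://example.com/catalog.json')"
)
conn.execute(
    "INSERT INTO catalog_link (catalog_id, link_id, key) VALUES (1, 1, 'self')"
)

# Reassembling the keyed "links" object requires a join through catalog_link.
rows = conn.execute(
    """
    SELECT cl.key, l.rel, l.href
    FROM catalog_link AS cl
    JOIN link AS l ON l.id = cl.link_id
    WHERE cl.catalog_id = ?
    """,
    (1,),
).fetchall()
```

With links stored as a plain array, the key column (and its uniqueness constraint) would not be needed; the link table could reference the catalog directly.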

Remind me again why lists are no longer modeled as arrays; I missed that discussion and still don't fully buy the change.

@cholmes
Contributor Author

cholmes commented Jul 12, 2018

I believe @matthewhanson advocated for switching from arrays. I think the idea was to make it easier to reference a particular asset (originally was just assets, then links for consistency). So instead of making implementations search through the whole array to find the 'geotiff' asset, or the 'metadata' asset, they can just use the geotiff key to get exactly what they want.
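To make the convenience argument concrete, here is a rough sketch comparing the two shapes. The asset names, the "name" field on array entries, and the example.com hrefs are hypothetical and not taken from the spec.

```python
# Hypothetical assets for one item, modeled as an array.
assets_as_list = [
    {"name": "metadata", "href": "https://example.com/item/MTL.txt"},
    {"name": "geotiff", "href": "https://example.com/item/B1.TIF"},
]

# The same assets modeled as a keyed object.
assets_as_dict = {
    "metadata": {"href": "https://example.com/item/MTL.txt"},
    "geotiff": {"href": "https://example.com/item/B1.TIF"},
}


def find_asset_in_list(assets, name):
    """Scan the array until an entry with the requested name turns up."""
    for asset in assets:
        if asset.get("name") == name:
            return asset
    return None


# Array form: the caller has to search.
from_list = find_asset_in_list(assets_as_list, "geotiff")

# Keyed form: a single lookup by key.
from_dict = assets_as_dict.get("geotiff")
```

Both lookups return the same href; the keyed form simply moves the identifying name out of the entry and into the object key.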

I think it was seen as a relatively minor convenience win, with little downside. Though yeah, once we get into catalog links it seems like more of a stretch.

@francbartoli
Contributor

@mojodna @metasim why not use a Postgres JSONB field to model links/assets?

@metasim

metasim commented Jul 15, 2018

@francbartoli To be honest I hadn't given it that much thought... I was more interested in just bulk loading a static asset catalog into PostGIS for spatial predicate search (primarily), and filtering by metadata (secondarily). I'm not sure I understand what benefits JSONB brings, but I'm hardly knowledgeable about JSON-in-relational-database matters. My assumption is that one would want the geometry to be in its own column for indexing, so that the SQL writer doesn't have to dig it out of the JSON structure for spatial relations. Is the benefit just to save the work of mapping to DDL?

WRT @mojodna's statement about having a join table, etc., I'd be happy with (and may end up writing for myself‡) a DBA-cringeworthy mapping that doesn't worry so much about normalization, more in the style of a big data database where you organize around a particular query pattern. Probably closer to the ElasticSearch mapping that @matthewhanson did for sat-api, but less JSON-y. That said, a truly robust {3|2}NF table model would be sweet.

IMO, +1 for key/value lookups over access via array index.

‡My current thinking is to create a custom swagger-codegen output format sufficient to process STAC into PostGIS DDL. Anyone have better ideas?

@francbartoli
Contributor

@metasim I'm with you on geometry indexing, though mixing JSON into the relational model could at least save the effort of modeling the links/assets objects. Also, JSONB can be indexed for faster operations.

@mojodna
Collaborator

mojodna commented Jul 16, 2018

JSONB hadn't crossed my mind either, and will be great for modeling metadata keys that vary (properties, rel / ref, etc.). However, it feels like a bit of a stretch for relationships when doing some normalization of data (i.e. no ability to create foreign keys) and one would need to be more careful when updating lists of relations (if the JSONB refers to links / assets / catalogs in different tables) to avoid race conditions, etc.

More generally, I think my objection to lists-as-objects is that it introduces application-specific "optimizations" into the core spec (in the form of facilitating lookups) while dirtying the semantics associated with those lists:

  • JSON objects have no inherent ordering (in practice, they do in most implementations, but that seems like a bad assumption to make); this means that the resulting lists won't necessarily be consistent across viewers
  • a non-semantic element is introduced (the object key) that introduces ambiguity; some implementations will apply meaning to this (a cog key in an asset collection) in ways that potentially conflict with "better" resolution rules (ref / rel)
  • a collection of things is inherently a list; actions like "give me the first asset" (jq .assets[0]) become more difficult (jq ".assets | to_entries | first | .value"; I had a hard time figuring this out)
  • since object keys are now required, STAC generators will need to produce them; most keys will be opaque strings (UUIDs, etc) with duplication needing to become a consideration (merging links collections from multiple catalogs becomes a non-trivial operation, especially if it's deemed important to try to retain object keys)
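The "first asset" and merging points in the list above can be sketched in Python. The helper names and the "-2", "-3" suffix scheme for colliding keys are hypothetical choices for this sketch, not anything the spec prescribes.

```python
def first_asset(assets):
    """Return the first asset from either an array or a keyed object.

    For a dict this relies on insertion order, which Python dicts preserve
    (3.7+) but which JSON objects do not promise.
    """
    if isinstance(assets, dict):
        return next(iter(assets.values()), None)
    return assets[0] if assets else None


def merge_keyed_links(left, right):
    """Merge two keyed link objects, renaming keys from `right` that collide."""
    merged = dict(left)
    for key, link in right.items():
        candidate = key
        suffix = 2
        while candidate in merged:
            # Arbitrary renaming scheme: "self" becomes "self-2", "self-3", ...
            candidate = f"{key}-{suffix}"
            suffix += 1
        merged[candidate] = link
    return merged


# Two hypothetical catalogs that both use the key "self".
links_a = {"self": {"rel": "self", "href": "https://example.com/a/catalog.json"}}
links_b = {"self": {"rel": "self", "href": "https://example.com/b/catalog.json"}}

merged_keyed = merge_keyed_links(links_a, links_b)

# With arrays, merging is plain concatenation and no keys need inventing.
merged_list = list(links_a.values()) + list(links_b.values())
```

Any renaming rule of this kind changes at least one of the original keys, which is the difficulty raised in the last bullet.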

@cholmes
Contributor Author

cholmes commented Aug 24, 2018

Decided against doing this at sprint 3. Closing.

4 participants