
STAC / Records Harmonization #178

Closed
pvretano opened this issue Sep 14, 2022 · 13 comments · Fixed by #236

Comments

@pvretano
Contributor

There was a discussion at the Sept 2022 OGC Metadata Code Sprint about how Records and STAC could come into closer alignment, and the conclusion of that discussion was:

  • Remove the created and updated fields that describe when a resource was created and updated.
  • Rename the recordCreated and recordUpdated fields to simply be created and updated
  • Modify the link structure to add optional created and updated fields

To understand the implication, imagine that you have a dataset. That dataset has several distributions (for the sake of example, say a ZIP and a TGZ distribution). A record is created to describe the dataset (and the created and updated properties for the record are populated accordingly). Links are added to the links section of the record that point to the two distributions, and the created and updated fields in each link are populated to reflect when each distribution was created and last updated.
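
To make that concrete, here is a minimal Python sketch of such a record as a plain dict. The identifiers, URLs, the `properties.type` value "dataset" and the use of `rel="enclosure"` for distribution links are all hypothetical choices for illustration, not values mandated by Records. The helper just reads back the per-link dates, which are independent of the record-level ones.

```python
from datetime import datetime

# Hypothetical record: the record-level created/updated describe the record
# itself, while each distribution link carries its own created/updated.
record = {
    "id": "example-dataset",
    "type": "Feature",
    "properties": {
        "title": "Example dataset",
        "type": "dataset",
        "created": "2022-09-14T00:00:00Z",
        "updated": "2023-01-10T00:00:00Z",
    },
    "links": [
        {"href": "https://example.com/data/example.zip", "rel": "enclosure",
         "type": "application/zip",
         "created": "2022-09-14T00:00:00Z", "updated": "2022-09-14T00:00:00Z"},
        {"href": "https://example.com/data/example.tgz", "rel": "enclosure",
         "type": "application/gzip",
         "created": "2023-01-10T00:00:00Z", "updated": "2023-01-10T00:00:00Z"},
    ],
}


def parse_rfc3339(value: str) -> datetime:
    # Before Python 3.11, fromisoformat() rejects a trailing "Z",
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def distribution_dates(rec: dict) -> dict:
    """Map each dated link's href to its (created, updated) datetimes."""
    dates = {}
    for link in rec.get("links", []):
        if "created" in link and "updated" in link:
            dates[link["href"]] = (parse_rfc3339(link["created"]),
                                   parse_rfc3339(link["updated"]))
    return dates
```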

NOTE: This issue was created to describe the conclusions of the code sprint discussion and as an anchor for a PR that I am working on to implement the agreed-upon changes.

@mhogeweg
Contributor

what is the value in maintaining created/updated for each link vs maintaining it at just the record level?

@m-mohr
Contributor

m-mohr commented Sep 15, 2022

Links can be created or updated at different times. Let's say you add a record for a GRIB file and a year later want to add a COG file, then you have different dates for each link. Clients could then be smart and only download one of them if you've harvested the other one before...
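
A rough sketch of what such a "smart" client could do, assuming each link carries RFC 3339 `created`/`updated` strings and the client remembers when it last harvested. The function name and sample links are hypothetical; links without any date are always fetched because the client can't tell whether they changed.

```python
from datetime import datetime, timezone


def parse_rfc3339(value: str) -> datetime:
    # Normalize a trailing "Z" so fromisoformat() works on older Pythons too.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def links_to_fetch(links: list, last_harvest: datetime) -> list:
    """Return hrefs of links created or updated after the previous harvest."""
    hrefs = []
    for link in links:
        stamp = link.get("updated") or link.get("created")
        # Undated links are always fetched; dated ones only if newer.
        if stamp is None or parse_rfc3339(stamp) > last_harvest:
            hrefs.append(link["href"])
    return hrefs


links = [
    {"href": "https://example.com/scene.grib", "rel": "enclosure",
     "created": "2022-01-05T00:00:00Z", "updated": "2022-01-05T00:00:00Z"},
    {"href": "https://example.com/scene.tif", "rel": "enclosure",
     "created": "2023-01-20T00:00:00Z", "updated": "2023-01-20T00:00:00Z"},
]
last_harvest = datetime(2022, 6, 1, tzinfo=timezone.utc)
print(links_to_fetch(links, last_harvest))  # ['https://example.com/scene.tif']
```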

@mhogeweg
Contributor

the same is true for any piece of information in the record: point of contact, description, etc.

@m-mohr
Contributor

m-mohr commented Sep 15, 2022

Sure, but the metadata is usually written and harvested all at once.

@mhogeweg
Contributor

Agreed, and this brings us back to the value of putting a date on a single piece of that metadata. We're not talking about new assets that are added to the STAC as a whole (as new data gets collected all the time); the STAC Items corresponding to those will have their own creation dates. This is about modifying an existing Item with an additional piece of information that, for some reason, is being tracked separately from the other pieces of information for that Item.

@pvretano
Contributor Author

pvretano commented Dec 12, 2022

12-DEC-2022: The issue about the creation and update dates of the record and resources has been handled via PR #179. I will leave this issue open a bit longer as a catch-all for other STAC harmonization issues. For example, issue #180 probably falls into the OAPIR/STAC harmonization category.

@m-mohr
Contributor

m-mohr commented Jan 11, 2023

Another harmonization issue is #195

@pvretano
Contributor Author

Related to closed issue #64.

@m-mohr
Contributor

m-mohr commented Apr 12, 2023

I think it's worth having a fresh look again at how STAC and OAPIR align. I'll just list differences, nothing that's already aligned. I'm looking at this list: https://github.com/opengeospatial/ogcapi-records/blob/master/core/standard/clause_7_record.adoc#core-queryables-resource-table

First question: There seem to be differences between the table linked to above and the README. Is this intentional?

  • conformsTo - Basically stac_extensions, one applies for Records one for STAC. I guess that's fine.
  • properties.title - the same schema, but required in Records and optional in STAC
  • properties.type - required in Records, not present in STAC
  • properties.description - STAC allows CommonMark. Does Records allow it?
  • properties.language - See discussion in Update languge handling based on discussion with STAC. #211
  • properties.themes - We can re-use this in https://github.com/stac-extensions/subjects - How mature is the spec in Records?
  • time / properties.datetime (properties.start_datetime/properties.end_datetime) - time is required in Records, datetime in STAC. Also, it seems the definitions are different.
  • license - This seems to be aligned, but STAC might be a bit more strict with regards to allowed values? Not 100% sure...
  • providers - It looks like Records tried to be aligned with STAC, but it's not quite aligned. There are two different structures in Records
    • One tries to be aligned with STAC. Records has required url and optional roles, while STAC has required name and optional description, roles and url. roles are more broad in Records. Maybe there's room for further alignment? I was wondering whether a name should be added to Records?
    • The other structure is allowing much more. I think I'd like to take over this construct for a new "contacts" extension in STAC (although we have the difference in camelCase and snake_case). Is this object aligned with e.g. one of the ISO metadata standards? That would be great.
    • So currently these definitions are somewhat in conflict. It looks like Records allows more than just providers here, so I'm wondering whether to rename this to "contacts" or so? This would resolve the conflict and leave the option for STAC to provide a contacts extension that fully aligns with Records.
  • links - The schema is aligned, but it might be that there are subtle differences in the required fields or used relation types:
    • "prev" vs "previous"
    • "URLs" as relation types
    • self links not required in STAC
    • Required fields:
    • ...

Any thoughts, @pvretano ?

@pvretano
Contributor Author

I think it's worth having a fresh look again at how STAC and OAPIR align. I'll just list differences, nothing that's already aligned. I'm looking at this list: https://github.com/opengeospatial/ogcapi-records/blob/master/core/standard/clause_7_record.adoc#core-queryables-resource-table

First question: There seem to be differences between the table linked to above and the README. Is this intentional?

Ignore the README for now. I don't have time to work on the document and keep the README up to date relative to the document.

  • conformsTo - Basically stac_extensions, one applies for Records one for STAC. I guess that's fine.

OK.

  • properties.title - the same schema, but required in Records and optional in STAC
  • properties.type - required in Records, not present in STAC

These are mandatory for a couple of reasons. Records is generic and so the properties.type needs to be there to tell you exactly what kind of thing the record is describing. Although the links section in a record is mandatory, any particular record is not compelled to include a link to the resource the record is describing, so the record should (at least) have a title describing that resource. Also, some consideration was given to the fact that we want some kind of compatibility with CSW 2.0 (so that OGC API facades can be implemented on top), and in CSW 2.0 title is mandatory in the record.

  • properties.description - STAC allows CommonMark. Does Records allow it?

It is not restricted in any way that I am aware of ... so CommonMark would be fine. Does STAC specifically mention that CommonMark is allowed here? If yes, then I can add a similar "recommendation" to records.

I believe that the only remaining issue with respect to language is the direction, and on that score I am fine with just making the default ltr due to its statistical prevalence ... as @m-mohr mentioned in the issue.

We are working to get this specification adopted by OGC by the end of the year, so from that perspective I would say it is pretty mature. I don't anticipate any further changes to the themes structure, but @tomkralidis might have a better view on that since he designed that structure.

  • time / properties.datetime (properties.start_datetime/properties.end_datetime) - time is required in Records, datetime in STAC. Also, it seems the definitions are different

In STAC, properties.datetime (properties.start_datetime/properties.end_datetime) is meant to be searchable temporal information associated with the asset. The function of time in a record is the same: it is meant to be searchable temporal information associated with the resource that the record describes. I should perhaps add some more description to make that clear. Records inherited the schema of the time property from JSON-FG ... although we need to update the schema a bit to harmonize with the latest definition from JSON-FG.
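
Since both fields play the same role, a client could translate between them. Below is a minimal Python sketch of the STAC-to-Records direction. It assumes a JSON-FG-style `time` object using `timestamp` for an instant and `interval` for a range, with ".." for an open end; please check those member names against the current JSON-FG and Records schemas, since (as noted above) the schema is still being updated.

```python
def stac_to_records_time(properties: dict):
    """Map STAC temporal properties to a JSON-FG-style "time" object.

    Assumed target shape: {"timestamp": ...} for an instant and
    {"interval": [start, end]} for a range, with ".." marking an open end.
    """
    start = properties.get("start_datetime")
    end = properties.get("end_datetime")
    if start is not None or end is not None:
        # Prefer the explicit range when one is given.
        return {"interval": [start if start is not None else "..",
                             end if end is not None else ".."]}
    instant = properties.get("datetime")
    if instant is not None:
        return {"timestamp": instant}
    return None  # no temporal information to map
```

Preferring the range over `datetime` is a design choice here: in STAC, `datetime` may be null when a range is given, so the range is the more complete description.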

  • license - This seems to be aligned, but STAC might be a bit more strict with regards to allowed values? Not 100% sure...

The intent was to be aligned with STAC, but I will re-review to make sure. Because records is meant to be generic, we may have loosened the value requirements a bit. Anyway, I'll review.

  • providers - It looks like Records tried to be aligned with STAC, but it's not quite aligned. There are two different structures in Records
    • One tries to be aligned with STAC. Records has required url and optional roles, while STAC has required name and optional description, roles and url. roles are more broad in Records. Maybe there's room for further alignment? I was wondering whether a name should be added to Records?
    • The other structure is allowing much more. I think I'd like to take over this construct for a new "contacts" extension in STAC (although we have the difference in camelCase and snake_case). Is this object aligned with e.g. one of the ISO metadata standards? That would be great.
    • So currently these definitions are somewhat in conflict. It looks like Records allows more than just providers here, so I'm wondering whether to rename this to "contacts" or so? This would resolve the conflict and leave the option for STAC to provide a contacts extension that fully aligns with Records.

The intent of the party.yaml schema is to allow providers information to either be included in the record by reference via URL OR encoded as an in-line value in the record.

The intent was to harmonize the structure used to reference provider information with the STAC provider object so to that extent I can modify the schema to include a name and description like STAC. Records would leave the roles value space a little more open than STAC although in the narrative description I could mention the STAC values.

As for splitting these two structures ... as I mentioned, these are not two separate things. They are just two different ways to include provider information in the record: one by reference, one by in-line value.
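
A client consuming such a record has to tell the two forms apart. Here is a rough Python sketch of one possible heuristic. It is purely hypothetical: it assumes a by-reference entry carries a `url` plus only STAC-provider-like members (`name`, `description`, `roles`), and treats anything else (e.g. the made-up `contactInfo` member below) as in-line party information. The real party.yaml schema should decide this.

```python
# Members assumed to appear in a by-reference (STAC-provider-like) entry.
REFERENCE_KEYS = {"url", "name", "description", "roles"}


def provider_mode(provider: dict) -> str:
    """Classify a provider entry as 'reference' or 'inline' (hypothetical heuristic)."""
    if "url" in provider and set(provider) <= REFERENCE_KEYS:
        return "reference"
    return "inline"


by_reference = {"url": "https://example.com/org", "roles": ["producer"]}
in_line = {"name": "Example Org", "contactInfo": {"email": "info@example.com"}}
print(provider_mode(by_reference), provider_mode(in_line))  # reference inline
```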

Question: what would the distinction between "providers" and "contacts" be in STAC? Maybe if I understand that I can propose changes to bring STAC and records closer together on this point.

  • links - The schema is aligned, but it might be that there are subtle differences in the required fields or used relation types:

    • "prev" vs "previous"

Records tries to use IANA link relations where possible. Since "prev" and "previous" are both valid IANA link relations, either is fine in Records.
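
For clients, this means a lookup for one should also find the other. A minimal Python sketch, assuming rel values are plain strings as in the JSON link objects; the function names are illustrative only.

```python
# "prev" and "previous" are both registered IANA relations with the same
# meaning, so fold them onto one key before comparing.
REL_SYNONYMS = {"previous": "prev"}


def normalize_rel(rel: str) -> str:
    return REL_SYNONYMS.get(rel, rel)


def find_links(links: list, rel: str) -> list:
    """Return links whose rel matches `rel`, treating prev/previous as equal."""
    wanted = normalize_rel(rel)
    return [link for link in links if normalize_rel(link.get("rel", "")) == wanted]


links = [{"href": "https://example.com/items?page=1", "rel": "previous"},
         {"href": "https://example.com/items?page=3", "rel": "next"}]
print(find_links(links, "prev"))  # [{'href': 'https://example.com/items?page=1', 'rel': 'previous'}]
```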

  • "URLs" as relation types

OGC has some kind of authority to define new link relations which are formally defined as a URI and then also given a compact (or CURIE) representation. Here is an example from the OGC definition server ... http://defs.opengis.net/vocprez/object?uri=http://www.opengis.net/def/curie/ogc/1.0/rel/processes. The "formal" value for this rel is http://www.opengis.net/def/curie/ogc/1.0/rel/processes but, as you can see from the definition, the short name is ogc-rel:processes. So in a link you could use rel=http://www.opengis.net/def/curie/ogc/1.0/rel/processes or the compact form rel=ogc-rel:processes.

In records right now, when we define a "new" rel we use a full URI but eventually we will define CURIEs for all those too.
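
If both forms are allowed, a client would typically expand compact rels before comparing. A minimal Python sketch, where the single prefix mapping is derived from the ogc-rel:processes example above and is an assumption; a real client should take the prefixes from the OGC definition server rather than hard-coding them.

```python
# Prefix mapping derived from the ogc-rel:processes example (assumption).
CURIE_PREFIXES = {"ogc-rel": "http://www.opengis.net/def/curie/ogc/1.0/rel/"}


def expand_rel(rel: str) -> str:
    """Expand a compact rel (prefix:reference) to its full URI; leave others unchanged."""
    prefix, sep, reference = rel.partition(":")
    if sep and prefix in CURIE_PREFIXES:
        return CURIE_PREFIXES[prefix] + reference
    # Registered names ("self") and full URIs ("http:...") pass through as-is.
    return rel


print(expand_rel("ogc-rel:processes"))
# http://www.opengis.net/def/curie/ogc/1.0/rel/processes
```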

  • self links not required in STAC

We inherited this from the OGC APIs. I think in most (all?) cases, the self link is required.

  • Required fields:

  • ...

Hmmm ... yeah, Features and Common both require the rel. Not sure why we left that off in Records, but I will update the schema to match Features and Common ... and STAC too.
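
A small Python sketch of the check a validator could run once the schema is updated, assuming `href` and `rel` become required on every link (as in Features and Common) and that a `self` link is expected. The self-link check is behind a flag because, as noted above, STAC does not require it.

```python
def link_problems(links: list, require_self: bool = True) -> list:
    """List problems with a record's links (sketch; required members assumed)."""
    problems = []
    for i, link in enumerate(links):
        for key in ("href", "rel"):
            if not link.get(key):
                problems.append(f"links[{i}] is missing '{key}'")
    # rel may hold several space-separated types, so split before testing.
    if require_self and not any("self" in (l.get("rel") or "").split() for l in links):
        problems.append("no 'self' link")
    return problems


links = [{"href": "https://example.com/records/1", "rel": "self"},
         {"href": "https://example.com/data.zip"}]
print(link_problems(links))  # ["links[1] is missing 'rel'"]
```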

Any thoughts, @pvretano ?

See in-line! ;) I'll create a PR to make the changes discussed thus far. We can discuss further and make additional changes in that PR.

@fmigneault

According to https://www.rfc-editor.org/rfc/rfc8288#section-3.5, both ogc-rel:processes and http://www.opengis.net/def/curie/ogc/1.0/rel/processes could be provided simultaneously in rel. Something to consider as well... or explicitly define which combinations are expected by STAC/Records.

Note that link-values can convey multiple links between the same link
target and link context; for example:

Link: <http://example.org/>;
rel="start http://example.net/relation/other"

Here, the link to "http://example.org/" has the registered relation
type "start" and the extension relation type
"http://example.net/relation/other".
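
If STAC/Records adopted that convention for JSON link objects (which is exactly the open question here), clients would have to split the rel value instead of comparing it as a single string. A minimal Python sketch, assuming the header-style space-separated form carries over unchanged into the JSON `rel` member:

```python
def rel_types(link: dict) -> list:
    """Split a link's rel value into its relation types (space-separated, as in RFC 8288)."""
    return (link.get("rel") or "").split()


def has_rel(link: dict, wanted: str) -> bool:
    return wanted in rel_types(link)


link = {"href": "https://example.com/processes",
        "rel": "ogc-rel:processes http://www.opengis.net/def/curie/ogc/1.0/rel/processes"}
print(rel_types(link))
# ['ogc-rel:processes', 'http://www.opengis.net/def/curie/ogc/1.0/rel/processes']
```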

@m-mohr
Contributor

m-mohr commented Apr 12, 2023

Thanks @pvretano for all the comments! Looking forward to the PR. Feel free to ping me for a review.

These are mandatory for a couple of reasons. Records is generic and so the properties.type needs to be there to tell you exactly what kind of thing the record is describing.

But it's only really useful if someone defines a common vocabulary, right? If it's just free-form I'm not sure why it needs to be required.

so the record should (at least) have a title describing that resource.

I see. I'm not sure everyone would agree on an "equivalence" between STAC Item and STAC Record, but requiring a title for, e.g., EO data such as Sentinel-2, where each Item is just another capture, is not very useful. Those just don't have useful titles. Nevertheless, sometimes I'd like to have titles for them so that it's not always just the ID...

type [...] title [...] time / datetime

Anyway, having different required fields is somewhat fine as long as clients are aware of it. It means that an OAR client may error if it reads a STAC Item, as the Item doesn't have the required fields. It also means (for now, just for these two fields) that an OAR record is always STAC compliant, but a STAC Item is usually not OAR compliant. Similar issues occur with time vs. datetime.

So I guess we acknowledge that OAR and STAC are not 100% compliant, but we try to avoid direct conflicts like with updated/created where the meaning or behavior of fields contradict or so.
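
To illustrate what "clients are aware" could look like, here is a rough Python sketch of the check an OAR client might run before treating a STAC Item as a record. It only covers the fields named in this thread (`properties.type`, `properties.title` and top-level `time`) and assumes nothing else about the full OAR requirement set; the sample Item is hypothetical.

```python
# Required members discussed in this thread (not the full OAR requirement set).
OAR_REQUIRED_PROPERTIES = ("type", "title")


def missing_oar_fields(item: dict) -> list:
    """Return the thread's OAR-required fields that a (STAC) feature lacks."""
    props = item.get("properties") or {}
    missing = [f"properties.{k}" for k in OAR_REQUIRED_PROPERTIES if k not in props]
    # Records uses a top-level "time" member instead of properties.datetime.
    if "time" not in item:
        missing.append("time")
    return missing


stac_item = {"type": "Feature", "id": "example-item",
             "properties": {"datetime": "2023-04-12T10:00:00Z"}, "links": []}
print(missing_oar_fields(stac_item))  # ['properties.type', 'properties.title', 'time']
```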

Does STAC specifically mention that CommonMark is allowed here? If yes, then I can add a similar "recommendation" to records.

Yes, every description field in STAC allows CommonMark explicitly (inspired by OpenAPI). Example:

REQUIRED. Detailed multi-line description to fully explain the Collection. [CommonMark 0.29](http://commonmark.org/) syntax MAY be used for rich text representation.

We are working to get this specification adopted by OGC by the end of the year, so from that perspective I would say it is pretty mature. I don't anticipate any further changes to the themes structure, but @tomkralidis might have a better view on that since he designed that structure.

Okay, great. I'll wait for a confirmation from @tomkralidis and then I'll try to update the subjects extension in STAC to use the OAR spec.

The intent was to be aligned with STAC, but I will re-review to make sure. Because records is meant to be generic, we may have loosened the value requirements a bit. Anyway, I'll review.

A bit of background: STAC allows SPDX identifiers or "various" or "proprietary" (and nothing else). For non-SPDX values, at least one link must be given.
Generally, this should cater for most use cases, as you can always fall back to a non-SPDX value with a link, but we got feedback from the community that they dislike "proprietary" and "various". An open license that's just not part of SPDX (mostly for software, not so much for data) would have to be labeled either "proprietary", which people don't like as it doesn't reflect the license's open nature, or "various", which doesn't fit because it's just one license. We can't really change this in STAC until we go to 2.0, I guess.
Just wanted to mention that there might be flaws in what STAC does. If I did this from scratch, I'd probably go for SPDX + "other" (for all non-SPDX cases), but that ship has sailed for us. I could see OAR adopting the STAC approach but maybe pioneering a new/better non-SPDX value. STAC clients should probably not have too much of an issue with it.
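
Here is a rough Python sketch of the STAC rule as described above: an SPDX identifier, or one of the special values plus at least one link. The tiny SPDX set is a stand-in for the real SPDX license list, SPDX expressions are not handled, and using `rel="license"` for the required link is an assumption. The special values are a parameter, so a profile could swap in something like "other".

```python
# Tiny stand-in for the SPDX license list; a real validator would load the full list.
SPDX_IDS = {"CC-BY-4.0", "CC0-1.0", "MIT", "Apache-2.0"}


def license_problems(value: str, links: list,
                     special_values=("various", "proprietary")) -> list:
    """Check a license value against the STAC-style rule (sketch)."""
    if value in SPDX_IDS:
        return []
    if value not in special_values:
        return [f"'{value}' is neither an SPDX identifier nor one of {sorted(special_values)}"]
    # Non-SPDX values need at least one link to the license text.
    if not any(link.get("rel") == "license" for link in links):
        return ["non-SPDX license requires at least one link with rel='license'"]
    return []


print(license_problems("proprietary", []))
# ["non-SPDX license requires at least one link with rel='license'"]
```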

Question: what would the distinction between "providers" and "contacts" be in STAC? Maybe if I understand that I can propose changes to bring STAC and records closer together on this point.

I moved this to a separate issue as it seems to be too complex for this issue.
Please see #228

So in a link you could use rel=http://www.opengis.net/def/curie/ogc/1.0/rel/processes or the compact form rel=ogc-rel:processes. [...] In records right now, when we define a "new" rel we use a full URI but eventually we will define CURIEs for all those too.

Honestly, I don't like this at all. One or the other is fine, but always having two variants makes the implementation more difficult without any obvious benefit.

@pvretano
Contributor Author

These are mandatory for a couple of reasons. Records is generic and so the properties.type needs to be there to tell you exactly what kind of thing the record is describing.

But it's only really useful if someone defines a common vocabulary, right? If it's just free-form I'm not sure why it needs to be required.

@m-mohr, that is correct and we currently have a section in the specification, Type and Licenses, that has a bunch of recommendations around that very point. There is also Annex C: Common resource types (Informative) that defines common type identifiers for OGC resources.

All of this is currently informative, and we should probably promote the recommendations to requirements and convert Annex C to a conformance class. Other communities of interest would define whatever type identifier values they need.


4 participants