Consider versioning STAC Items #136

Closed
cholmes opened this issue Aug 1, 2018 · 16 comments

@cholmes
Contributor

cholmes commented Aug 1, 2018

I think we discussed versioning at the first sprint and decided against it for some reason.

It does seem like it would be useful now for clients to understand what is changing.

Or alternatively we commit to getting pretty stable pretty soon. Though even if we are stable for a number of years at a 1.0 then it does seem useful to be able to go to 2.0.

If we're for versioning then the question is where: does every item need it? Or do we do it at a collection / catalog level?

@cholmes
Contributor Author

cholmes commented Aug 1, 2018

Huh, made this before seeing #135 but they are certainly quite related.

@m-mohr
Collaborator

m-mohr commented Aug 1, 2018

Not sure, but versions could also be mixed across a catalog, right? If you have a pretty large catalog with a huge number of generated items using the old version, and the provider does not want to update them but only uses the newest version of STAC for new items, you would end up with mixed versions. What happens if a provider wants to support two versions at once? So it doesn't sound like a good idea to put it in the catalogs only.

@cholmes
Contributor Author

cholmes commented Aug 1, 2018

Allowing versions to be mixed across a catalog sounds less good to me. Not sure we want to recommend that as a good practice, but I suppose we should think about allowing it.

I suppose we do have our collections extension, to handle not duplicating fields, so we could set it at the Item level, but recommend that people use the collection extension, since it would be a field that repeats for most objects.

@m-mohr
Collaborator

m-mohr commented Aug 1, 2018

Definitely not to be recommended, but it is a scenario that could happen, especially in times when new versions are released and people want to support "legacy" tools but also follow the newest spec to move things forward. Usually the ecosystem needs some time to adapt to standards, and then it could be useful to have two versions hosted in parallel.

Anyway, having it in the collection sounds like a good idea at first, but if anything regarding collections or linking to collections changes, then it may no longer be possible to read the version information: the JSON of the item (or catalog) is probably your entry point, and tooling cannot read the version number from the referenced file because of those changes.

We need to discuss this; I just wanted to write down these thoughts.

@cholmes cholmes added the prio: must-have required for release associated with label Aug 24, 2018
@cholmes
Contributor Author

cholmes commented Aug 24, 2018

So there seemed to be a strong consensus that we should start to version. When we discussed it we said it should be in the catalog. But does it go in the 'dataset'? I fear that sticking 'stac version' in the dataset spec might make it a bit less reusable in other contexts?

Could make it an optional field in the (non-dataset) catalogs? Thoughts @m-mohr ?

Marking this 'must-have' for 0.6.0, though I think we still need to actually decide the exact course of action.

@cholmes cholmes added this to the 0.6.0-RC1 milestone Aug 24, 2018
@m-mohr
Collaborator

m-mohr commented Aug 24, 2018

Well, we need to place this where tools usually enter STAC, so that they can decide how to process the incoming data. Otherwise we don't need to specify the version at all. It's really for machines only: it lets parsers decide which version of their parser to use for the document, and they need to know this immediately, not after parsing several referenced documents.
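A rough sketch of what that version-based dispatch might look like on the client side. The field name `stac_version`, the parser functions, and the version keys are all assumptions for illustration; none of them is defined by the spec at this point in the discussion.

```python
import json


def parse_v05(doc):
    # Hypothetical parser for 0.5.x documents.
    return {"parser": "0.5", "id": doc.get("id")}


def parse_v06(doc):
    # Hypothetical parser for 0.6.x documents.
    return {"parser": "0.6", "id": doc.get("id")}


# Map "major.minor" to the parser that understands it.
PARSERS = {"0.5": parse_v05, "0.6": parse_v06}


def parse_stac(text):
    """Pick a parser from the version declared in the document itself."""
    doc = json.loads(text)
    version = doc.get("stac_version")  # assumed field name
    if version is None:
        raise ValueError("document does not declare a STAC version")
    # Dispatch on major.minor so that "0.6.0" and "0.6.0-RC1" share a parser.
    key = ".".join(version.split(".")[:2])
    try:
        parser = PARSERS[key]
    except KeyError:
        raise ValueError(f"unsupported STAC version: {version}") from None
    return parser(doc)
```

The point of the sketch is that the decision is made from the entry document alone, without fetching anything it links to.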

It's probably a bit tricky, as tools may not use the dataset as the entry point but a catalog linking to datasets, right? So our entry point is unknown. Then we would actually need to put the version in almost all documents, as we don't have a single entry point. Or are catalogs required for items, so that every Item must have a parent catalog? Then we can just add it to the catalogs. Datasets basically inherit from them anyway, so it'll be in the dataset, too.

If we require datasets for items, then it could make sense to have it in the dataset only. I think it's cool to have it in there, it still might be useful for other use cases, too.

Naming: The dataset already has a field version for the dataset version. We could go the OpenAPI way and just have a property stac with the version as value. We maybe shouldn't have the reference to STAC in there if we want the dataset spec to be used outside of STAC. Then we may just use something generic such as spec_version or metadata_version. By the way, openEO will probably just use STAC directly and not just the fields defined by the dataset spec, so that's not really a problem for us.

@matthewhanson
Collaborator

As @m-mohr stated, the dataset already has a version for the dataset.

I think the STAC spec version belongs at the very top catalog level, it really doesn't make sense to mix spec versions within a catalog.

@cholmes
Contributor Author

cholmes commented Aug 24, 2018

I do agree it belongs at the very top catalog level. I think that's generally 'the entry point', and if things that came in somewhere else can navigate their way back up to the version, that seems fine. I don't think we have a requirement for catalogs to point to their 'parent', though I think it is used.

Not sure if you can represent this in JSON schema, but it seems like an ideal would be that a catalog must have either a version or a parent...
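As a sketch of the rule itself, the "version or parent" check on a single catalog document could look like the following in Python. The field name `stac_version` and the `links` / `rel` layout are assumptions here. Within one document, a JSON Schema `anyOf` over two subschemas (one requiring the version field, one requiring a `parent` link) might express the same constraint, though that is untested in this sketch.

```python
def has_version_or_parent(catalog, version_field="stac_version"):
    """Return True if the catalog declares a version or links to a parent.

    Only the document itself is inspected; this does not check that the
    parent actually carries a version.
    """
    if isinstance(catalog.get(version_field), str):
        return True
    links = catalog.get("links", [])
    return any(link.get("rel") == "parent" for link in links)


# A top-level catalog with a version, and a child that only points upward.
top = {"stac_version": "0.6.0", "links": []}
child = {"links": [{"rel": "parent", "href": "../catalog.json"}]}
orphan = {"links": [{"rel": "self", "href": "catalog.json"}]}

for doc in (top, child, orphan):
    print(has_version_or_parent(doc))
```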

@m-mohr m-mohr self-assigned this Aug 24, 2018
@m-mohr
Collaborator

m-mohr commented Aug 24, 2018

So I'll add that to the new catalog spec, and the dataset will inherit it anyway. No version required for the items?! It doesn't necessarily need a parent; I'd say it needs a root?!

@cholmes
Contributor Author

cholmes commented Aug 24, 2018

We just said that a dataset link is 'strongly recommended'. Are you saying root is just the catalog that has the version? And that should be required?

@m-mohr
Collaborator

m-mohr commented Aug 24, 2018

Well, I thought that an Item requires either a catalog or a dataset as parent/root. Or can Items live on their own? Then we would need the version in the Item spec as well. That is what I was struggling with when trying to understand this.
Yes, root should be the top-level catalog or dataset, which has the version number included. Or can there be catalogs referencing different STAC versions, for example if one catalog links to multiple providers implementing different versions?

@matthewhanson
Collaborator

Ah ok, interesting question - can items live by themselves or do they always need to be in a catalog? (I think if they are required to be in a catalog, it's not much of a stretch to say they need to belong to a Dataset, aka Root Catalog).

@cholmes
Contributor Author

cholmes commented Aug 24, 2018

Weren't we just discussing this in #187 ? I guess it's dataset required vs catalog required...

It looks like right now (0.5.2) the Item spec says nothing about links to catalog / root. And the static catalog spec recommends pointing to a 'root', but doesn't require it.

I remain open to requiring it in some form. I suppose we could say an Item is required to either have a link to a catalog that has the version or to have the version as a field...

Then it's another question if the required thing is a catalog or a dataset.
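To make the "version as a field, or a link to a catalog that has it" option concrete, here is a minimal sketch of how a client might resolve the version. The field name `stac_version`, the preference for `root` over `parent`, and the `fetch` callback are assumptions for illustration; hrefs are treated as opaque keys, so relative-URL resolution is left out.

```python
def find_link(doc, rel):
    """Return the href of the first link with the given rel, or None."""
    for link in doc.get("links", []):
        if link.get("rel") == rel and "href" in link:
            return link["href"]
    return None


def resolve_stac_version(doc, fetch, max_hops=10):
    """Use the document's own version, else walk root/parent links upward.

    `fetch` is a hypothetical callable mapping an href to a parsed document.
    Returns None if no version is found within `max_hops` hops.
    """
    for _ in range(max_hops + 1):
        version = doc.get("stac_version")
        if version is not None:
            return version
        href = find_link(doc, "root") or find_link(doc, "parent")
        if href is None:
            return None
        doc = fetch(href)
    return None


# In-memory stand-in for a static catalog on disk or on the web.
DOCS = {
    "catalog.json": {"stac_version": "0.6.0", "links": []},
    "sub/catalog.json": {"links": [{"rel": "parent", "href": "catalog.json"}]},
}
item = {"id": "item-1", "links": [{"rel": "parent", "href": "sub/catalog.json"}]}
print(resolve_stac_version(item, DOCS.__getitem__))
```

Note that every hop is another document to fetch and parse before the client knows which version it is dealing with, which is the cost raised earlier in the thread.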

@m-mohr
Collaborator

m-mohr commented Sep 3, 2018

In #187 we were discussing whether datasets are required or not. That's different from requiring a catalog. The simple question is: do we need a STAC version in the item or not? And this depends on whether items can live on their own or must be part of a catalog or dataset. If we don't want to change the spec regarding the links to catalogs/root/etc., we would need to add the STAC version to every item. I'm not sure whether those "multi-level" dependencies (either have a link to a catalog that has the version or have the version as a field) are easy to use, understand and validate. At least for validation, I think you can't model that with JSON Schema, so it will end up being problematic, as providers will not really be forced to follow this.

@m-mohr
Collaborator

m-mohr commented Oct 5, 2018

#194 progressed and now requires a dataset for every item. Therefore I think we should just require a stac_version in every catalog (it is inherited by the dataset), and that solves the whole issue.

m-mohr added a commit that referenced this issue Oct 5, 2018
m-mohr added a commit that referenced this issue Oct 8, 2018
#136: Added the STAC version to catalog and dataset
@m-mohr
Collaborator

m-mohr commented Oct 8, 2018

Got merged, we'll revisit and iterate on this with the feedback we receive.

@m-mohr m-mohr closed this as completed Oct 8, 2018