Dataset -> Collection #260

Closed
matthewhanson opened this issue Oct 3, 2018 · 7 comments

@matthewhanson
Collaborator

We've been through this before, but now with the latest suggested changes every Item will be required to be part of a Dataset. A root catalog will point to 1 or more Datasets.

But, in a WFS compliant STAC dynamic API, Datasets fall under the /collections endpoint.

I think we should just change the name of Datasets to Collections. The extension for shared metadata would be called 'Collection properties'.

And as @m-mohr pointed out on Gitter, there are existing IANA relations named 'collection' and 'items', so we can use those as the rel types in links.
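For readers unfamiliar with the WFS layout referred to above, here is a minimal sketch of how the collection-related paths could be derived. Only `/collections` is named in this thread; the per-collection and `items` paths follow my reading of the WFS3 draft, and the helper name `collection_paths` is hypothetical.

```python
from urllib.parse import quote


def collection_paths(collection_id: str) -> dict:
    """Build the relative paths for one collection.

    Assumption: the path layout mirrors the WFS3 draft
    (/collections, /collections/{id}, /collections/{id}/items).
    """
    # Percent-encode the id so characters such as "/" cannot break the path.
    cid = quote(collection_id, safe="")
    return {
        "collections": "/collections",
        "collection": f"/collections/{cid}",
        "items": f"/collections/{cid}/items",
    }


paths = collection_paths("landsat-8")
print(paths["items"])
```

Under this layout, what STAC currently calls a Dataset would be served from the `collection` path, which is the naming mismatch this issue is about.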

@cholmes

@m-mohr
Collaborator

m-mohr commented Oct 3, 2018

As said on Gitter, I strongly support renaming dataset to collection, as it would be in line with WFS (and openEO).
There are the IANA rel-types collection and item, too, which we could use. I think this is nice because we re-use existing things. So the rel types parent/root can point to either catalogs or collections/datasets and collection always points to the collection/dataset. That would make a merge also much easier.
I don't think 'Collection properties' is an intuitive name. Common properties sounds better to me. It's similar to the discussion we had for the renaming of the old collection extension.
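The link layout described in this comment could be sketched roughly as follows. This is an illustration only: the `rel`/`href` link shape is assumed from the usual STAC link objects, and the function names and URLs are hypothetical.

```python
from typing import Optional


def item_links(root_href: str, parent_href: str, collection_href: str) -> list:
    """Links for a single Item.

    'root' and 'parent' may point to a catalog or a collection;
    'collection' always points to the collection (formerly dataset).
    """
    return [
        {"rel": "root", "href": root_href},
        {"rel": "parent", "href": parent_href},
        {"rel": "collection", "href": collection_href},
    ]


def collection_href_of(links: list) -> Optional[str]:
    """Return the href of the first link with rel 'collection', if any."""
    for link in links:
        if link.get("rel") == "collection":
            return link.get("href")
    return None


# Hypothetical URLs, for illustration only.
links = item_links(
    "https://example.com/catalog.json",
    "https://example.com/landsat-8/catalog.json",
    "https://example.com/landsat-8/collection.json",
)
print(collection_href_of(links))
```

Because `collection` has a single fixed meaning, a client can find the shared metadata without walking up the `parent` chain.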

@m-mohr
Collaborator

m-mohr commented Oct 4, 2018

In addition, collection is unambiguous and dataset is ambiguous according to the following list of synonyms used by organizations from the CEOS OpenSearch Best Practice Document:

  • For STAC Item: dataset (ISO 19115), dataset (ESA), granule (NASA), product (ESA, CNES), scene (JAXA)
  • For STAC dataset: dataset series (ISO 19115), collection (CNES, NASA), dataset (JAXA), dataset series (ESA), product (JAXA)

The only unambiguous terms listed here are collection and dataset series, but the latter would imply using dataset for STAC Items, so I'd still vote for collection.

@cholmes
Contributor

cholmes commented Oct 6, 2018

So I think overall I like it, since if it all works right then it leads to a really nice integration with WFS3. But I have two fairly major concerns:

  1. Since our Dataset is subtly different than the WFS Collection response document I worry about it being more confusing to call ours 'collection' too. Like it seems to make it a bit easier to me to explain a standard WFS has its 'collections' endpoints and STAC defines Datasets, which are some additional fields that add on to that WFS endpoint. Like I'd be very happy if our collection/dataset concept was truly the same, and then our 'collection spec' would be very short and would just explain how to use the WFS concept for STAC. But I see value in what STAC is doing, which is to extend the WFS collection with 'more' to emphasize that those fields in Dataset can be used for search of datasets. This is the direction I want CSW 4.0 to go - some defined additional fields on WFS Collections, and then use those same fields to define search of collections.

  2. I really don't like the number of major changes we continue to make. I know we're aiming to 'get things right', but we keep making these pretty big leaps from the time we met, and we haven't released anything. I'd be much more comfortable with releases every 2-3 weeks where we are actually putting our ideas out there, instead of this continued churn. Ideally we'd just have a roadmap of the changes we want to make, and continue to iterate in a way that others can see. I feel a bit powerless on this though, since I have very limited time these days, since my typical approach would be to just cut releases on my own, making a 'last call' at a set date and then putting things in. For every major change I worry that we 'miss' something, and get it wrong. So like with this one Dataset feels really close to being able to release. I worry about a change in the name leading to some example that doesn't really work, some other major thing that we're just not thinking about, and then continuing to be thrown off another week or two in release, and then in that time coming up with something that feels really important to get right, and continuing like that till the end of the year and beyond. And indeed it then feels like we're more of a typical geo spec process - a small number of people turning things over for a long time and getting attached to our ideas without putting them out.

I do think 1) can work ok if we call ours STAC Collections, and consistently refer to it as that. And then STAC Collection Properties extension (which I do like), etc. And then I think we should commit to really aligning it with WFS Collections (I know I've pushed that alignment, but I also feel we have not committed - we don't mention it at all in the spec, etc). So that we'd work for STAC Collections to eventually just be an explanation of how to use WFS Collections in a standalone way, or indeed it even can serve as the accessible spec to explain those core fields (and we should work to make it a generic OGC spec that many can use / refer to). So to get towards there I think we'd want to rewrite the Dataset document to reference WFS Collections more, and explain how we are using its core fields and adding a few more to define our collection metadata, etc.
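To make the "core fields plus a few more" idea concrete, here is a rough sketch of a STAC Collection built by adding fields to a WFS-style collection document. Every field name and value below is an assumption chosen for illustration, not taken from either spec, and `to_stac_collection` is a hypothetical helper.

```python
def to_stac_collection(wfs_collection: dict, extra_fields: dict) -> dict:
    """Return a new document: the WFS-style fields plus additional fields.

    Core fields are never overridden, so the result should still be
    readable by a client that only understands the WFS-style document.
    """
    clash = set(wfs_collection) & set(extra_fields)
    if clash:
        raise ValueError(f"extra fields would override core fields: {sorted(clash)}")
    merged = dict(wfs_collection)  # copy, leave the input untouched
    merged.update(extra_fields)
    return merged


# Illustrative documents only; field names are assumptions.
wfs_doc = {
    "name": "landsat-8",
    "title": "Landsat 8",
    "description": "Example collection",
    "links": [],
}
stac_doc = to_stac_collection(wfs_doc, {"license": "PDDL-1.0", "keywords": ["landsat"]})
print(sorted(stac_doc))
```

Keeping the extension strictly additive is what would let the STAC Collection document shrink over time to an explanation of how to use WFS Collections.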

Of course that work of rewriting is what I worry about pushing us back more from a release, per concern 2). As for my concerns on 2), they can obviously be mitigated somewhat if Matthias and anyone else can really put in a bunch of time this week to make all the changes needed. So if we really want to go this way then I'm ok with it. But I'd also be happy for us to say let's release 0.6.0 ASAP and make 0.7.0 about renaming this. I see the downside of that, since we want to make that change, so why not just do it. But I'd also like to get us in the habit of just releasing every month. Part of this is to show momentum to the world, to show that we're working differently and anyone can join in, etc.

Curious what @hgs-msmith and @hgs-truthe01 think on this too...

@m-mohr
Collaborator

m-mohr commented Oct 7, 2018

I agree that we need to make sure WFS collections and STAC collections are described carefully and separately so that the distinction is clear to anybody, but it wasn't a problem until now, as there were only a few places where WFS collections were mentioned at all.

I think if we want to rename it to collections then we should do it directly in 0.6, otherwise it would complicate things and confuse people. We already know this problem from the collections discussions at the last sprint. I can put in a bunch of time to update the spec to collections if we decide on the change, so it would not block a release. We just need a consensus.

I agree that we should add some wording regarding the alignment with WFS into the dataset spec and we should make sure that we stay connected and up-to-date with their changes.

@m-mohr m-mohr self-assigned this Oct 7, 2018
@hgs-msmith
Contributor

Chris, I agree with the concerns you raise. Furthermore, I've been thinking that it is about time to review overall alignment with WFS 3.0. I propose that we make this a theme for 0.7.0. The dataset/collections concern is clearly in that category. So, in summary I recommend to ship 0.6.0 with dataset and consider this naming issue later.

@m-mohr
Collaborator

m-mohr commented Oct 8, 2018

I think it will be hard to really discuss alignment with WFS3 until they have their second draft out (other than discussing to skip alignment with WFS3 overall). I think the second draft is expected in July 2019 - not sure why it takes so long. There are several issues that are related to our alignment with Datasets, e.g. opengeospatial/ogcapi-features#171, opengeospatial/ogcapi-features#168 and opengeospatial/ogcapi-features#155

@m-mohr m-mohr added this to the 0.6.0-RC1 milestone Oct 9, 2018
@m-mohr m-mohr added the prio: must-have required for release associated with label Oct 9, 2018
m-mohr added a commit that referenced this issue Oct 13, 2018
#260: Rename 'Dataset extension' to 'Collection extension' + other improvements
@m-mohr
Collaborator

m-mohr commented Oct 13, 2018

Renaming completed

@m-mohr m-mohr closed this as completed Oct 13, 2018
4 participants