Collections concept is confusing #47

Closed
cmheazel opened this issue Sep 5, 2019 · 34 comments
Labels
Collections Applicable to Collections (consider to use Part 2 instead) Resources of Collections type Issues related to the /collections path

Comments

@cmheazel
Contributor

cmheazel commented Sep 5, 2019

From the API Hackathon report:

For GeoSeer, the OGC API specifications all seem quite interesting, though the concept of
"collections" took some getting used to and remains somewhat confusing. This suggests that
collections may not be as intuitive as could be desired and users may struggle with the concept.

@cmheazel cmheazel added Hackathon Resources of Collections type Issues related to the /collections path labels Sep 5, 2019
@cmheazel cmheazel removed the Hackathon label Jan 5, 2020
@m-mohr

m-mohr commented Jan 24, 2020

At this point in time, can someone explain what collections in the OGC API world are and what is grouped beneath them? I understood them basically as a grouping for dataset series (as used in ISO 19115).

@dblodgett-usgs

I would say the community is still evolving on how exactly to define the collections resources.

OGC API Features originated the idea here.
(Returns a listing of feature collections summary information)

Coverages is using it. See here for discussion.
(Returns a listing of coverages summary information)

In EDR we are just getting started, but the slides linked here offer a good summary, see slide 18.
(Returns metadata about data that shares a coordinate dimension space -- may be an nD coverage, may be features)

Common hasn't overthought things. Which I think is a good thing.

I think we are seeing convergence on the idea that a given conceptual data layer is an item in the collections list that may be accessible through one or more OGC APIs (for example, a feature coverage can be either), but that is still being explored.

Are people looking for a concrete constrained definition here? Or can we allow some flexibility in this concept as long as they are technically interoperable?

@cmheazel
Contributor Author

It's not too hard.
Let's start with the concept of a collection. The concept of a Feature Collection is well established in the General Feature Model. It is a collection of Features. API-Features exposes these Feature Collections. Coverages are a different animal. A coverage is a collection of measurements. Therefore a coverage is also a collection. So a collection in API-coverage is a coverage.
Now, a couple assertions:

  1. an API can expose more than one collection
  2. the collections exposed by an API are not required to be the same type.

Since there can be more than one collection, we need a resource which is a collection of collections. This is the /collections resource. This resource informs the client of what collections are available from this API, their type, their identifiers, and additional useful metadata. Once you have identified the collections you are interested in, you can get more detailed metadata from the /collections/{collectionId} resources.

Up to this point we have dealt with collections as an abstraction. This is where API-Common stops. To go further, you need to know the type of collection you are dealing with and use the appropriate standard (Features or Coverages) to access it.
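A minimal Python sketch of that two-step discovery, assuming canned in-memory responses instead of real HTTP calls. The `get` helper, the collection ids, and the use of an `itemType` field to carry the collection type are illustrative assumptions, not requirements taken from the drafts.

```python
from typing import Any, Dict

# Hypothetical canned responses standing in for HTTP GETs against an API.
# The ids and the "itemType" field are assumptions made for illustration.
RESPONSES: Dict[str, Dict[str, Any]] = {
    "/collections": {
        "collections": [
            {"id": "buildings", "itemType": "feature"},
            {"id": "elevation", "itemType": "coverage"},
        ]
    },
    "/collections/buildings": {"id": "buildings", "itemType": "feature"},
    "/collections/elevation": {"id": "elevation", "itemType": "coverage"},
}


def get(path: str) -> Dict[str, Any]:
    """Pretend GET: look the path up in the canned responses."""
    return RESPONSES[path]


def describe_collections() -> Dict[str, str]:
    """List /collections, then fetch each /collections/{collectionId}."""
    result: Dict[str, str] = {}
    for summary in get("/collections")["collections"]:
        detail = get(f"/collections/{summary['id']}")
        # Common stops here: the type tells the client which standard
        # (Features or Coverages) to use for any further access.
        result[detail["id"]] = detail["itemType"]
    return result


print(describe_collections())  # {'buildings': 'feature', 'elevation': 'coverage'}
```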

Hope that helps.

@jyutzler

jyutzler commented Mar 7, 2020

OK, but if we're going to do this, "collection" should be defined in Terms and Definitions.

@joanma747
Contributor

In OGC API - Features it says:
"a set of features from a dataset"

In Common we could come up with something like:
"a set of one or more elements from a dataset identified by an id. The service can expose a collection in zero or more representations: e.g. a collection of features, a coverage of one or more range types, tiles representing features or coverage data, maps representing features or coverage data, etc".

@cportele
Member

cportele commented Mar 8, 2020

Just to explain again the "collection" concept in Features:

  • Every API conforming to Features Core provides access to a single dataset (unless the API also conforms to a potential, future multi-dataset extension).
  • Most existing feature datasets are organized into several collections. For a dataset with a schema this is typically by feature type.
  • Since Features Core does not require a schema, there is no discussion about feature types and the more general concept of "feature collections" is used, with "collection" as a synonym in the context of the specification.
  • The other reason for the more flexible "feature collection" concept is that even in the case of a dataset with a schema one may want to publish the features not by feature type. For example, an API with 100+ collections of topographic feature types may be overwhelming to users and I may decide to aggregate the feature types into just a handful of collections to make the data easier to understand and use.
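The aggregation in the last point could look roughly like this, assuming the publisher keeps a hand-written mapping from feature types to a handful of collections. The mapping, the type names, and `group_features` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical publisher-chosen mapping: many feature types, few collections.
TYPE_TO_COLLECTION = {
    "Building": "built_environment",
    "Bridge": "built_environment",
    "Road": "transport",
    "Railway": "transport",
    "River": "hydrography",
    "Lake": "hydrography",
}


def group_features(features: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Assign each feature id to the collection chosen for its feature type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for feature in features:
        grouped[TYPE_TO_COLLECTION[feature["type"]]].append(feature["id"])
    return dict(grouped)


features = [
    {"id": "b1", "type": "Building"},
    {"id": "r1", "type": "Road"},
    {"id": "l1", "type": "Lake"},
    {"id": "b2", "type": "Bridge"},
]
print(group_features(features))
# {'built_environment': ['b1', 'b2'], 'transport': ['r1'], 'hydrography': ['l1']}
```

The schema (feature types) and the published organization (collections) stay independent, which is the flexibility described above.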

@bbarden

bbarden commented Mar 10, 2020

As a programmer who's working with these concepts peripherally, "Collections" in particular (and the fact that it reads like a plural noun but refers to a singular thing) is really quite awkward to properly discuss, in person and in comments. There's confusion in referring to multiple instances of the Collection resource, and there's awkwardness in referring to multiple instances of the Collections resource. Even having fully grasped what is meant by these concepts, the name itself is awkward to talk around.

@jeffharrison

I also find the use of 'Collections' confusing, and not all OGC APIs need to use it. For example, Styles API does not.

It seems, sometimes, that it may have been carried forward from Feature Collections and just sort of, well, stuck around.

Best Regards,
Jeff

@jyutzler
Copy link

jyutzler commented Mar 13, 2020

Let me elaborate on @bbarden's comment. In support of VTP2, we are writing code that acts as a client of the various web services that support vector tiles. Our preferred paradigm is to provide our system an entry point and let it discover what is available and what it needs through the available links. For the most part this works great. If I GET a TilesCapabilities document, I can inspect the tiles link to GET a Tiles metadata document. From there I can request the item link to GET the individual tiles. (This is a simplification; I'm ignoring things like Accept headers for the purpose of this discussion.) In general I'm happy with this.

Collections complicate matters because they are completely untyped. They might have a tiles link, they might not. They could also have queryables and/or items. They might even have more collections. This forces me to create a very generic Collections class with all sorts of methods which might or might not be applicable to the current situation. Does it work? Yes, but in my opinion it is ugly because there is no telling what I could find in a particular document. It is kind of rude to give the developer the option to do something that is actually impossible. I don't even know which API I am using at any particular time. Heaven forbid I don't understand how to make a particular request because I wouldn't even know which document to look in.
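To make the complaint concrete: with untyped collections, about all a generic client can do is inspect each document's links and see which relations happen to be present. A rough Python sketch; the short rel names (`items`, `tiles`, `queryables`) are simplified assumptions, not the registered OGC link relation URIs.

```python
from typing import Any, Dict, Set

# Simplified, assumed rel names; real documents may use full URIs.
KNOWN_RELS = {"items", "tiles", "queryables", "collections"}


def available_actions(collection: Dict[str, Any]) -> Set[str]:
    """Return the known link relations this collection document offers."""
    return {
        link["rel"]
        for link in collection.get("links", [])
        if link["rel"] in KNOWN_RELS
    }


roads = {"id": "roads", "links": [
    {"rel": "items", "href": "/collections/roads/items"},
    {"rel": "tiles", "href": "/collections/roads/tiles"},
]}
dem = {"id": "dem", "links": [{"rel": "tiles", "href": "/collections/dem/tiles"}]}

for doc in (roads, dem):
    print(doc["id"], sorted(available_actions(doc)))
# roads ['items', 'tiles']
# dem ['tiles']
```

Every method on a generic Collections class would need a guard like this before it could do anything, which is the ugliness being described.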

I believe it would be preferable to have clearly distinguished document types that report things like feature collections, tile pyramids, etc. I propose a landing page with a hierarchy like the following:

  • feature collections (I'm okay calling this "/collections")
    • feature collection
      • items (i.e., GeoJSON)
        • queryables
        • schema (!)
      • tiles (i.e. a vector tiles pyramid) [1]
        • item (template)
        • etc.
  • tile pyramids (how about "/pyramids"?)
    • tile pyramid [2]
      • item (template)
  • another resource type...

Since [1] and [2] are the same resource, I would just assume they be the same href. In the case of a tile pyramid backed by a feature store (i.e., vector tiles), I'd also like the tile pyramid document to have a back-link to the underlying feature collection resource so that someone could find the schema information when pointed to the tile pyramid.
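A sketch of the typed documents in that proposal, in Python, assuming the `/pyramids` path suggested there and a hypothetical `schema_for` helper: the pyramid carries a back-link to its source feature collection, and [1] and [2] share one href. The tile URL template is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FeatureCollectionDoc:
    href: str
    items_href: str
    schema_href: str
    tiles_href: Optional[str] = None  # [1] in the proposal


@dataclass
class TilePyramidDoc:
    href: str  # [2] in the proposal
    item_template: str
    # Back-link to the underlying feature collection (vector tiles only).
    source_feature_collection: Optional[str] = None


def schema_for(pyramid: TilePyramidDoc,
               docs: Dict[str, FeatureCollectionDoc]) -> Optional[str]:
    """Follow the pyramid's back-link to find the feature schema, if any."""
    if pyramid.source_feature_collection is None:
        return None
    return docs[pyramid.source_feature_collection].schema_href


roads = FeatureCollectionDoc(
    href="/collections/roads",
    items_href="/collections/roads/items",
    schema_href="/collections/roads/schema",
    tiles_href="/pyramids/roads",
)
pyramid = TilePyramidDoc(
    href="/pyramids/roads",  # same href as roads.tiles_href
    item_template="/pyramids/roads/{tileMatrix}/{tileRow}/{tileCol}",
    source_feature_collection=roads.href,
)
print(schema_for(pyramid, {roads.href: roads}))  # /collections/roads/schema
```

Because each document type has a fixed shape, a client knows exactly which fields to expect without probing links.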

Can I make it work without this layout? Yes, but if we are going to try to make something that makes sense, I think it should be in the direction that I propose here. It is probably too late to make this kind of change for the VTP, but I think it is where we ultimately want to go.

@jeffharrison

Great comments Jeff Y. I would add that I love the breakdown of resources in OGC API - Common. Very well written and powerful concepts.

/ /api /conformance are spot on. This approach will save major time and money in architectures and implementations.

Then, the current collections section complicates things. I would say also that if I had to lead the development of a class for it I wouldn't know where to start.

What this section seems to be saying to the market is 'All Spatial Data is organized in Collections'... Respectfully, I must disagree.

But really a key point that the 'collections confusion' is obscuring is -> Much of the power of OGC APIs comes from use of modern API tools based on OpenAPI.

For example, SwaggerHub is being used now on 3D Pilot to do in days and weeks what used to take OGC months or even years. Outstanding!

Collections are not really the key point to OGC APIs in my opinion. The point is modern tools based on OpenAPI enable faster API development and deployment.

Regards,
Jeff H

@dblodgett-usgs

I see some merit to @jeffharrison's point:

It seems, sometimes, that it may have been carried forward from Feature Collections and just sort of, well, stuck around.

This, to me, is a good argument to spend some time thinking about why the collections end point is re-used in any of the OGC-API extensions not dealing with collections of features.

I can't claim to completely follow what you mean @jyutzler by:

Since [1] and [2] are the same resource, I would just assume they be the same href.

Your nested list is a bit cryptic to really get what you are trying to say. But I think you are alluding to the idea that you don't want your tile pyramids to have to live under a collection?

From my perspective, we are trying to use "collection" as a proxy for dataset. I found what @cportele pointed out above quite profound.

Every API conforming to Features Core provides access to a single dataset (unless the API also conforms to a potential, future multi-dataset extension).

That is, an OGC-API endpoint isn't supposed to have feature-collections that come from different datasets. I need to spend a little more time thinking about this point, but the nature of collections is different than I had been thinking of it seen through this lens.

So should OGC-API Common include collections defined more broadly than "feature-collection"? Perhaps not. We could say:

  • an OGC-API endpoint represents various functional views on a dataset
  • datasets that can be viewed as features can have 1:n /collections
  • datasets that can be viewed as a coverage can have 1:n /coverages
  • datasets that can be viewed as a map can have 1:n /maps
  • datasets that can be viewed as a vector-tile pyramid can have 1:n /pyramids
    etc. etc.

@jerstlouis
Member

@dblodgett-usgs

From my perspective, we are trying to use "collection" as a proxy for dataset. I found what @cportele pointed out above quite profound.

What the Collections concept in Common tried to do was to generalize the concept of a "geospatial data layer", providing an abstract conformance class for it. That gives different modular blocks (e.g. tiles, maps) an attachment point to bind to. Those modular specifications do not need to reference Features or Coverages specifically, but can reference that abstract data layer concept to define how they combine with any conformance class deriving from it.

In order to fit with Features, which had already standardized /collections/{collectionID} as the place where these data layers end up, Common tried to assign that meaning of a "geospatial data layer" to Collections. In that sense, it is not a generic collection-of-items concept. Coverages had agreed to use Collections in that way (thinking of a Collection as a collection of cells, though you don't necessarily access the 'items' (cells) of such a collection individually).

Unfortunately, it seems that this is causing problems from both ends, because some people still want to use Collections without that implied meaning of a geospatial data layer, and modules providing other types of data (e.g. 3D mesh data) don't see the relevance of the term 'collection'.
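The "attachment point" idea can be sketched with an abstract base class: hypothetical tiles and maps building blocks are written only against an abstract `DataLayer`, so they work for feature and coverage layers alike. The class names and URL patterns are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class DataLayer(ABC):
    """Hypothetical abstract 'geospatial data layer' that modules bind to."""

    def __init__(self, layer_id: str) -> None:
        self.id = layer_id

    @abstractmethod
    def kind(self) -> str:
        """Which data model backs this layer."""


class FeatureLayer(DataLayer):
    def kind(self) -> str:
        return "features"


class CoverageLayer(DataLayer):
    def kind(self) -> str:
        return "coverage"


def tiles_module(layer: DataLayer) -> str:
    """A tiles building block that only relies on the abstract layer."""
    return f"/collections/{layer.id}/tiles"


def maps_module(layer: DataLayer) -> str:
    """A maps building block that only relies on the abstract layer."""
    return f"/collections/{layer.id}/map"


layers: List[DataLayer] = [FeatureLayer("roads"), CoverageLayer("elevation")]
for layer in layers:
    print(layer.kind(), tiles_module(layer), maps_module(layer))
# features /collections/roads/tiles /collections/roads/map
# coverage /collections/elevation/tiles /collections/elevation/map
```

Neither module mentions Features or Coverages; adding a new layer type only requires a new `DataLayer` subclass.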

Every API conforming to Features Core provides access to a single dataset (unless the API also conforms to a potential, future multi-dataset extension).

That is, an OGC-API endpoint isn't supposed to have feature-collections that come from different datasets. I need to spend a little more time thinking about this point, but the nature of collections is different than I had been thinking of it seen through this lens.

I have argued that the multi-dataset capability is a very common and important use case, and one reason to adopt /collections in Common was to be able to hold both vector features data layers and coverage features at the same level (e.g. http://maps.ecere.com/geoapi/collections/vtp/Daraa2 having OSM vector data and a DTED coverage).

datasets that can be viewed as a map can have 1:n /maps
datasets that can be viewed as a vector-tile pyramid can have 1:n /pyramids

Note that:

  • A tile set is made up of tiles of data, which are either Features or Coverages (you could potentially have a mix of both in a multi-layer format supporting both).
  • A map is often made up of multiple presentation layers, each of which can originate from different datasets (features and/or coverages).

The urge to separate these into completely separate resource paths again is reverting to classic services, and we need to resist that urge of running back to the familiarity of WxS, or we're not making progress :)

If we just define: WFS: /collections, WCS: /coverages, WMS: /maps, WMTS: /pyramids.
That is not trying to define an integrated suite of modular APIs, that is changing four quarters for a dollar.

Kind regards,

-Jerome

@dblodgett-usgs

Thanks for this @jerstlouis. I appreciate your point and do agree on many fronts.

It is very awkward to have ways of accessing a dataset spread around an API with no top-level integration for the abstract dataset. (ArcGIS REST does this, for example, as does the W*S suite.)

A user of a geospatial data server expects a list of datasets, where each entry has a list of access methods. (e.g. GeoServer lists many access methods for an abstract dataset (layer) with no standard API to integrate them).

By putting the abstract-data-layer distinction outside the API specification, we avoid making the distinction at all. As I argued over in #106, I think avoiding the generalization penalty here is going to be a wise move and will greatly reduce complexity for implementers -- or at least provide them the flexibility needed to add OGC-API specifications to existing software.

I think we should look to the OGC-API Records and possibly a content standard akin to ISO 19119:2016 (Services) that enables us to take on the complexity in earnest if we are serious about having multiple datasets provided through a single API endpoint where the access-views are nested under the root of the API rather than using a 1:1 API:dataset assumption.

@jerstlouis
Member

jerstlouis commented Mar 16, 2020

@dblodgett-usgs But we already have a resource at /collections/{collectionID} with basic metadata information such as spatial/temporal extent, with links to representations.

Currently, both OGC API - Coverages and OGC API - Features already provide such a resource at that path, which allows a dataset to be made up of both Feature Collections and Coverages, or to have both a features and a coverage representation of the same data.

As an extension for organizing multiple datasets, all we would need is a conformance class for Collections of Collections, and we could have another conformance class, Search, at the /collections level, allowing Collections to be filtered, e.g. to those within a certain bounding box, or to those having a vector representation.
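A sketch of such a Search filter at the /collections level, in Python, assuming each collection entry exposes a simple (minx, miny, maxx, maxy) bounding box and a set of representation names. The field names and `search_collections` are hypothetical, and boxes crossing the antimeridian are ignored for brevity.

```python
from typing import Any, Dict, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_collections(collections: List[Dict[str, Any]], bbox: BBox,
                       representation: Optional[str] = None) -> List[str]:
    """Return ids of collections within bbox, optionally by representation."""
    return [
        c["id"]
        for c in collections
        if intersects(c["bbox"], bbox)
        and (representation is None or representation in c["representations"])
    ]


catalog = [
    {"id": "roads", "bbox": (0.0, 0.0, 10.0, 10.0), "representations": {"vector"}},
    {"id": "dem", "bbox": (5.0, 5.0, 20.0, 20.0), "representations": {"coverage"}},
    {"id": "coast", "bbox": (30.0, 30.0, 40.0, 40.0), "representations": {"vector"}},
]
print(search_collections(catalog, (8.0, 8.0, 12.0, 12.0)))            # ['roads', 'dem']
print(search_collections(catalog, (8.0, 8.0, 12.0, 12.0), "vector"))  # ['roads']
```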

Regarding records, in opengeospatial/ogcapi-records#20 and opengeospatial/ogcapi-records#27, it was suggested that although some of that overlaps with OGC API - Records, it might actually be complementary functionality to implement this at the Collections level.

@dblodgett-usgs

Read up to @jyutzler's take on having collections that behave differently and how this introduces a whole bunch of un-needed complexity.

I totally get it and was of the same mind but am shifting my perspective on this a bit. Let's wait and let some others weigh in here.

@jerstlouis
Member

jerstlouis commented Mar 16, 2020

I believe a lot of the perceived complexity results from unsettled dust, as we are still clarifying the concept of collections and representations, and as a result @jyutzler encountered implementation differences from different providers.

I disagree that "tiles" or "tile pyramids" are a fundamental type of data. Both coverage and vector data can be tiled, tiled data can be used as an exchange mechanism within a daisy chain of processes, server-side rendered maps can be returned tiled...

Tiles are a space partitioning mechanism useful for storage and delivery, and there are alternatives such as Discrete Global Grid Systems, subsetting, or simple BBOX + resolution.
These need to be modular so that they can be combined with other OGC API modules.
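As an example of tiles being only one way to partition space, this sketch converts a plain bounding box into the range of tiles that cover it. It assumes a WorldCRS84Quad-like grid (2^(z+1) columns by 2^z rows of square tiles in longitude/latitude, row 0 at the north edge); `tile_range` is a hypothetical helper, and a client not using tiles could send the same bbox plus a resolution directly.

```python
import math
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def tile_range(bbox: BBox, zoom: int) -> Tuple[int, int, int, int]:
    """Return (min_col, min_row, max_col, max_row) of tiles covering bbox.

    Assumes a WorldCRS84Quad-like grid: at level `zoom` there are
    2**(zoom + 1) columns and 2**zoom rows, each tile 180 / 2**zoom
    degrees on a side, with row 0 at latitude 90.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    size = 180.0 / 2 ** zoom
    cols, rows = 2 ** (zoom + 1), 2 ** zoom

    def clamp(index: int, count: int) -> int:
        # Keep indices inside the grid (e.g. when max_lon is exactly 180).
        return max(0, min(count - 1, index))

    min_col = clamp(math.floor((min_lon + 180.0) / size), cols)
    max_col = clamp(math.floor((max_lon + 180.0) / size), cols)
    min_row = clamp(math.floor((90.0 - max_lat) / size), rows)
    max_row = clamp(math.floor((90.0 - min_lat) / size), rows)
    return min_col, min_row, max_col, max_row


print(tile_range((-180.0, -90.0, 180.0, 90.0), 0))  # (0, 0, 1, 0)
print(tile_range((-10.0, -10.0, 10.0, 10.0), 1))    # (1, 0, 2, 1)
```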

@rob-metalinkage

I think @dblodgett-usgs hits this on the head with "It is very awkward to have ways of accessing a dataset spread around an API with no top-level integration for the abstract dataset. "

Before getting too hung up on specific definitions (collections and collections of collections), the priority is for an architectural view of how an actual dataset may be delivered in multiple forms (inc. subsets such as collection hierarchies and tiles etc). When you look at the common needs, each access point has to identify the dataset and how the representation it supports relates to that dataset; that's a requirement to push into OGC-common. Look at the ELFIE experiment with Linked Data to see that mechanisms already exist without a significant burden on either specification effort or implementation compatibility.

@jyutzler

the priority is for an architectural view of how an actual dataset may be delivered in multiple forms (inc. subsets such as collection hierarchies and tiles etc).

No, the priority is for an architectural view that serves data in a logical way. How that data can be discovered is a completely different problem and not one that should be solved through data services but through catalog services. In a real-world scenario, no one is going to be accessing landing pages. That information is already going to be harvested into a catalog and hopefully aggregated into common operational pictures and other contexts.

Stay in your lane.

@jerstlouis
Member

@jyutzler While I agree on keeping things modular... I think the whole point of having modular conformance classes for the OGC API is that a service developer can decide what constitutes one or more services, by deciding where and how to implement the conformance classes.

If one wants to build one service implementing both catalog functionality and data delivery, and/or present this under one API, that should definitely be possible.

If one wants to write them as separate services and/or present them as separate API end-points, that is also possible.

It is the flexibility and modularity of this approach which is great, all the while keeping things simple by defining a small set of requirements per conformance class.

@rob-metalinkage

rob-metalinkage commented Mar 17, 2020

@jyutzler I don't know where your road is going, so I'm happy for you to choose your own lanes :-) I think I am travelling in the direction @joanma747 has been, that out-of-band cataloguing isn't really working that well. I promise to at least use my indicators...

Seriously, I never mentioned cataloguing, and by implication you have reinforced my point: "already going to be harvested". I'm only suggesting metadata is harvestable. You feel that "no one is going to be accessing landing pages", but in reality GetCapabilities has been the landing page for W*S and everybody relies on them, and finds them somewhat wanting.

So the question is: does "in a logical way" mean a strict structure where components can be understood immediately, simply by their location within the structure, or logical in the sense that it is possible to define how serving use cases are realised? Experience suggests that structure alone is not enough: it's either finely grained and extended in too many different ways, or generalised and populated in too many different ways. A meta-structure that supports self-description is necessary pending discovery of the perfect one-size-fits-all metadata record.

@cportele
Copy link
Member

Just a side comment:

I think the whole discussion (in this issue and others) is mainly looking at the "sophisticated GIS expert" use case where we have generic clients talking to many servers with potentially very complex offerings, but seems to ignore the non-expert, both on the client and the server side. Supporting them was and is a key driver for the Spatial Data on the Web Best Practices and the Core of WFS 3.0 / OGC API Features. I hope this is true for OGC API in general. Are search engines able to harvest enough to support discovery? Can a non-expert understand an API quickly and use it without reading standards? How can we keep it simple for developers to understand the standards and implement their own APIs? Etc.

@jeffharrison

Yep, Clemens has a good point.

Best Regards,
Jeff

@jerstlouis
Member

@cportele I don't see why both end of the spectrums couldn't be supported...
I am hoping for the powerful functionality, all the while keeping it all simple and compatible with those core concepts. A simple abstract data layer 'collection' (by whichever name) would allow for this. If we are completely disregarding the powerful generic GIS client, we would not have any interest in the OGC API.

@cportele
Member

@jerstlouis - I am not saying that it is not possible to support both targets (in the standards), I really hope it is. My comment was about the apparent focus on the sophisticated end of the spectrum in the discussion.

OGC API standards should be driven by feedback on implementations for draft specifications. If implementations cover the spectrum then this will help to find answers and improve the drafts in this direction.

Which may be part of the problem with Common, since it is not meant to be implemented and used by itself. I am starting to think that maybe we should change how Common is developed and managed. Perhaps Common should not be a standard, but more a registry of building blocks for reuse (within OGC, but also by other API designers) and we would only add building blocks to the "Common registry" once we have more than one OGC resource type using a building block. Make Common something that is really built bottom-up, with time and as concepts prove to be valuable in the real-world, instead of the top-down design process that I currently see.

@jerstlouis
Member

@cportele Well, the value of an abstract data layer was to provide an attachment mechanism between modular building blocks, enabling developers to extend them with their own. I'm not sure what is being perceived as 'sophisticated'; I think we are just trying to ensure we really do have modular building blocks that can be defined independently, and that they are truly flexible and extensible.

I feel the entire OGC API should be one repository of building blocks, made up of one or more conformance classes, and each of these would have an associated development status, e.g.

  • concept development
  • draft specification
  • proposed standard
  • approved standard

I would not wait to add them to the registry, which could result in duplicated effort rather than encouraging collaboration. The registry could also provide things like implementation and usage statistics, so even if we have a vast number of building blocks in there people can still identify the key ones, or the ones picking up steam, and that can be also considered for the standards program to know where to focus standardization efforts.
But this could be open to anyone to contribute building blocks, beyond the standards program.

@jeffharrison

Clemens said...

'I am starting to think that maybe we should change how Common is developed and managed. Perhaps Common should not be a standard, but more a registry of building blocks for reuse (within OGC, but also by other API designers) and we would only add building blocks to the "Common registry" once we have more than one OGC resource type using a building block. Make Common something that is really built bottom-up, with time and as concepts prove to be valuable in the real-world, instead of the top-down design process that I currently see.'

Yes! But I think the building blocks would need to 'show' they've been tested and work efficiently.

Best Regards,
Jeff

@cportele
Copy link
Member

Yes! But I think the building blocks would need to 'show' they've been tested and work efficiently.

Agreed. That's the "prove to be valuable in the real-world" bit.

@jerstlouis
Member

@cportele I think valuable is relative. Some modules might be highly valuable to a few organizations, while others are widely used by most. This is why I am suggesting to throw everything in that repository, but provide different ways to organize and highlight modules of interest and of different maturity levels.

@jyutzler

I can't believe what I'm reading. Are we really disagreeing that datasets are different conceptually than holdings like feature collections, tile pyramids, etc.? A dataset is realized through a holding, it is not the holding. The fundamental need is for standards that allow these holdings to be understood and accessed by clients through data services. That's the whole reason we are building APIs for features, tiles, etc.

When a client accesses a data service, it needs to be able to discover certain metadata about the service. Maybe schema, maybe queryables, maybe tile matrix sets... each resource type is different. The structure of these metadata documents is going to be different. By conflating datasets and holdings into the same concept (collection) you are making the entire thing harder to use, not easier. What is the schema of a collection as proposed here? The answer to that question is undefined and so generic collections can only be supported in client software through an understanding of the gamut of OGC APIs.

While there could at some point be a role for dataset services, it is a different layer of the architecture. Let's treat it as such.

@jeffharrison

I can think of quite a few OGC API building blocks that have been extensively tested and proven to be valuable over the last 2 years... and yes, I think the community needs things like Tiles API, Features API, Styles API etc ;-)

Best Regards,
Jeff

@dr-shorthair

dr-shorthair commented Mar 19, 2020

Well, I only just found this thread (started in January when Australia was distracted by other things ...). So currently the discussion of 'collections' is getting an outing in #111, in opengeospatial/ogcapi-environmental-data-retrieval#24, and here (self), at least.

This one, opengeospatial/ogcapi-environmental-data-retrieval#24, is interesting as it recalls a discussion from >15 years ago when the application of the notion of 'collections' morphed into 'offerings' in the context of SOS. In that case it was clearly just an aid to constructing a query that returned a non-empty set. I wonder if that is actually the common thread, i.e. collectionID is a query parameter?
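The two framings can be compared side by side. This Python sketch, assuming a placeholder base URL and hypothetical helper names, builds the same request with the collection as a path segment (the /collections/{collectionId}/items pattern) and with a `collectionId` query parameter. The second form only illustrates the idea raised here; it is not something any draft defines.

```python
from urllib.parse import urlencode


def path_style(base: str, collection_id: str, **params: str) -> str:
    """Collection as a path segment: /collections/{collectionId}/items."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{base}/collections/{collection_id}/items{query}"


def param_style(base: str, collection_id: str, **params: str) -> str:
    """Hypothetical alternative: collectionId as just another query parameter."""
    query = urlencode({"collectionId": collection_id, **params})
    return f"{base}/items?{query}"


base = "https://example.org/api"
print(path_style(base, "obs", bbox="0,0,10,10"))
# https://example.org/api/collections/obs/items?bbox=0%2C0%2C10%2C10
print(param_style(base, "obs", bbox="0,0,10,10"))
# https://example.org/api/items?collectionId=obs&bbox=0%2C0%2C10%2C10
```

In the second form the collection is simply one more filter that narrows the query, which is the reading suggested by the SOS 'offerings' history.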

@heidivanparys
Contributor

heidivanparys commented Mar 19, 2020

OK, but if we're going to do this, "collection" should be defined in Terms and Definitions.

In the OGC API features it says:
"a set of features from a dataset"

In Common we could come up with something like:
"a set of one or more elements from a dataset identified by an id. The service can expose a collection in zero or more representations: e.g. a collection of features, a coverage of one or more range types, tiles representing features or coverage data, maps representing features or coverage data, etc".

@joanma747 OGC API - Features does not define "collection", it defines "feature collection", and says that, in that standard, "collection" is used as a synonym for "feature collection". This is much clearer in the ISO-version that uses the ISO rules for formatting terms and definitions:

[image: ISO-formatted terms and definitions]

As I had understood it, the term "collection" in Common is in line with the notion of "collection" in normal language. So I would like to suggest the following definition:

collection
group of objects

[SOURCE: Oxford Learner's Dictionaries, def. 2, modified — removed "or people"]

IMO, the definition of the term "feature collection" should not refer to "dataset", but to "collection" instead, as "feature collection" is a subordinate concept of "collection". An attempt:

feature collection
collection in which the objects are features

This definition would pass the substitution test. E.g. the sentence "The dataset contains buildings and the server provides access to them through a single feature collection" becomes "The dataset contains buildings and the server provides access to them through a single collection in which the objects are features".

Definitions in other standards could be done in a similar way.

record collection
collection in which the objects are records
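The genus/differentia structure of these definitions maps neatly onto generic types. A small Python sketch, with hypothetical `Feature` and `Record` classes, where "feature collection" is literally "collection in which the objects are features":

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class Collection(Generic[T]):
    """collection: group of objects."""
    objects: List[T]


@dataclass
class Feature:
    id: str


@dataclass
class Record:
    id: str


# feature collection: collection in which the objects are features
FeatureCollection = Collection[Feature]
# record collection: collection in which the objects are records
RecordCollection = Collection[Record]

buildings = FeatureCollection([Feature("b1"), Feature("b2")])
print(type(buildings).__name__, len(buildings.objects))  # Collection 2
```

The substitution test holds by construction here: wherever `FeatureCollection` appears, `Collection[Feature]` can be read in its place.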

As a programmer who's working with these concepts peripherally, "Collections" in particular (and the fact that it reads like a plural noun but refers to a singular thing) is really quite awkward to properly discuss, in person and in comments. There's confusion in referring to multiple instances of the Collection resource, and there's awkwardness in referring to multiple instances of the Collections resource. Even having fully grasped what is meant by these concepts, the name itself is awkward to talk around.

@bbarden I agree, and I commented on that earlier in opengeospatial/ogcapi-features#217 . There I proposed that "collection of collections" or "collection of feature collections" should be used instead, and that in general, the name of a resource type should be singular (buried in the other comments, it's a lengthy thread).

@dblodgett-usgs

I think this issue, at least as originally posed, has been settled in #140. See: #140 (comment)

@cmheazel
Copy link
Contributor Author

The SWG agrees that this issue has been addressed by PR 149 and can be closed.
Moved: @cportele
Second: @jeffharrison
NOTUC
