What are the characteristic resources of EDR? #24

Closed
lieberjosh opened this issue Mar 17, 2020 · 9 comments

Comments

@lieberjosh

Reading the draft specification, I can see what EDR is trying to allow clients to "do", but it is unclear to me which resources the API is returning representations of.

My suspicion is that this is really a sampling API. Each fundamental resource is a Feature of Interest with some number / type of observed properties and results. Clients are requesting responses that represent samples of that FoI with specific nD geometries and 1:n results of 1:n observed properties pertaining to each sample.

Of course, being a RESTful service, the API pretends that these sample resources are there just waiting to be fetched by the right query, although there are most likely a limited number of processes behind the scenes that are able to synthesize specific types of sample features on demand such as sample points, profiles, or traverses.
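To make the sampling reading concrete, here is a minimal Python sketch. Every name in it (`FeatureOfInterest`, `Sample`, `sample_point`, `sample_profile`) is hypothetical and not taken from the draft specification. The "processes behind the scenes" are stood in for by two methods that synthesize samples on demand from a tiny in-memory grid.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """A sample of a feature of interest: a geometry plus results per observed property."""
    geometry: tuple   # e.g. ("Point", (x, y, z)) or ("Profile", (x, y, (z0, z1, ...)))
    results: dict     # observed property name -> list of values


@dataclass
class FeatureOfInterest:
    """Hypothetical FoI holding observed-property values on a small (x, y, z) grid."""
    name: str
    data: dict = field(default_factory=dict)  # property -> {(x, y, z): value}

    def sample_point(self, x, y, z, properties):
        # Synthesize a point sample: one result per requested property.
        results = {p: [self.data[p][(x, y, z)]] for p in properties}
        return Sample(("Point", (x, y, z)), results)

    def sample_profile(self, x, y, levels, properties):
        # Synthesize a vertical profile: one result per level per property.
        results = {p: [self.data[p][(x, y, z)] for z in levels] for p in properties}
        return Sample(("Profile", (x, y, tuple(levels))), results)


foi = FeatureOfInterest(
    name="atmosphere",
    data={"temperature": {(0, 0, 0): 15.0, (0, 0, 1): 9.5, (0, 0, 2): 3.0}},
)
point = foi.sample_point(0, 0, 1, ["temperature"])
profile = foi.sample_profile(0, 0, [0, 1, 2], ["temperature"])
```

Neither sample exists until it is asked for, yet a client sees each one as a resource with its own geometry and 1:n results.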

My question is whether this helps to characterize and explain the EDR API. It may also explain the ambiguity of applying a /collections/{collectionID}/ formalism, since each {collectionID} can be thought of as both a single FoI resource and a collection of corresponding sample feature resources. It also places EDR in counterpoint to the processing API developed in the 3D-IoT Pilot, which synthesizes aggregate FoIs from sample observations (e.g. air quality grids from air quality point measurements) that in turn can be sampled for specific city regions such as street blocks.

This may just be confusing, but it may have the potential to help define a distinct role that each API performs in organizing information resources, rather than leaving each to struggle with increasingly strained interpretations of "collections of collections of items".

@dr-shorthair

Ah - reminds me of the early days of SOS design. Addressing the same question, SOS came up with 'observation offerings' which corresponded to the populated regions of the data-cube.

This 'collections' thing has been hanging around since early WFS days, and followed on from 'layers' in WMS. The underlying requirement appears to be the ability to have named subsets of a data source, with some kind of logical homogeneity, but which is different depending on the data delivery style.

@lieberjosh
Author

That's a little embarrassing, since the 'observation offerings' were my idea for finding useful data in very sparse cubes.

Jeff Yutzler's point with collections, though, is that even if you reuse the name, code still has to be able to identify and process the very different resources / sub-resources that may be identified as "collections" in different APIs.

@m-burgoyne
Collaborator

I like the definition of EDR as a sampling API; it does seem a better way of describing what it is trying to achieve. The concept is that the publisher is responsible for making their information available as a collection, in a form that fits the query interfaces. This differs from the existing OGC interfaces, which are designed to work with specific data structures.

@chris-little
Contributor

chris-little commented Mar 18, 2020

I also like the idea that the EDR API is a 'Sampling API'. The query creates a discrete sampling geometry against the resource of a relatively persistent data store; the query response is then a transient resource, (probably) with the same geometry as the query.
The query itself could be persisted for future use, another resource.
The specific results returned could also be persisted, another resource.
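As a rough illustration of the three kinds of resource just listed (the persistent store, the query, and the response), the sketch below treats a response as transient unless the caller asks for it to be persisted. The class `SamplingService`, the helper `query_key`, and the query fields `coords` and `parameters` are all assumptions made for this example, not part of the EDR draft.

```python
import hashlib
import json


def query_key(query: dict) -> str:
    """Stable identifier for a query, so identical queries map to the same resource."""
    canonical = json.dumps(query, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class SamplingService:
    def __init__(self, store: dict):
        self.store = store        # the relatively persistent data store
        self.saved_queries = {}   # queries persisted for future use
        self.saved_results = {}   # responses persisted for future use
        self.computed = 0         # how many responses were synthesized on demand

    def sample(self, query: dict, persist: bool = False) -> dict:
        key = query_key(query)
        if key in self.saved_results:
            # A previously persisted response is returned as-is.
            return self.saved_results[key]
        self.computed += 1
        position = tuple(query["coords"])
        response = {
            "geometry": query["coords"],  # same geometry as the query
            "values": {p: self.store[p][position] for p in query["parameters"]},
        }
        if persist:
            self.saved_queries[key] = query
            self.saved_results[key] = response
        return response


service = SamplingService({"temperature": {(1, 2): 12.5}})
q = {"coords": [1, 2], "parameters": ["temperature"]}
first = service.sample(q)                  # transient: computed, not kept
second = service.sample(q, persist=True)   # computed again, then persisted
third = service.sample(q)                  # served from the persisted result
```

The client receives the same representation in all three calls and cannot tell which ones were synthesized.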

@lieberjosh @dr-shorthair Does this bottom out the issue? I can add a couple of sentences capturing this view to the EDR API candidate spec introduction.

@dblodgett-usgs
Contributor

Just going to restate this, because I think it's very important:

> Jeff Yutzler's point with collections, though, is that even if you reuse the name, code still has to be able to identify and process the very different resources / sub-resources that may be identified as "collections" in different APIs.

One thing I'd add is that EDR is agnostic to whether what is being sampled is a digital data cube that could be sampled anywhere or pre-existing samples of the real world. Maybe you all are way ahead of me as usual, but I think that is a critical point to getting this to work in a simple way for a broad array of data types.

@chris-little
Contributor

We only have to be half a wheel in front of you @dblodgett-usgs !

@lieberjosh there is also a hidden assumption, touched upon by you, that these datastores/cubes/collections are dense, not sparse, so that appropriate interpolation is not a difficult or impossible concern.
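The density assumption can be illustrated with a small one-dimensional sketch: linear interpolation that declines to answer when the two bracketing samples are further apart than a tolerance. The function name `interpolate` and the `max_gap` parameter are hypothetical; a real service would choose its own interpolation method and limits.

```python
from bisect import bisect_left


def interpolate(xs, ys, x, max_gap):
    """Linearly interpolate at x; return None outside the data or across a sparse gap.

    xs must be sorted in ascending order and have the same length as ys.
    """
    if not xs or x < xs[0] or x > xs[-1]:
        return None
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]                  # exact hit on a stored sample
    x0, x1 = xs[i - 1], xs[i]
    if x1 - x0 > max_gap:
        return None                   # too sparse here to interpolate sensibly
    t = (x - x0) / (x1 - x0)
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


xs = [0.0, 1.0, 2.0, 10.0]
ys = [10.0, 12.0, 14.0, 30.0]
dense_value = interpolate(xs, ys, 1.5, max_gap=2.0)    # 13.0
sparse_value = interpolate(xs, ys, 6.0, max_gap=2.0)   # None
```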

@dr-shorthair

dr-shorthair commented Mar 18, 2020

> @lieberjosh @dr-shorthair Does this bottom out the issue? I can add a couple of sentences capturing this view to the EDR API candidate spec introduction.

I think I like the conceptualization of EDR as a sampling API.

The 'collections' discussion is proceeding elsewhere, but it looks like there is a pretty consistent story about it being a query parameter, and not part of the canonical identifier - e.g. see opengeospatial/ogcapi-common#111

@chris-little
Contributor

Added to clause 6 overview:
"The EDR API can be considered a 'Sampling API'. The query creates a discrete sampling geometry against the resource of a relatively persistent data store. The query and its response are transient resources, which could be made persistent for re-use if required."

@lieberjosh
Author

@chris-little Two observations and then I'll leave off this and go tilt at windmills, er, Collections.

  1. It is important that the (RESTful) API provide access to sample resources without a need to know whether they are being created on the fly, have been cached, or just exist fortuitously. Of course, the service needs a description of what samples / properties are being provisioned. The shortest path at least for now seems to be to do this by describing what queries can be used to retrieve them.

  2. As you noted, a permissive query policy should generally mean that the ultimate Feature of Interest is reasonably densely represented (and/or that interpolation is up to the task). This is not always the case, though, and the service will need to describe limitations on queries, e.g. samples are only available at river gauge locations, not just anywhere along the river, even if the time domain is arbitrarily dense. Ideally this query constraint would not require digging into great detail about the source data in order to be able to use the "convenient" EDR API.
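One way to picture the two points above is a service that publishes a small description of where it may be sampled, so that a client can check a query without inspecting the source data. This is only a sketch: `PositionConstraint`, its methods, and the gauge identifiers are invented for illustration and do not come from the EDR draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionConstraint:
    """Hypothetical description of where a collection may be sampled."""
    allowed: frozenset = frozenset()   # empty means any position may be sampled

    def describe(self) -> dict:
        # What the service would advertise to clients alongside its queries.
        if not self.allowed:
            return {"position": "any"}
        return {"position": "enumerated", "values": sorted(self.allowed)}

    def permits(self, position) -> bool:
        return not self.allowed or position in self.allowed


# A river collection that can only be sampled at its gauges.
gauges = PositionConstraint(frozenset({"gauge-07", "gauge-01"}))
# A gridded collection that can be sampled anywhere.
gridded = PositionConstraint()

gauge_ok = gauges.permits("gauge-01")
midstream_ok = gauges.permits("river-km-12.4")
advertised = gauges.describe()
```

The client still does not learn whether a permitted sample is computed, cached, or stored; it only learns which queries are worth sending.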
