So you don't have to re-read the previous thread, I'll copy over the last few posts from #261.

From @philipashlock

My preference for a future version would be to put webService in the distribution array and pair it with something analogous to the format field: a URL or identifier that conveys what people should expect at the endpoint URL.

If we needed a new term for this, perhaps it would be called endpointType or serviceDefinition. An example would be a URI or identifier denoting that the endpoint URL is a description of the API represented as Swagger, RAML, or API Blueprint, or a specific kind of standardized endpoint such as those based on AtomPub (e.g. GData/OData), WFS and WMS endpoints, an Open311 API, or even generic database APIs like those associated with CouchDB and MongoDB. Some data catalogs are already set up to recognize special endpoints like this; e.g. with extensions, CKAN can provide additional features for WMS endpoints or for data stores supported by Recline.js, like the CKAN Datastore.
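To make the proposal concrete, here is a rough sketch of what a dataset entry might look like with webService moved into the distribution array and paired with the proposed endpointType. This is only an illustration: endpointType is the term floated above and is not part of any published schema, and the URLs, identifier values ("swagger", "wms"), and the helper `endpoints_of_type` are all hypothetical.

```python
import json

# Hypothetical dataset entry. "endpointType" is the proposed term from this
# thread, not an existing schema field; all URLs and values are placeholders.
dataset = {
    "title": "Example dataset",
    "distribution": [
        # File download: accessURL + format, as in the current approach.
        {
            "accessURL": "https://example.gov/data/example.csv",
            "format": "text/csv",
        },
        # Endpoint whose URL points at an API description document.
        {
            "webService": "https://example.gov/api/swagger.json",
            "endpointType": "swagger",
        },
        # Endpoint that is itself a standardized service.
        {
            "webService": "https://example.gov/geoserver/wms",
            "endpointType": "wms",
        },
    ],
}


def endpoints_of_type(dataset, endpoint_type):
    """Return the webService URLs whose endpointType matches."""
    return [
        dist["webService"]
        for dist in dataset.get("distribution", [])
        if "webService" in dist and dist.get("endpointType") == endpoint_type
    ]


print(json.dumps(endpoints_of_type(dataset, "wms")))
```

A catalog that knows how to handle WMS could use a lookup like this to decide which entries get the extra features.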

The current POD schema differentiates file downloads from queryable/interactive endpoints with accessURL + format for file downloads and just webService for endpoints.

The analogous approach in DCAT is that file downloads are represented with downloadURL + mediaType, while endpoints use accessURL. However, accessURL is meant to be inclusive, so it could also be used for file downloads or even a landing page.
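As a sketch of what that DCAT convention means for a consumer: a downloadURL can be treated as a file, while an accessURL on its own says only "not known to be a download". The function name `classify` and the return labels are made up for illustration, and the URLs are placeholders.

```python
def classify(distribution):
    """Classify a DCAT-style distribution dict by which URL field it carries.

    Returns "download" when downloadURL is present, "access" when only
    accessURL is present (an endpoint, a landing page, or simply unknown),
    and "unspecified" when neither is set. The labels are hypothetical.
    """
    if distribution.get("downloadURL"):
        return "download"
    if distribution.get("accessURL"):
        return "access"
    return "unspecified"


examples = [
    {"downloadURL": "https://example.gov/data.csv", "mediaType": "text/csv"},
    {"accessURL": "https://example.gov/sparql"},
    {"title": "no URL at all"},
]

for dist in examples:
    print(classify(dist))
```

Note that "access" cannot be narrowed any further from the URL field alone, which is the gap the endpointType idea is trying to fill.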

The distinction between format and mediaType is covered on #272

For reference, in DCAT this is how things are defined:

dcat:accessURL
A landing page, feed, SPARQL endpoint or other type of resource that gives access to the distribution of the dataset

Use accessURL, and not downloadURL, when it is definitely not a download or when you are not sure whether it is.

If the distribution(s) are accessible only through a landing page (i.e. direct download URLs are not known), then the landing page link SHOULD be duplicated as accessURL on a distribution.

source: http://www.w3.org/TR/vocab-dcat/#Property:distribution_accessurl

dcat:downloadURL
A file that contains the distribution of the dataset in a given format

dcat:downloadURL is a specific form of dcat:accessURL. Nevertheless, DCAT does not define dcat:downloadURL as a subproperty of dcat:accessURL not to enforce this entailment as DCAT profiles may wish to impose a stronger separation where they only use accessURL for non-download locations.

source: http://www.w3.org/TR/vocab-dcat/#Property:distribution_downloadurl

From @smrazgs

I think this reflects a lack of clarity about the scope and purpose of the metadata that's being constructed. In the webby world of HTML pages and web applications, having a URL associated with a resource is pretty straightforward. In the world of data, you have to think a little more deeply about what it's for. Here's a perspective:

The user is looking for data; they want information about something. The first concern is finding the data at a fairly abstract level, something like 'water quality information in my county', 'particulate concentrations near power plants', 'average income of people with pink hair', 'how many gallons of milk were produced in Wisconsin last year'. Once they find something that looks like what they need, they have to figure out how to get it in a form they can use, and whether or not they trust the source of information.

Data can be distributed in many ways, and describing those 'ways' so that machines can use them is a tricky problem. Sure, downloading CSV files is cool and easy, but what do those pesky column headings like 'avginc', 'mmmgpp', 'daysToMarket', '24yly' mean? And as someone pointed out earlier: so you give me a URL for an API endpoint, how do I automate a client to use that? More realistically, the data is probably available through multiple distributions, and the client software ideally would be able to inspect a collection of links in the metadata (DCAT, ATOM, ISO19139...) and figure out which one the software works with. Delve into Cat-interop for more discussion and links...

The bottom line is that 1) many resources are available via multiple distributions, and these should be describable in the metadata in such a way that automated clients can use them (HATEOAS if you like REST), so the distribution needs to allow multiple values; and 2) description of the links to make them machine-actionable requires associating properties with the links.
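Both points can be illustrated with a minimal sketch of a client choosing among several distributions by inspecting a property attached to each link. The property names (`endpointType`, `mediaType`), the values, the URLs, and the function `pick_distribution` are assumptions for illustration, not an agreed schema.

```python
def pick_distribution(distributions, supported):
    """Return the first distribution whose declared type the client supports.

    Each distribution is a dict with a link plus a describing property;
    "endpointType" and "mediaType" are hypothetical property names here.
    Returns None when nothing matches.
    """
    for dist in distributions:
        kind = dist.get("endpointType") or dist.get("mediaType")
        if kind in supported:
            return dist
    return None


# One resource offered through multiple distributions (placeholder URLs).
distributions = [
    {"accessURL": "https://example.gov/landing-page"},
    {"accessURL": "https://example.gov/wfs", "endpointType": "wfs"},
    {"downloadURL": "https://example.gov/data.csv", "mediaType": "text/csv"},
]

# A client that only understands CSV files skips the first two links.
chosen = pick_distribution(distributions, supported={"text/csv"})
print(chosen["downloadURL"] if chosen else "no usable distribution")
```

The landing-page entry carries no describing property, so no automated client can select it, which is the situation point 2 argues against.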

Improve metadata structure for listing web APIs #291
