Improve metadata structure for listing web APIs #291
So you don't have to re-read the previous thread, I'll copy over the last few posts on #261.
From @philipashlock
From @smrazgs
|
@smrazgs I think we're in full agreement here. There are a few different aspects to this problem that don't have widely agreed upon conventions though. For example:
With these standards in place, this metadata can be referenced in just the same way as any other media type, and the data can then be interrogated programmatically. Using a media type to identify these specifications is just one possibility, though; they could just as easily use a URI, as is done with XML namespaces. There are a number of other issues associated with APIs that aren't covered by these specs, though, like where to go to get an API key, links to a staging or production version of the API, etc. For a lightweight spec that attempts to address things like that across many APIs, see the Service Discovery spec that has been implemented by most governments using Open311. Another issue with APIs in the context of this metadata is that many APIs combine multiple datasets. I think there are probably a number of scenarios where it would make more sense to create a whole new catalog entry for an API rather than just list it as another resource as part of an existing entry. |
@philipashlock good idea to start a new thread here. The back trail on this is long and has a low information density. As for the issues you point out above:
The issue is how to communicate which convention for the self-description document is being used (add WSDL, HAL, WADL, getCapabilities, OpenSearchDescription to your list). From my survey of practice, most well-behaved services provide a standard request that returns a self-description document; the trick is that you have to let the client know up front which of the many possible conventions the endpoint you're giving them conforms to.
I think you're referring to the problem we're trying to address at github/cat-interop @tomkralidis
In the XML world these are XML Schema, if I understand you correctly; in the JSON world we have JSON Schema, in the RDF world RDFS... The issue is that there is a service protocol (HTTP being the most common, but lots of implementations tunnel requests through HTTP), an encoding syntax (XML, JSON, CSV, NetCDF...), and an application scheme that communicates the semantics of the particular encoding practice (GeoSciML, WaterML, VOID, DCAT...). For a client to automate interaction with a data service, all of these factors need to be known up front; otherwise you play a giant guessing game, which might work... |
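To make the three factors above concrete, here is a minimal Python sketch of a client deciding up front whether it can work with an endpoint. The `ServiceLink` type, its field names, the `SUPPORTED` set, and the example values are all hypothetical illustrations, not part of any schema proposed in this thread.

```python
from typing import NamedTuple, Optional, Sequence


class ServiceLink(NamedTuple):
    """Hypothetical descriptor for one way of reaching a dataset."""
    url: str
    protocol: str  # service protocol, e.g. "http"
    encoding: str  # encoding syntax, e.g. "xml", "json", "csv"
    scheme: str    # application scheme, e.g. "WaterML", "DCAT"


# Combinations this (imaginary) client knows how to process.
SUPPORTED = {
    ("http", "xml", "WaterML"),
    ("http", "json", "DCAT"),
}


def can_handle(link: ServiceLink) -> bool:
    """True only when protocol, encoding, and scheme are all recognized together."""
    key = (link.protocol.lower(), link.encoding.lower(), link.scheme)
    return key in SUPPORTED


def first_usable(links: Sequence[ServiceLink]) -> Optional[ServiceLink]:
    """Return the first link the client can bind to, or None if it would have to guess."""
    return next((link for link in links if can_handle(link)), None)
```

If any one of the three values is missing from the metadata, the lookup cannot be made and the client is back to guessing, which is the situation described above.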
This is what I was addressing by the second point. For example
I think we can almost always assume HTTP in the context of a web based catalog
This is where we need better identifiers, but again new media types like
Most of these are well established standards with defined MIME types
|
I think my preferred approach to addressing this problem would be to simply align with the current version of DCAT which is to say:
|
An API/service can provide multiple media types, so perhaps defining the resource type may be useful, and then using mediaType if required. I'm not sure about using a media type to identify an API. For example, an OGC:WMS can provide an addressable URL to something that would be a thumbnail. In this case, I wouldn't define the URL as a WMS, but as a simple thumbnail/browse image (which just happens to be realized via OGC:WMS). At the same time, I could specify an OGC:WMS base URL and, perhaps, the resource name (layer name in WMS speak) as a means to provide minimal information, which the client could then bind to. Food for thought. |
👍 separate the type of the resource from the response formats it can produce. |
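As a rough illustration of the binding described above (an OGC:WMS base URL plus a layer name), the following Python sketch builds a WMS 1.1.1 GetMap request. The function name, its defaults, and the example URL and layer in the usage note are assumptions made for illustration; only the query parameter names come from the WMS specification.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=256, height=256,
                   image_format="image/png", srs="EPSG:4326"):
    """Build a WMS 1.1.1 GetMap URL for a single layer.

    base_url and layer are the two pieces of "minimal information"
    a catalog entry would carry; everything else is chosen by the client.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,  # the response media type, distinct from the resource type
    }
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)
```

For example, `wms_getmap_url("https://example.gov/wms", "rivers", (-180, -90, 180, 90))` yields a URL that returns a PNG. That URL could be listed as a plain thumbnail image, while the base URL and layer name together describe the OGC:WMS resource itself.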
@philipashlock I think perhaps you're trying to solve a narrower set of problems than I'm thinking about. Please give this discussion paper a read and then we can continue the conversation if we're actually working on the same problem (which I think we should be... given the scope of what data.gov is supposed to do). |
Sorry if I wasn't clear before, but I think the confusion may be that what I'm describing introduces another level of abstraction by simply referring to standardized API metadata documents (e.g. Swagger, RAML, API Blueprint, IODocs) that describe the API rather than trying to describe the API directly. You do still need a way to identify what type of API metadata documents those are, and it seems like a media type would be a fine way to do that. Then the API metadata documents themselves are what specify the media types available from the API itself. @smrazgs I didn't read the paper too thoroughly, but I think we're still mostly on the same page. As an aside, I had to clone the repo to view the file since it doesn't look like GitHub supports downloading the .doc file for some reason - probably just karma for using that format rather than a web friendly one to talk about hypermedia ;) However, from what I can tell that proposal still depends on there being a separately known definition for whatever is specified by |
Yes, if the distribution is a non-HTTP ROA type endpoint, then you need to tell the user what kind of overlayAPI is being used, and provide a link to the service selfDescription document. The client software has to be able to recognize the identifier for the overlayAPI (that's what a lot of the Cat-interop discussion is about) to know if it's one that the client can work with. I'm interested in other kinds of distribution-related links: templates, example direct data requests, distributions through services that offer lots of datasets (so the metadata has to provide some kind of parameter, such as layer name or feature type, for the client to know how to construct a request), and different information models and profiles on the same media type. It's certainly subject to debate how much to tell the client up front in a list of DCAT:distribution, gmd:CI_onlineResource, atom:links, etc., and how much to put in more specific service description docs that the client has to get and process. My tendency is to try to get more information up front. For now the open data project should provide lots of guidance and examples for lots of different kinds of distributions (OGC services, OpenDAP, HDF, OData, ugly-ole WS... as well as simple file download) on conventions for accessURL, mediaType, format. The beauty of RDF is that it's not hard to extend the content. |
While I like the idea of auto-discovery of APIs (SOAP WSDLs do this for the ESB world, and OGC GetCapabilities does this for OGC specs to some extent), there would be a lot of work to be done to make descriptions like these for the various APIs. OGC services use various specs to define what requests you can send and what you can expect back. It is one thing to know there is a call It also appears that those documentation systems for RESTful web APIs could use some harmonization, to avoid having to implement all of them just in case a client app only understands RAML. Is there a role for Data.gov to facilitate, or at least drive, the developers of those documentation systems toward that standardization? |
@justgrimes expanded the discussion on recognizing links in #293. I suggest including his suggestions here. |
I'd like to see some discussion of what you are trying to achieve with the proposed changes. What would more information about web services make possible? Why is the additional complexity important? |
If you want a machine agent to be able to process the metadata and get the user connected to the data through a service (not a file download or a web page with instructions on how to get the data), then the link has to have more information than just a URL. Machine-actionable links enable workflow composition (that's REST). |
The work around the |
Note that 1c93d7f addresses part of this issue by deprecating |
I think this has been addressed by several changes that were not specific to APIs, but should be sufficient in addressing the issues discussed here. As Gray mentioned, we've made some changes (#217, #330, #335) so that For APIs that also have machine readable documentation (like Swagger, RAML, API Blueprint, etc) the approach described in #332 is just as applicable for APIs accessible via For example
Here's an example of these fields used in a distribution:
For API specs that are more likely to be understood as conforming to an existing standard rather than interrogated via the |
Changes that still need to be addressed are changes in structure and should we add usage notes additions here or no?:
* Adds optional describedByType field at the dataset and distribution level (#291, #332)
* Changes contactPoint field to an object that contains the name (fn) and email address (hasEmail) (#358)
* Adds fn field as part of contactPoint replacing earlier use of contactPoint (#358)
* Changes publisher field to an object that allows multiple levels of organizations (#296)
* Changes accessURL field to represent indirect access and to exist only within distribution (#217, #335)
* Changes format field to a human readable description and to exist only within distribution (#272, #293)
* Adds optional description field for use within distribution (#248)
* Adds optional title field for use within distribution (#248)
* Changes accrualPeriodicity field to use ISO 8601 date syntax (#292)
* Changes distribution field to become required-if-applicable and to always contain the accessURL or downloadURL fields (#217)
* Changes license field to be a URL (#196)
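As a rough sketch of how several of the listed changes might look together, the Python snippet below builds a dataset entry with an API distribution and runs two simple checks on it. The field names `fn`, `hasEmail`, `accessURL`, `downloadURL`, `format`, `title`, `description`, `describedByType`, and `license` come from the list above. The `describedBy` field, the `mailto:` form, the `R/P1D` periodicity value, the media type, every example URL, and the `distribution_problems` helper are assumptions made for illustration only.

```python
import json

# Hypothetical API distribution: accessURL signals indirect access,
# and describedBy/describedByType point at machine-readable API docs.
api_distribution = {
    "title": "Stream gauge API",
    "description": "Query readings by station and date range.",
    "accessURL": "https://api.example.gov/gauges",
    "format": "API",
    "describedBy": "https://api.example.gov/gauges/api-docs.json",  # assumed field
    "describedByType": "application/json",
}

dataset = {
    "title": "Example stream gauge readings",
    "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane.doe@example.gov"},
    "accrualPeriodicity": "R/P1D",  # assumed ISO 8601 repeating-interval form
    "license": "https://example.gov/license",
    "distribution": [api_distribution],
}


def distribution_problems(dist):
    """Return a list of human-readable problems found in one distribution."""
    problems = []
    # From the list above: a distribution always contains accessURL or downloadURL.
    if not (dist.get("accessURL") or dist.get("downloadURL")):
        problems.append("needs accessURL or downloadURL")
    # Assumed rule: a type is only meaningful alongside the document it describes.
    if dist.get("describedByType") and not dist.get("describedBy"):
        problems.append("describedByType given without describedBy")
    return problems


if __name__ == "__main__":
    print(json.dumps(dataset, indent=2))
    for dist in dataset["distribution"]:
        print(distribution_problems(dist))
```

A client that recognizes the `describedByType` value could fetch the `describedBy` document and learn the API's available media types from it, instead of having them enumerated in the catalog entry.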
Thank you for driving the conversation around this issue and helping to assemble the v1.1 metadata update. There appears to be strong consensus around this issue, which has been accepted in the v1.1 update and merged into Project Open Data. However, we know that more can be done to improve how APIs are addressed within the schema. It's important for government staff as well as the public to continue to collaborate to make the Open Data Policy ever better. Though the v1.1 update is a substantial update, future iterations do not have to be, so whatever your ideas - big or small - please continue to work with this community to improve how government manages and opens its data. |
This is a new issue for discussion from #261 and #224 that transitioned into a broader topic around the purpose of webService and the metadata around it. Some of the earlier discussion around webService occurred in #37.