It is important for data sources to be able to advertise their APIs and Datasets, so catalogs are able to pick them up. There are two important pieces in doing this:
- Files or endpoints that catalogs or search engines can access to find the appropriate APIs and Datasets. This is similar to how sitemaps.xml works for web sites.
- A standard format for describing APIs and Datasets, so that callers can gather the appropriate information. There are a number of formats suggested for this, this standard will start with the CKAN's Data Catalog Schema and Protocol modifications on top of DCAT
Any data source that provides datasets or Apis can expose them to catalogs and search engines through a few endpoints. These enpoints are specifically created so they can be implemented through either simply creating a file or building a more dynamic web service to implement them.
The available endpoints are:
- /catalog.xml is an endpoint that contains all the resources contained by a data source. This can include datasets, Apis, and visualizations.
- /apis.xml is an endpoint that contains all the APIs surfaced by a data source. This will ONLY include Apis. The purpose for pulling this file out separately is to make it easier for many API catalogs to provide programmatic access. These SHOULD be implemented by includes in the Catalog.xml.
In addition, Catalogs MAY want to provide query-able endpoints above and beyond Catalog.xml. In this case, they are free to do so, but the responses should correspond to the same format as below.
NOTE: It would be nice to standardize on the catalog query api as well. This could be an area of more work.
We think about the format of the endpoints in two ways, there is the data model and the actual file formats created.
The data model is based on DCAT, and the default file format is XML. The XML format is based on RDF, however, some dynamic endpoints MAY decide to offer other formats such as N3 or JSON.
Range: dcat:distribution Description: Describes a resource provided by this data service (or catalog). This is based on a dcat:distribution, however extends it with a number of properties to add context, since this element is not showing up as part of a catalog entry.
Property | Type | Description |
---|---|---|
title | dct:title | The name or title for this representation. If not present, implementers should use the name from the related Dataset. |
description | dct:description | Text describing this representation. If not present, implementers should use the description from the related Dataset. |
identifier | dct:identifier | Unique identifier for this data representation |
keyword | dcat:keyword | Keywords specific to this representation. If the dataset has keywords as well, the keywords relating to this object should be the union. |
dataset | rdfs:Resource | Reference to the datasets this resource is a part of. In the case that this aggregates or uses multiple data sets, this will contain all the datasets referenced. |
last_modified | xsd:datetime | The time this was last modified |
created | xsd:datetime | The time this was initially created |
authentication | Literal | Authentication type required for this resources. "None" means it is open to the public. "Basic" means basic authentication. "OAuth" and "OAuth2" refer to OAuth version 1.0 and 2.0, respectively. This ONLY refers to mechanism and not how authorization is done. |
contactEmail | Literal | An optional field with the email address of the person to contact about this dataset. |
documentation | rdfs:Resource | The documentation for this site. Should link to a human readable documentation. |
Clarifications Every resources MUST have at least one dcat:format property.
Range: ods:representation Description: This is an API that accesses a particular dataset or set of datasets.
Property | Type | Description |
---|---|---|
client_library | rdfs:Resource | References a client library that can be used to access this API. |
api_type | Literal | A string denoting a particular api this endpoint supports. |
Examples
<ods:api>
<dct:title>SODA Api for Current Recalls</dct:title>
<dct:description>This is the SODA Api for getting access to all the egg products that are actively being investigated</dct:description>
<dct:identifier>ongoing-egg-recalls</dct:identifier>
<dcat:accessURL>https://fda.demo.socrata.com/resource/ongoing-egg-recalls</dcat:accessURL>
<dcat:keyword>Eggs</dcat:keyword>
<dcat:keyword>Recalls</dcat:keyword>
<ods:authentication>Basic</ods:authentication>
<ods:documentation>https://fda.demo.socrata.com/developers/docs/egg-recalls</ods:documentation>
<ods:client_library>https://github.com/socrata/soda-java</ods:client_library>
<ods:client_library>https://github.com/socrata/soda-scala</ods:client_library>
<ods:client_library>https://github.com/socrata/socrata-php</ods:client_library>
<ods:client_library>https://github.com/socrata/soda-js</ods:client_library>
<ods:client_library>https://github.com/socrata/socrata-python</ods:client_library>
<ods:client_library>https://github.com/socrata/socrata-api-csharp</ods:client_library>
<ods:contact_email>someone@fda.gov</ods:contact_email>
<!-- MIME Types provided -->
<dct:format>application/json</dct:format>
<dct:format>application/rdf+xml</dct:format>
<dct:format>text/csv</dct:format>
<!-- What type of API it is -->
<ods:api_type>rest/soda</ods:api_type>
<ods:last_modified>2012-10-10T16:46:00+800</ods:last_modified>
<ods:created>2012-10-10T16:46:00+800</ods:created>
</ods:api>
Range: ods:representation Description: This is an chart.
Example
The following example, would surface a chart that is based on several different datasets, for each year of egg recalls. Since, this does not neatly fit into the dataset down approach of DCAT, we annotate this with the dataset property from ods:Representation
<ods:chart>
<dct:title>Number of recalls per manufacturer (2004 - 2012)</dct:title>
<dct:description>This chart graphs the number of egg product recalls per manufacturer for the years 2004 - 2010</dct:description>
<dct:identifier>recalls-per-manufacturer</dct:identifier>
<dcat:accessURL>https://fda.demo.socrata.com/resource/recalls-per-manufacturer</dcat:accessURL>
<dcat:keyword>Eggs</dcat:keyword>
<dcat:keyword>Recalls</dcat:keyword>
<ods:contact_email>someone@fda.gov</ods:contact_email>
<!-- MIME Types accepted -->
<dct:format>text/html</dct:format>
<dct:format>image/png</dct:format>
<!-- Since this chart uses data from many different datasets, it references them here -->
<ods:dataset>https://fda.demo.socrata.com/resource/egg-recalls-2004</ods:dataset>
<ods:dataset>https://fda.demo.socrata.com/resource/egg-recalls-2005</ods:dataset>
<ods:dataset>https://fda.demo.socrata.com/resource/egg-recalls-2006</ods:dataset>
<ods:dataset>https://fda.demo.socrata.com/resource/egg-recalls-2007</ods:dataset>
<ods:dataset>https://fda.demo.socrata.com/resource/egg-recalls-2008</ods:dataset>
<ods:dataset>https://fda.demo.socrata.com/resource/egg-recalls-2009</ods:dataset>
<ods:dataset>https://fda.demo.socrata.com/resource/egg-recalls-2010</ods:dataset>
<ods:dataset>https://fda.demo.socrata.com/resource/egg-recalls-2011</ods:dataset>
<ods:dataset>https://fda.demo.socrata.com/resource/egg-recalls-2012</ods:dataset>
<ods:last_modified>2012-10-10T16:46:00+800</ods:last_modified>
<ods:created>2012-10-10T16:46:00+800</ods:created>
</ods:chart>
Range: ods:representation Description: This is a map.
Range: ods:representation Description: This is a dataset that is derived from the datasets in the Resource:dataset property.
- specification.mkd - The detailed spec (work currently in progress)
- notes.mkd - Various notes from our learnings
- examples/ - Examples of catalog entries and listings in different formats
If you'd like to contribute, please do! There are many ways to do so:
- Join the conversation on the Google Group for Open Data Standards
- Add your own comments to the spec and send us a pull request
- Add issues to the issue tracking system for this project
- DCAT Working Draft
- GeoJSON
- Dublin Core
- Vocab.org Gregorian Intervals
- CKAN Data Management System Documentation
- CKAN Data Catalog Schema and Protocol v0.1
- Data.gov's API Catalog proposal
This work is all licensed Creative Commons By-Attribution 3.0