Guidance on defining collections to group datasets released as a series or in fragments

Data.gov has had the notion of a "collection" that can be used to group multiple datasets that are logically a single dataset but have been released in separate parts. The most common scenario is a series of releases over time. A dataset may be published in monthly or yearly releases, but if the only thing that distinguishes these releases is the date, they should really be packaged as a single dataset. This also makes browsing simpler: it prevents many near-identical datasets from crowding out more distinct ones. Some datasets are instead published by location, such as data for each state being released as a separate file. These should also be grouped together to appear as a single dataset.

Ideally, agencies should package these together as a single file/release before publishing; for example, one file that is continuously updated is preferable to separate releases over time. At the very least, there should be a way to define this kind of grouping at the metadata level, as is currently the case on data.gov.

Data.gov handles this by treating the collection as a normal dataset entry that refers to many child entries. Something similar could be done with the data.json schema, but we would need to establish a convention for defining the parent/child relationship between entries.
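
To make the parent/child idea concrete, here is a minimal sketch of one possible convention. It assumes each child entry carries a field (called `isPartOf` here purely for illustration) whose value is the parent entry's `identifier`. The field name, the helper functions, and the sample entries are all assumptions for discussion, not an agreed part of the data.json schema.

```python
from collections import defaultdict

# Hypothetical data.json entries; identifiers and titles are made up.
# "isPartOf" is an assumed field name pointing at the parent's identifier.
catalog = [
    {"identifier": "blocks-2010", "title": "Census Blocks 2010 (series)"},
    {"identifier": "blocks-2010-al", "title": "Census Blocks 2010, Alabama",
     "isPartOf": "blocks-2010"},
    {"identifier": "blocks-2010-ak", "title": "Census Blocks 2010, Alaska",
     "isPartOf": "blocks-2010"},
    {"identifier": "standalone", "title": "An unrelated dataset"},
]


def group_collections(entries, parent_key="isPartOf"):
    """Map each parent identifier to the identifiers of its children."""
    known = {entry["identifier"] for entry in entries}
    children = defaultdict(list)
    for entry in entries:
        parent = entry.get(parent_key)
        if parent is None:
            continue  # not a member of any collection
        if parent not in known:
            # A child pointing at a missing parent is treated as an error.
            raise ValueError(
                f"{entry['identifier']} refers to unknown parent {parent}"
            )
        children[parent].append(entry["identifier"])
    return dict(children)


def top_level(entries, parent_key="isPartOf"):
    """Entries to show when browsing: parents and standalone datasets."""
    return [entry for entry in entries if parent_key not in entry]


collections = group_collections(catalog)
browsable = [entry["identifier"] for entry in top_level(catalog)]
```

With this approach the parent stays an ordinary dataset entry, as on data.gov, and a catalog can hide the children from the default listing while still exposing them through the parent.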

Here's a current example of a collection on data.gov:

View of the collection "parent" metadata:
http://catalog.data.gov/dataset/tiger-line-shapefile-2010-series-information-file-for-the-2010-census-block-state-ba

View of all its "child" datasets: 
http://catalog.data.gov/dataset?collection_package_id=2a8b7f0b-1ae5-453c-ba56-996547266a63

