A MATLAB function to read data in data package format
The MATLAB function datapackage
reads in formatted data that conforms
to the
dataprotocols.org tabular data standard.
Packages that follow the standard are designed to be easily transmitted over HTTP or saved on a local disk.
In short, a tabular data package contains two or more files:
- datapackage.json
- one or more tabular data files in CSV format
The datapackage.json
file contains meta information pertaining to the
data files, including:
- dataset name, description, and license
- description of data files
- data field (column) information (e.g. name, type)
Examples of data distributed in datapackage format are available from the Open Knowledge Foundation.
To use the datapackage
function:
- Download the file from the MATLAB Central File Exchange.
- Unzip the file and place datapackage.m on your MATLAB path (e.g. your My Documents/MATLAB folder on Windows).
- Use the function (see examples below).
This GitHub repo is a development library. To contribute, fork this repo. See the instructions for the development version below.
The datapackage
function reads formatted data from either a URL or
local file path. The function first searches for the
datapackage.json
file, which determines which CSV
files will be loaded.
For example, you can load a data package directly from the web:
% Note, the trailing "/" is important
[data, meta] = datapackage('http://data.okfn.org/data/core/gdp/');
You can load a datapackage
locally:
% The trailing path separator is also important
[data, meta] = datapackage('C:\path\to\package\');
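Because both examples depend on the trailing separator, a small guard can be useful before calling the function. The helper below is a hypothetical sketch and is not part of datapackage.m; the name addTrailingSep and the rule of using "/" for anything that starts with "http" are assumptions made for illustration.

```matlab
function p = addTrailingSep(p)
% addTrailingSep  Hypothetical helper, not part of datapackage.m.
% Appends a trailing separator to a URL or folder path if one is missing.
    if ~isempty(p) && any(p(end) == '/\')
        return                      % already ends in "/" or "\"
    end
    if strncmpi(p, 'http', 4)
        p = [p '/'];                % URLs always use a forward slash
    else
        p = [p filesep];            % local paths use the platform separator
    end
end
```

Saved as addTrailingSep.m on the MATLAB path, it could be used as `[data, meta] = datapackage(addTrailingSep('http://data.okfn.org/data/core/gdp'));`.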
- In-line data, which is contained in the datapackage.json file itself, is not supported. It is not clear whether the standard even allows this: "All data files MUST be in CSV format".
- The field (column) attribute types date and datetime are converted to a MATLAB numerical date format using the built-in datenum function. The call to datenum has no format string specified, so it may fail to parse some date strings. In that case, the function keeps the date as a string.
- Quote characters in CSV files other than the double quote (") are not supported. This is because the underlying MATLAB function (textscan) has no facility for this.
- Since MATLAB R2013b, the table data type has been included in base MATLAB. Before that, the dataset type was available in the MATLAB Statistics Toolbox. The datapackage function returns a table object by default. If the MATLAB release is older than R2013b, the function returns a dataset object instead. If neither table nor dataset is available, the function raises an error.
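The date fallback and the choice between table and dataset described in the notes above can be sketched roughly as follows. This is only an illustration of the described behavior, not the code inside datapackage.m: the sample date strings, the error identifier, and the use of which to detect the available types are all assumptions.

```matlab
% Sketch only: this is not the code inside datapackage.m.

% 1) Convert date strings with datenum; keep the strings if that fails.
dateStrings = {'2013-01-15'; '2014-06-30'};  % made-up sample values
try
    dates = datenum(dateStrings);   % no format string, as described above
catch
    dates = dateStrings;            % fall back to the original strings
end

% 2) Pick the container type that this MATLAB installation provides.
if ~isempty(which('table'))
    containerType = 'table';        % base MATLAB, R2013b and later
elseif ~isempty(which('dataset'))
    containerType = 'dataset';      % Statistics Toolbox
else
    % Hypothetical error identifier, chosen for this sketch
    error('datapackage:noContainer', ...
        'Neither table nor dataset is available.');
end

fprintf('Dates stored as %s; data would be returned as a %s.\n', ...
    class(dates), containerType);
```

If date parsing matters for your data, passing an explicit format to datenum, for example `datenum(dateStrings, 'yyyy-mm-dd')`, is one way to make the conversion more predictable.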
This repo is a development library that includes the JSON library it depends on (JSONLab) as a Git submodule.
To install it, run the following:
git clone https://github.com/KrisKusano/datapackage.git
cd datapackage
git submodule init
git submodule update
Because MATLAB releases before R2016b have no native JSON reader, this project uses the open
source jsonlab
function loadjson
(and the unit tests use
savejson
). The project home page can be found
here.
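As a rough illustration of what loadjson provides, the sketch below parses a made-up, minimal descriptor string rather than a real datapackage.json. It assumes JSONLab is on the MATLAB path; the package name and resource path are invented for the example.

```matlab
% Sketch only: requires JSONLab's loadjson on the MATLAB path.
% The JSON text is a made-up, minimal stand-in for a datapackage.json file.
jsonText = '{"name": "gdp", "resources": [{"path": "data/gdp.csv"}]}';

% loadjson accepts either a file name or a JSON string
meta = loadjson(jsonText);

if isfield(meta, 'name')
    fprintf('Package name: %s\n', meta.name);
end
if isfield(meta, 'resources')
    % Depending on the JSONLab version, resources may be a cell array
    % or a struct array; numel works for both.
    fprintf('Number of resources: %d\n', numel(meta.resources));
end
```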
The file available for download from the MATLAB Central File Exchange website is
made by running make. The makefile combines the datapackage.m file
with the loadjson.m file from jsonlab and creates a zip file, which also
contains the license, in the bin/ directory.
Unit tests are contained in the file ./tests/datapackagetest.m
. The
unit tests use MATLAB's built-in unit test
framework.
To run the tests, run the following from within the ./tests/
directory:
results = runtests('datapackagetest.m');
If there is minimal printout to the command window, then all tests passed.
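Rather than judging by the amount of printout, the returned results array can be inspected directly. The following is a minimal sketch that assumes it is run from within the ./tests/ directory; the summary format is arbitrary.

```matlab
% Run the unit tests and summarize the outcome programmatically.
results = runtests('datapackagetest.m');

nPassed     = sum([results.Passed]);
nFailed     = sum([results.Failed]);
nIncomplete = sum([results.Incomplete]);

fprintf('%d passed, %d failed, %d incomplete\n', ...
    nPassed, nFailed, nIncomplete);

% Stop with an error if anything did not pass
assert(nFailed == 0 && nIncomplete == 0, 'Some tests did not pass.');
```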
In addition to unit testing, the file ./tests/coredatasetstests.m
attempts to load the 20 "core" data packages. First, all packages are read in using
default settings (no optional name/value pairs). At the time of writing,
one data package requires additional name/value pairs in order to avoid
errors.
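The general idea of such a smoke test can be sketched as a loop that records which packages fail to load. This is not the content of coredatasetstests.m; only the gdp package name comes from the example earlier in this document, and the base URL is inferred from it.

```matlab
% Sketch only: not the actual contents of coredatasetstests.m.
baseUrl = 'http://data.okfn.org/data/core/';
names   = {'gdp'};   % extend with the other core package names
failed  = {};

for k = 1:numel(names)
    url = [baseUrl names{k} '/'];   % the trailing "/" is required
    try
        [data, meta] = datapackage(url); %#ok<ASGLU>
    catch err
        % Record the failure and keep going
        failed{end+1} = sprintf('%s: %s', names{k}, err.message); %#ok<AGROW>
    end
end

fprintf('%d of %d packages failed to load\n', numel(failed), numel(names));
for k = 1:numel(failed)
    fprintf('  %s\n', failed{k});
end
```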
The MATLAB Central File Exchange submission and this source code are distributed under the BSD 2-Clause License.