Emmanuel Blondel edited this page Jun 17, 2016 · 140 revisions

rsdmx - Tools for reading SDMX data and metadata in R

rsdmx is a package to read SDMX data and metadata in R. It provides an SDMX format abstraction library and an SDMX web-services interface, including an embedded list of well-known national and international data providers.


To guarantee its sustainability, rsdmx is currently seeking institutional or individual sponsors to fund the package development in order to (1) enhance and strengthen existing functionalities, (2) provide new features, and (3) ensure rsdmx maintenance and user support.

If you wish to sponsor rsdmx, do not hesitate to contact me


For citation, please use DOI

Do you need support to use rsdmx? You may consider to:

Do you use rsdmx? your feedback is welcome! You may consider to:

Table of contents

1. Overview & Vision
2. Package status
3. Success stories
3.1 Inventory of Data Sources
3.2 Projects using rsdmx
4. Credits
5. Fundings
6. User guide
6.1 Install rsdmx
6.2 readSDMX
6.3 Examples
6.4 Community use cases
6.5 R documentation
6.6 User mailing list
7. Developer guide
7.1 How to contribute
7.2 Unit tests
7.3 Build tests
7.4 Developers mailing list
8. Issue reporting

### 1. Overview and vision
***

rsdmx follows an iterative approach to reading SDMX data & metadata documents. The schema below illustrates the current scope of rsdmx and its vision to facilitate the use of SDMX data & metadata in R:

rsdmx

Low-level SDMX format abstraction library

  • Today: Its primary role and current emphasis is to provide a low-level SDMX format abstraction library, supporting the SDMX 1.0, 2.0 and 2.1 format standards. This low-level approach offers the flexibility required to read SDMX data whatever their location (web or local resources) and the way they are provided (through web-services, or not), hence guaranteeing that most SDMX datasources can be read in R.
  • SDMX-ML format, and more? Currently rsdmx focuses on the SDMX-ML (i.e. SDMX - XML) format. Other formats, like SDMX-EDI or SDMX-JSON, could be considered at a later stage.
  • What about writing SDMX documents? Reading SDMX-ML documents is very useful when you need to extract and analyze data from scattered sources. What about writing? Since SDMX-ML remains an exchange format, having the capacity to write R objects (such as data.frame) would be a step forward in statistical data exchange. One possible future objective of rsdmx is to provide a SDMX-ML writer to let R users export their data to SDMX format. Just as a user can read data with readSDMX, they could write some data analysis output with writeSDMX!

SDMX Web-services interfaces

  • Currently, rsdmx does not provide an interface to web-services that implement the SDMX web-service standards. The main reason is that many SDMX datasources are not provided through SDMX web-services: SDMX documents are published in an ad hoc manner (single SDMX datasets, zipped files); some web-services also redirect to zipped SDMX files when the amount of data becomes huge; other documents are exchanged between people, etc. In order to guarantee that all SDMX datasources can be read in R, whether they are local or remote, published through SDMX web-services or not, rsdmx started with a flexible, generic low-level approach.
  • Building SDMX web-services interfaces with rsdmx is however part of the package vision, as it would facilitate the interaction with SDMX web-services and the data extraction for those data sources that offer these web-services.

SDMX Graphic User Interfaces

  • The logical next step in the rsdmx vision is to make SDMX data extraction and reading in R more user-friendly. This is also considered within the scope of the package. Experiments using R shiny have been performed in this direction.

Object-oriented approach

rsdmx currently follows an approach based on S4 classes and methods. This means that the SDMX-ML object model is fully mapped to R. Beyond reading the SDMX content as a data.frame, rsdmx thus gives access to all the associated information / metadata that is exchanged in SDMX-ML documents.
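
As a rough illustration of what mapping a document to S4 objects looks like, the sketch below defines a toy class in plain R. `ToyDataSet`, its slots and the sample values are hypothetical and are not part of rsdmx; the real classes carry much more information.

```r
library(methods)

# Hypothetical toy class: one metadata slot and one slot holding observations
setClass("ToyDataSet",
         representation(sender = "character", obs = "data.frame"))

# Coercion method: expose the observations as a plain data.frame
setMethod("as.data.frame", "ToyDataSet",
          function(x, row.names = NULL, optional = FALSE, ...) x@obs)

ds <- new("ToyDataSet",
          sender = "MYORG",
          obs = data.frame(obsTime = c("2010", "2011"), obsValue = c(1, 2)))

df <- as.data.frame(ds)   # tabular view of the data
sender <- ds@sender       # metadata stays available on the object
```

The point of the design is that the tabular view and the metadata live on the same object, so coercing to a data.frame does not discard the rest of the message.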

Ease of use

The main end-user functionality of rsdmx is a single function, named readSDMX, which takes care of reading the SDMX-ML document and returns the appropriate R SDMX object, on which the user can run common R functions such as as.data.frame.

Enhanced SDMX-ML document reader

Currently, each main SDMX R object instantiated according to the SDMX-ML document contains a slot (property) that contains the XML R object. In case of very large datasets, this could lead to memory issues. We are investigating how rsdmx could enhance its engine to use the XML event-driven xmlEventParse function not to load the complete XML tree in R, while maintaining the object-oriented approach.
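
To give an idea of the event-driven style mentioned above, here is a minimal sketch that counts observations with `xmlEventParse` from the XML package, without building the document tree. It is not how rsdmx is implemented; the inline document, the `Obs` element name and the `count_obs` helper are assumptions made for illustration.

```r
# Toy document; element and attribute names are illustrative only
xml_text <- paste0(
  '<DataSet><Series>',
  '<Obs TIME="2010" OBS_VALUE="1"/>',
  '<Obs TIME="2011" OBS_VALUE="2"/>',
  '</Series></DataSet>'
)

# Hypothetical helper: count <Obs> elements with a SAX-style handler
count_obs <- function(txt) {
  n <- 0L
  handlers <- list(
    # called once per opening tag; the tree is never materialized
    startElement = function(name, attrs, ...) {
      if (name == "Obs") n <<- n + 1L
    }
  )
  XML::xmlEventParse(txt, handlers = handlers, asText = TRUE)
  n
}

# Only run when the XML package is installed
if (requireNamespace("XML", quietly = TRUE)) {
  n_obs <- count_obs(xml_text)
  print(n_obs)
}
```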

Moreover, reading SDMX-ML is one thing, but discovering and accessing SDMX data from web-services is another. Not everyone knows how to use SDMX web-services and related web protocols, and it is not necessarily straightforward for a user to prepare a SDMX query that returns only the data they want. A future vision of the package is to extend its role from SDMX format abstraction library to SDMX web-service R interface, to facilitate data discovery and data access for the R end-user.

### 2. Development status
***

The package currently reads SDMX ``datasets`` and ``data structure definitions (DSD)`` (including ``concepts``, ``codelists`` and ``data structures``). For datasets, it has been successfully tested on SDMX 1.0 (``CompactData``), 2.0 (``GenericData`` and ``CompactData`` types) and 2.1 (``GenericData``).

A first support for MessageGroup type was enabled in order to read embedded generic or compact data.

Tests were performed using several data sources, such as FAO, OECD, EUROSTAT, the European Central Bank (ECB), and many others! Check the complete list here

Check the [Change History](https://github.com/opensdmx/rsdmx/wiki/Change-History), which provides a list of fixes and improvements by milestone.

Check also the success stories to see how and where rsdmx is used!

### 3. Success stories
***

While rsdmx is still evolving, it is worth mentioning that its user community is growing, and that users have provided positive feedback and acknowledgments. Support has been provided to users either by supplying examples and help, or by improving the package (enhancements, bug fixes).

#### 3.1 Inventory of Data Sources

As success stories, the rsdmx package has been used as SDMX data abstraction library with multiple international and national data sources, listed below:

  • international data sources:

| Name | SDMX Web resource | Embedded web-service interface |
|---|---|---|
| UN data portal | Link | yes |
| UN Food & Agriculture Organization (FAO) | Link | yes |
| UN International Labour Organization (ILO) | Link | yes |
| UN World Health Organization (WHO) | Link | no |
| Organisation for Economic Co-operation and Development (OECD) | Link | yes |
| EUROSTAT | Link | yes |
| European Central Bank (ECB) | Link | yes |
| International Monetary Fund (IMF) | Link | yes |
| World Bank | Link | no |
| World Integrated Trade Solution | Link | yes |
| Bank for International Settlements | Link | no |

  • national data sources:

| Country | Name | SDMX Web resource | Embedded web-service interface |
|---|---|---|---|
| Australia | Australian Bureau of Statistics (ABS) | Link | yes |
| Belgium | National Bank of Belgium | Link | yes |
| Canada | Statistics Canada | Link | no |
| Germany | Deutsche Bundesbank | Link | no |
| Germany | DESTATIS Statistisches Bundesamt | Link | no |
| France | Banque de France | Link | no |
| France | Institut National de la Statistique et des Etudes Economiques (INSEE) | Link | yes |
| Italy | Istituto nazionale di statistica | Link | yes |
| Mexico | Sistema Nacional de Información Estadística y Geográfica de México (SNIEG) | Link | yes |
| Netherlands | De Nederlandsche Bank | Link | no |
| Spain | Instituto Nacional de Estadística (España) | Link | no |
| Sweden | Statistics Sweden | Link | no |
| Switzerland | Swiss Statistics (classifications) | Link | no |
| UK | UK Office for National Statistics (ONS) | Link | no |
| USA | US Federal Reserve | Link | no |
| USA | Federal Reserve Bank of New York | Link | no |
| USA | Bureau of Labor Statistics | Link | no |

  • other data sources:

| Name | SDMX Web resource | Embedded web-service interface |
|---|---|---|
| KNOEMA Knowledge Platform | Link | yes |

#### 3.2 Projects using rsdmx

The rsdmx package has also been used in the following projects:

  • SYRTO project: Systemic Risk Tomography Signals, Measurements, Transmission Channels and Policy Interventions. The EC funded project uses rsdmx as part of its data quality framework
  • iMarine data e-infrastructure, within R statistical analysis processes made available through Web Processing Services (WPS).
  • Live Labour Force project, to allow reading SDMX datasets from the Australian Bureau of Statistics (ABS) portal (ABS.Stat). The project won the first prize in the category Best Statistical Storytelling with ABS.Stat (API) at the Australian GovHack 2014 edition.
### 4. Credits
***

Did you use rsdmx in your work?

We would be very grateful if you could add a citation in your published work. By citing rsdmx, beyond acknowledging the work, you help make it more visible and support its growth and sustainability. For this, please use the DOI.

### 5. Project Fundings
***

The rsdmx package was born from a volunteer development initiative to facilitate accessing and analyzing SDMX-ML data in R. At this stage, the project offers some functionalities to reach this objective.

Currently, the project is seeking funding opportunities in order to grow the package with new functionalities and improvements, and to guarantee quality maintenance of the R package and user support, hence ensuring the sustainability of the rsdmx project. If you wish to donate to acknowledge the work accomplished, please contact us.

Below is a list of enhancements for which we seek funds:

| Enhancement | Description | Ticket |
|---|---|---|
| SDMX-ML SAX parser | Capacity of rsdmx to rely on the Simple API for XML (SAX) event-driven parsing style, as an additional SDMX-ML parsing functionality. Currently the approach relies on XPath and requires loading the complete SDMX-ML document tree in R. The SAX approach intends to give rsdmx the capacity to read huge datasets without running into R memory issues. Such an enhancement would add value especially where XML data becomes really huge, and where rsdmx is intended to be used in the context of web-services. It would make rsdmx very flexible in the way it can read SDMX data from the web. | #36 |
| SDMX-ML ObsTime Date format | In datasets, there is a need to coerce the observation time into an appropriate date format. Such coercion requires a generic functionality that takes into consideration the time granularity specific to each dataset, using time format information inherited from the datasource or obtained through time pattern identification. | #37 |
| SDMX-ML writeSDMX support | Supporting a SDMX-ML document writer in R, to facilitate SDMX data exchange for R users. | |
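
The time pattern identification described for the ObsTime enhancement could look roughly like the sketch below, written in base R. The helper name `parse_obs_time` and the three supported patterns (annual, monthly, quarterly) are assumptions chosen for illustration; this is not an rsdmx function, and real datasets may use other granularities.

```r
# Hypothetical helper: map SDMX-like period strings to the first day of the period
parse_obs_time <- function(x) {
  out <- rep(as.Date(NA), length(x))
  annual    <- grepl("^[0-9]{4}$", x)            # e.g. "2010"
  monthly   <- grepl("^[0-9]{4}-[0-9]{2}$", x)   # e.g. "2010-03"
  quarterly <- grepl("^[0-9]{4}-Q[1-4]$", x)     # e.g. "2010-Q3"
  if (any(annual)) {
    out[annual] <- as.Date(paste0(x[annual], "-01-01"), format = "%Y-%m-%d")
  }
  if (any(monthly)) {
    out[monthly] <- as.Date(paste0(x[monthly], "-01"), format = "%Y-%m-%d")
  }
  if (any(quarterly)) {
    year <- substr(x[quarterly], 1, 4)
    # first month of the quarter: Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
    first_month <- (as.integer(substr(x[quarterly], 7, 7)) - 1L) * 3L + 1L
    out[quarterly] <- as.Date(sprintf("%s-%02d-01", year, first_month),
                              format = "%Y-%m-%d")
  }
  out  # unrecognized patterns stay NA
}

dates <- parse_obs_time(c("2010", "2010-03", "2010-Q3", "unknown"))
```

Unrecognized strings are returned as `NA` rather than raising an error, so a mixed column can still be coerced and inspected afterwards.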
### 6. User guide
***
#### 6.1 How to install rsdmx in R

The package installation requires at least R 2.15 and installing the devtools package

install.packages("devtools")

Once the devtools package is installed, you can load it and use install_github as follows:

require("devtools")
install_github("opensdmx/rsdmx")
#### 6.2 readSDMX & helper functions
***
readSDMX as low-level function

The readSDMX function is first designed as a low-level function, so it can take as parameter a URL (isURL = TRUE by default) or a file. So wherever the SDMX document is located, readSDMX will allow you to read it, as follows:


  #read a remote file
  sdmx <- readSDMX(file = "someUrl")
  
  #read a local file
  sdmx <- readSDMX(file = "somelocalfile", isURL = FALSE)

In addition, in order to facilitate querying datasources, readSDMX also provides helpers to query well-known remote datasources. This avoids having to specify the entire URL: you specify a simple provider ID and the different parameters needed to build a SDMX query (e.g. for a dataset query: resource, flowRef, key, start and end).

This is made possible as a list of SDMX service providers is embedded within rsdmx, and such list provides all the information required for readSDMX to build the SDMX request (url) before accessing the datasource.
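
To make the idea of "building the SDMX request" concrete, here is a small base R sketch that assembles a data query URL from its parts. The helper `build_data_url` is hypothetical and is not the rsdmx request builder; it assumes the URL layout seen in the FAO example used elsewhere on this page (`data/<flowRef>/<key>/<agency>` followed by `startPeriod` and `endPeriod`).

```r
# Hypothetical helper, for illustration only: assemble a REST-style data URL
build_data_url <- function(repo_url, flow_ref, key, agency_id,
                           start = NULL, end = NULL) {
  url <- sprintf("%s/data/%s/%s/%s", repo_url, flow_ref, key, agency_id)
  # optional time filters are appended as query parameters
  params <- c(
    if (!is.null(start)) paste0("startPeriod=", start),
    if (!is.null(end)) paste0("endPeriod=", end)
  )
  if (length(params) > 0) {
    url <- paste0(url, "?", paste(params, collapse = "&"))
  }
  url
}

fao_url <- build_data_url("http://data.fao.org/sdmx/repository",
                          flow_ref = "CROP_PRODUCTION", key = ".156.5312..",
                          agency_id = "FAO", start = 2008, end = 2008)
print(fao_url)
```

Providers differ in how they lay out their URLs, which is why the embedded provider list stores this logic per provider instead of hard-coding a single pattern.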

get list of SDMX service providers

The list of known SDMX service providers can be queried as follows:


providers <- getSDMXServiceProviders()

#list all provider ids
sapply(providers, function(x) slot(x, "agencyId"))
create/add a SDMX service provider

It is also possible to create and add a new SDMX service provider to this list (so that readSDMX is aware of it). A provider can be created with SDMXServiceProvider, which takes three parameters: an agencyId, its name, and a request builder.

The request builder can be created with SDMXRequestBuilder, which takes 3 arguments: the baseUrl of the service endpoint, a suffix logical parameter (whether the agencyId has to be used as a suffix in the web request), and a handler function which builds the web request.

rsdmx intends to provide specific request builders that already embed a handler function (no need to implement it), and is now attempting to provide a SDMXRESTRequestBuilder to build SDMX REST web requests. All this is still experimental.

Let's see it with an example:

First create a request builder for our provider: An SDMXRequestBuilder is built by specifying the following parameters:

  • regUrl and repoUrl: respectively the URLs of the SDMX registry and repository. Although rsdmx offers the possibility to distinguish the two URLs (some data providers require it), both URLs will generally be the same.
  • formatter: The formatter is a list of functions (one function per type of resource to be handled), and allows to pre-format the values of the SDMX request parameters (handled through a single SDMXRequestParams object). This is particularly useful for customization.
  • handler: The handler is a list of functions (one function per type of resource to be handled), and allows to construct the SDMX resource request URL that will be invoked by rsdmx.
  • compliant: a boolean property to indicate if the SDMX provider is compliant with SDMX web-service specifications

myBuilder <- SDMXRequestBuilder(
  regUrl = "http://www.myorg.org/sdmx/registry",
  repoUrl = "http://www.myorg.org/sdmx/repository",
  formatter = list(
    dataflow = function(obj){
         obj@resourceId <- paste0("DF_",obj@resourceId)
         return(obj)
    },
    datastructure = function(obj){
         obj@resourceId <- paste0("DSD_",obj@resourceId)
         return(obj)
    },
    data = function(obj){return(obj)}
  ),
  handler = list(
    dataflow = function(obj){
         req = sprintf("%s/%s/%s",obj@regUrl,obj@resource,obj@resourceId)
         return(req)
    },
    datastructure = function(obj){
         req = sprintf("%s/%s/%s",obj@regUrl,obj@resource,obj@resourceId)
         return(req)
    },
    data = function(obj){
         req = sprintf("%s/%s/%s",obj@regUrl,obj@resource,obj@flowRef)
         return(req)
    }
  ),
  compliant = FALSE
)
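
To see what the `dataflow` formatter and handler above produce, the following sketch applies them to a stand-in object. `MockRequestParams` is a hypothetical S4 class defined here only so the example runs without rsdmx (the package passes its own SDMXRequestParams object), and applying the formatter before the handler is an assumption based on the descriptions above.

```r
library(methods)

# Hypothetical stand-in for the request parameters object
setClass("MockRequestParams",
         representation(regUrl = "character",
                        resource = "character",
                        resourceId = "character"))

# Same logic as the 'dataflow' formatter and handler shown above
format_dataflow <- function(obj) {
  obj@resourceId <- paste0("DF_", obj@resourceId)
  obj
}
handle_dataflow <- function(obj) {
  sprintf("%s/%s/%s", obj@regUrl, obj@resource, obj@resourceId)
}

params <- new("MockRequestParams",
              regUrl = "http://www.myorg.org/sdmx/registry",
              resource = "dataflow",
              resourceId = "MYSERIE")

# Pre-format the parameters, then build the request URL
request_url <- handle_dataflow(format_dataflow(params))
print(request_url)
```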

We can create a provider with the above request builder, and add it to the list of known SDMX service providers:


#create the provider
provider <- SDMXServiceProvider(
agencyId = "MYORG",
name = "My Organization",
builder = myBuilder
)

#add it to the list
addSDMXServiceProvider(provider)

#check provider has been added
sapply(getSDMXServiceProviders(), function(x){slot(x, "agencyId")})

find a SDMX service provider

Another helper lets you ask rsdmx whether a specific provider is known, given its id:

oecd <- findSDMXServiceProvider("OECD")
readSDMX as helper function

Now that you know how to add a SDMX provider, you can use readSDMX without having to specify an entire URL, just by specifying the providerId (agency Id of the provider) and the different query parameters needed to reach your SDMX document:

sdmx <- readSDMX(providerId = "MYORG", resource = "data", flowRef = "MYSERIE",
                 key = "ALL", key.mode = "SDMX", start = 2000, end = 2015)

The following sections will show you how to query SDMX documents, by using readSDMX in different ways: either for local or remote files, using readSDMX as low-level or with the helpers.

#### 6.3 Examples

Read remote datasets

The following code shows you how to read a dataset from the FAO data portal: http://data.fao.org/sdmx/repository/data/CROP_PRODUCTION/.156.5312../FAO?startPeriod=2008&endPeriod=2008

myUrl <- "http://data.fao.org/sdmx/repository/data/CROP_PRODUCTION/.156.5312../FAO?startPeriod=2008&endPeriod=2008"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset) 

Try it out with other datasources!

The service providers mentioned above are known by rsdmx, which lets users call readSDMX with the helper parameters. Let's see how this looks for querying an OECD datasource:

sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
                key = list("TOT", NULL, NULL), start = 2010, end = 2011)
df <- as.data.frame(sdmx)
head(df)
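
The `key` argument above is given as an R list, one element per dimension, with `NULL` standing for "all values". The sketch below shows how such a list could be turned into a dot-separated SDMX key string. The helper `key_to_sdmx` is hypothetical, and joining several values of one dimension with `+` is an assumption about the conversion; the package's own logic may differ.

```r
# Hypothetical helper: convert a list-style key into an SDMX key string
key_to_sdmx <- function(key) {
  # values of one dimension are joined with "+", an empty string means "all"
  parts <- vapply(key, function(k) paste(k, collapse = "+"), character(1))
  paste(parts, collapse = ".")
}

k1 <- key_to_sdmx(list("TOT", NULL, NULL))
k2 <- key_to_sdmx(list(c("A", "B"), NULL, "X"))
print(k1)
print(k2)
```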

It is also possible to query a dataset together with its "definition", handled in a separate SDMX-ML document named DataStructureDefinition (DSD). It is particularly useful when you want to enrich your dataset with all labels. For this, you need the DSD which contains all reference data.

To do so, you only need to append dsd = TRUE (default value is FALSE), to the previous request, and specify labels = TRUE when calling as.data.frame, as follows:

sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
                key = list("TOT", NULL, NULL), start = 2010, end = 2011,
                dsd = TRUE)
df <- as.data.frame(sdmx, labels = TRUE)
head(df)
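
Conceptually, enriching a dataset with labels amounts to looking up each code in the matching codelist of the DSD. The base R sketch below illustrates the idea on toy data; the column names and values are invented for the example, and it does not reproduce what `as.data.frame(sdmx, labels = TRUE)` does internally.

```r
# Toy observations with coded values (illustrative data only)
obs <- data.frame(COUNTRY = c("FRA", "ITA", "FRA"),
                  obsValue = c(1.5, 2.0, 1.7),
                  stringsAsFactors = FALSE)

# Toy codelist, as it could be extracted from a DSD
codelist <- data.frame(id = c("FRA", "ITA"),
                       label = c("France", "Italy"),
                       stringsAsFactors = FALSE)

# Add a label column by matching each code against the codelist
obs$COUNTRY_label <- codelist$label[match(obs$COUNTRY, codelist$id)]
print(obs)
```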

Note that if you are reading SDMX-ML documents with the native approach (with URLs) instead of the embedded providers, it is also possible to associate a DSD with a dataset by using the function setDSD. Let's see how it works:

#data without DSD
sdmx.data <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
                key = list("TOT", NULL, NULL), start = 2010, end = 2011)

#DSD
sdmx.dsd <- readSDMX(providerId = "OECD", resource = "datastructure", resourceId = "MIG")

#associate data and dsd
sdmx.data <- setDSD(sdmx.data, sdmx.dsd)
Read local datasets

This example shows you how to use rsdmx with local SDMX files, previously downloaded from EUROSTAT.

#bulk download from Eurostat
tf <- tempfile(tmpdir = tdir <- tempdir()) #temp file and folder
download.file("http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Frd_e_gerdsc.sdmx.zip", tf)
sdmx_files <- unzip(tf, exdir = tdir)

sdmx <- readSDMX(sdmx_files[2], isURL = FALSE)
stats <- as.data.frame(sdmx)
head(stats)

A similar use case is the download of SDMX data and metadata from the UN data portal. For information, in such case, the XML files are wrapped in a SOAP request response, however rsdmx provides a convenience mechanism to detect and read the embedded SDMX-ML message.

SDMX Concepts

csUrl <- "http://data.fao.org/sdmx/registry/conceptscheme/FAO/ALL/LATEST/?detail=full&references=none&version=2.1"
csobj <- readSDMX(csUrl)
csdf <- as.data.frame(csobj)
head(csdf)

SDMX Codelists

clUrl <- "http://data.fao.org/sdmx/registry/codelist/FAO/CL_FAO_MAJOR_AREA/0.1"
clobj <- readSDMX(clUrl)
cldf <- as.data.frame(clobj)
head(cldf)

Data Structures (Key Families)

Read the complete list of data structures (or key families) from the OECD StatExtracts portal:

dsUrl <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/ALL"
ds <- readSDMX(dsUrl)
dsdf <- as.data.frame(ds)
head(dsdf)

SDMX DataStructureDefinition (DSD)

dsdUrl <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/TABLE1"
dsd <- readSDMX(dsdUrl)

#get codelists from DSD
cls <- slot(dsd, "codelists")
codelists <- sapply(slot(cls, "codelists"), function(x) slot(x, "id")) #get list of codelists
codelist <- as.data.frame(slot(dsd, "codelists"), codelistId = "CL_TABLE1_FLOWS") #get a codelist

#get concepts from DSD
concepts <- as.data.frame(slot(dsd, "concepts"))
#### 6.4 Community use cases

| Description | Link |
|---|---|
| A good introduction to the SDMX standard and the use of rsdmx to facilitate SDMX data extraction in R. | Link |
| A nice example of how to use rsdmx to extract multiple SDMX unemployment timeseries, merge them and then display statistics in graphs & maps | Link |

#### 6.5 R package documentation

The package embeds R documentation accessible from the R console (e.g. by running ?readSDMX), or as PDF documentation available, once installed, in the package directory.

#### 6.6 User mailing list

A google group / mailing list is available for users here: https://groups.google.com/forum/#!forum/rsdmx

You can subscribe directly in the google group, or by email: rsdmx+subscribe@googlegroups.com
To send a post, use: rsdmx@googlegroups.com
To unsubscribe, send an email to: rsdmx+unsubscribe@googlegroups.com

### 7. Developer guide
***
#### 7.1 How to contribute

Here are some guidelines on how to contribute to the package:

  • the first step is to write a post on the dev mailing list to discuss the enhancement and how to implement it
  • create an issue, and describe the bug/enhancement/new feature, in order to discuss the new requirement
  • create a branch on your fork with reference to the issue. A branch for an improvement can be named as follows: branch-issueNb-shortdescription e.g. ``master-18-readsdmx-httpheader``.
  • commit/push to your branch after a successful R CMD check, and reference each commit in this way: branch #issuenb message e.g. master #issuenb my commit. Adding the issue number links the commit to the github issue previously created. Indicating the branch is very useful, especially when a fix has to be handled in a previous version (backport)
  • once you have committed all the work, with a successful package build checked with R CMD check, you can open a pull request
#### 7.2 Unit tests
* Each new feature should be accompanied by unit tests, using the ```testthat``` R package.
* For each R-script file named ```script.R```, a corresponding test file should be created in the ```tests/testthat``` directory, using the naming convention ```test_<script>.R```
* The ```test_<script>.R``` should have the following structure:

```R
require(rsdmx, quietly = TRUE) #load the rsdmx package
require(testthat) # load the testthat package
context("script") # create a unit test context for the given script file

#unit test 1
test_that("Test1",{
  ...
})

#unit test 2
test_that("Test2",{
  ...
})
```

<a name="package_build_tests"/>
#### 7.3 Build tests
* After any modification of the source code (bug fix, enhancement, added feature), the developer should test a package build using the command ```R CMD check``` (requires an R installation and RTools). The option ``--as-cran`` should be enabled to ensure the updated package will later be accepted by [CRAN](http://cran.r-project.org/). This command runs the set of checks required for a proper package build, including the unit tests.
* In order to guarantee a proper R package build, the R CMD check will be performed automatically after each commit, through Travis Continuous Integration (see [https://travis-ci.org/opensdmx/rsdmx](https://travis-ci.org/opensdmx/rsdmx)). This second build test is required to ensure users will be able to successfully install the package from Github.

<a name="mailinglist_dev"/>
#### 7.4 Developers mailing list

A google group / mailing list is available for discussing developments here: [https://groups.google.com/forum/#!forum/rsdmx-dev](https://groups.google.com/forum/#!forum/rsdmx-dev)

You can subscribe directly in the google group, or by email: [rsdmx-dev+subscribe@googlegroups.com](rsdmx-dev+subscribe@googlegroups.com)
To send a post, use: [rsdmx-dev@googlegroups.com](rsdmx-dev@googlegroups.com)
To unsubscribe, send an email to: [rsdmx-dev+unsubscribe@googlegroups.com](rsdmx-dev+unsubscribe@googlegroups.com)


<a name="package_issues"/>
### 8. Issue reporting
***

Issues can be reported at https://github.com/opensdmx/rsdmx/issues

rsdmx on slideshare!

Want to support rsdmx? Do not hesitate to contact me
