Support for UtilityData type #41

Closed
grandtiger opened this issue Jan 13, 2015 · 5 comments

Comments

@grandtiger

Hi, I get an error and a warning when running the following code. Could you please take a look? Thanks a lot!

library("rsdmx")
myUrl <- "http://markets.newyorkfed.org/api/pd/get/SBN2013/timeseries/PDPOSTIPS-G11.sdmx.xml"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset) 

Error messages:

Error in validObject(.Object) : invalid class “SDMXType” object: FALSE
In addition: Warning message:
In validityMethod(object) : Unknown SDMXType UtilityDataType
@eblondel changed the title from "Error in validObject(.Object) : invalid class “SDMXType” object: FALSE" to "Support for UtilityData type" on Jan 13, 2015
@eblondel self-assigned this on Jan 13, 2015
@eblondel added this to the 0.4 milestone on Jan 13, 2015
@grandtiger
Author

@eblondel Thanks for looking into this so quickly! Unfortunately I don't understand anything about SDMX at this point. I hope one day I will learn enough to be able to contribute to this project.

@eblondel
Member

Hi @grandtiger, I've enabled the UtilityData type and fixed some missing pieces. You can reinstall rsdmx from GitHub and try it out; it should work properly now. Please let me know if it works fine on your side.

@grandtiger
Author

Hi @eblondel, thank you so much for the quick solution! It does work and I can get the data back. However, I don't think this SDMX XML is designed very well, and I get some redundant information. I wonder if you could point me in the right direction to work around the issue. The most important parts of this XML are the following:

<ID>PDPOSTIPS-G11</ID>

And the data series:
<frbny:Obs OBS_STATUS="M" OBS_CONF="F">
<frbny:TIME_PERIOD>2013-04-03</frbny:TIME_PERIOD>
<frbny:OBS_VALUE>372.00</frbny:OBS_VALUE>
</frbny:Obs>
<frbny:Obs OBS_STATUS="M" OBS_CONF="F">
<frbny:TIME_PERIOD>2013-04-10</frbny:TIME_PERIOD>
<frbny:OBS_VALUE>341.00</frbny:OBS_VALUE>
</frbny:Obs>
... ...

However, I am currently getting the following. As you can see, there are a lot of columns that hold the same value in every row.

> head(stats)
  AVAILABILITY DECIMALS TIME_FORMAT DISCLAIMER UNIT_MULT UNIT REPORT OBS_STATUS OBS_CONF
1            A        2         P7D          G         6  USD      P          M        F
2            A        2         P7D          G         6  USD      P          M        F
3            A        2         P7D          G         6  USD      P          M        F
4            A        2         P7D          G         6  USD      P          M        F
5            A        2         P7D          G         6  USD      P          M        F
6            A        2         P7D          G         6  USD      P          M        F
  FREQ CATEGORY SECURITY ABSREL TIME_PERIOD OBS_VALUE
1    W     <NA>     <NA>      A  2013-04-03    372.00
2    W     <NA>     <NA>      A  2013-04-10    341.00
3    W     <NA>     <NA>      A  2013-04-17     38.00
4    W     <NA>     <NA>      A  2013-04-24   -730.00
5    W     <NA>     <NA>      A  2013-05-01    -94.00
6    W     <NA>     <NA>      A  2013-05-08    223.00

How can I get just the ID after the SDMX XML file is read into R, and then get only a data frame like the following?

    TIME_PERIOD OBS_VALUE
1  2013-04-03    372.00
2  2013-04-10    341.00
3  2013-04-17     38.00
4  2013-04-24   -730.00
5  2013-05-01    -94.00
6  2013-05-08    223.00

@eblondel
Member

Regarding the XML data, UtilityData seems to be a format close to CompactData, but with some less compact content that resembles the GenericData format.
I still have to investigate exactly what the differences are, but the code is working fine.

The redundancy in the data.frame comes from the fact that the time observations within a series share keys and attributes whose values are unique to that series. Rebuilding the data.frame means replicating this information for every observation. This specific dataset has only one series, but in a dataset with multiple series these keys and attributes make much more sense: they identify each observation and allow filtering the data.frame. That said, the role of rsdmx is to return the complete dataset contained in the XML, including keys, attributes and time observations.
The best option in your case is therefore to filter the data.frame as follows:

stats <- stats[,c("TIME_PERIOD","OBS_VALUE")]
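When the column names are not known in advance, a more general variant of this filter is to drop every column that holds a single repeated value. The following is only a sketch under assumptions: `drop_constant_columns` is a hypothetical helper, not an rsdmx function, and the toy data frame is rebuilt by hand from the `head(stats)` output shown earlier in this thread (with `OBS_VALUE` entered as numeric and only a few of the constant columns reproduced).

```r
# Toy data frame rebuilt from the first rows printed earlier in this thread
# (only a few of the constant columns are reproduced here).
stats <- data.frame(
  FREQ        = rep("W", 6),
  UNIT        = rep("USD", 6),
  SECURITY    = rep(NA_character_, 6),
  TIME_PERIOD = c("2013-04-03", "2013-04-10", "2013-04-17",
                  "2013-04-24", "2013-05-01", "2013-05-08"),
  OBS_VALUE   = c(372, 341, 38, -730, -94, 223),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of rsdmx): keep only the columns whose
# values actually vary across rows. An all-NA column counts as constant.
drop_constant_columns <- function(df) {
  varies <- vapply(df, function(col) length(unique(col)) > 1L, logical(1))
  df[, varies, drop = FALSE]
}

slim <- drop_constant_columns(stats)
print(names(slim))
```

Note that in a dataset with several series the key columns would vary and therefore be kept, which is usually what you want. A one-row data frame would lose all of its columns, so this approach is best suited to single-series results like the one in this issue.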

To retrieve the dataset ID: this ID is part of the header of the SDMX-ML file. The header is also read by rsdmx, and you can get it using:

header <- slot(dataset,"header")
id <- slot(header,"ID")
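For readers unfamiliar with S4 objects, the slot access used above can be tried without rsdmx or a network connection. The sketch below uses toy classes (`ToyHeader` and `ToyDataset` are made-up stand-ins, not the real rsdmx classes) to show that `slot(x, "name")` and `x@name` read the same value, and that `slotNames()` lists which slots an object offers.

```r
library(methods)

# Toy stand-ins, NOT the real rsdmx classes: just enough structure to
# demonstrate slot access on nested S4 objects.
setClass("ToyHeader", slots = c(ID = "character"))
setClass("ToyDataset", slots = c(header = "ToyHeader"))

dataset <- new("ToyDataset", header = new("ToyHeader", ID = "PDPOSTIPS-G11"))

# slotNames() lists the slots available on an object.
print(slotNames(dataset))

# slot(x, "name") and x@name are equivalent ways to read a slot.
id_via_slot <- slot(slot(dataset, "header"), "ID")
id_via_at   <- dataset@header@ID
print(id_via_slot)
```

On a real object returned by `readSDMX`, calling `slotNames()` on the header in the same way should show which other header fields are available, though the exact slot names depend on the rsdmx version.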

@grandtiger
Author

Hi @eblondel, thanks again for your help! I was busy with something else and didn't get a chance to check this until now. I tried your suggestions and they work great! Your explanation makes sense too. It would be nice to have an option in the as.data.frame function to not return the redundant values, though. I'll close this request for now. Thank you!
