Skip to content

Automatically download/convert/test 1000+ R datasets to be used with Python. Originally forked from vincentarelbundock/Rdatasets

Notifications You must be signed in to change notification settings

Arvinds-ds/datasets_r2py

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

dataset_r2py

build

dataset_r2py is a automated script to generate observations (edwardlib/observations) ready python files and corresponding unit test files for a collection of 1100+ datasets (1100 python files) that were originally distributed alongside the statistical software environment R and some of its add-on packages.

The R datasets were originally collated by https://vincentarelbundock.github.io/Rdatasets/

Usage

$ python gen_data_files

The starting point for the script is the datasets_mod.csv file that has the name, URL, documentation RST file, rows and colums etc. The script used jinja template engines to convert template.tpl, test_template.tpl and init_template.tpl to generate templated python source code and test script in a format required by observations module. The rst file is used to generate the doc string in python source.

The source code is generated in observations/rdata folder and tests are generated in observations/rdata/tests folder

The test file ./test_script.sh performs the end to end testing of generating python source and test files and runs pytest on test files to download/load and verify the data.data

Motivation

I wrote this script out of frustration in getting datasets in to python that were easily available in R esp when using Stan/Edward. Edward's observations is a promising module.

About

Automatically download/convert/test 1000+ R datasets to be used with Python. Originally forked from vincentarelbundock/Rdatasets

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published