dataset_r2py

dataset_r2py is an automated script that generates observations-ready (edwardlib/observations) Python files and corresponding unit test files for a collection of 1100+ datasets (1100 Python files) that were originally distributed alongside the statistical software environment R and some of its add-on packages.

The R datasets were originally collated by https://vincentarelbundock.github.io/Rdatasets/

Usage

$ python gen_data_files

The starting point for the script is the datasets_mod.csv file, which lists each dataset's name, URL, documentation RST file, number of rows and columns, etc. The script uses the Jinja template engine to render template.tpl, test_template.tpl and init_template.tpl into Python source code and test scripts in the format required by the observations module. The RST file is used to generate the docstring in the Python source.
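The generation step described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: it uses the standard-library string.Template as a stand-in for Jinja so that it runs without extra dependencies, and the column names, the inline CSV row, the example URL and the template text are all hypothetical.

```python
import csv
import io
from string import Template

# Hypothetical stand-in for datasets_mod.csv; the real column names may differ.
CSV_TEXT = """name,url,rows,columns
air_passengers,https://example.com/AirPassengers.csv,144,2
"""

# Hypothetical stand-in for template.tpl, written for string.Template
# rather than Jinja to keep the sketch dependency-free.
MODULE_TEMPLATE = Template('''\
def ${name}(path):
    """Load the ${name} data set (${rows} rows, ${columns} columns)."""
    url = "${url}"
    raise NotImplementedError("download and parse " + url)
''')


def render_modules(csv_text):
    """Return a dict mapping each dataset name to its rendered source code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"]: MODULE_TEMPLATE.substitute(row) for row in reader}


sources = render_modules(CSV_TEXT)
print(sources["air_passengers"])
```

Each CSV row becomes one rendered module; a real run would write each string to its own file instead of printing it.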

The source code is generated in the observations/rdata folder and the tests are generated in the observations/rdata/tests folder.

The test script ./test_script.sh performs end-to-end testing: it generates the Python source and test files, then runs pytest on the test files to download, load and verify the data.
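As a rough idea of what "verify the data" could mean, the sketch below checks a loaded table against an expected row and column count. It assumes the check is a shape comparison against the counts recorded in datasets_mod.csv, which the README does not confirm; load_shape and test_shape are hypothetical names, and the data is an in-memory string rather than a downloaded file.

```python
import csv
import io


def load_shape(csv_text):
    """Parse CSV text and return (n_rows, n_columns), excluding the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return len(body), len(header)


def test_shape():
    # Hypothetical data; a real test would download the file first.
    sample = "x,y\n1,2\n3,4\n5,6\n"
    # Assumed to come from the rows/columns fields of datasets_mod.csv.
    expected_rows, expected_columns = 3, 2
    assert load_shape(sample) == (expected_rows, expected_columns)


test_shape()
```

pytest collects functions named test_*, so a file of this shape can be run directly by the test script.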

Motivation

I wrote this script out of frustration at how hard it was to get datasets into Python that were easily available in R, especially when using Stan/Edward. Edward's observations is a promising module.
