datasets

A collection of public datasets for supervised machine learning research. The datasets follow these conventions:

All datasets are in CSV format.
All datasets have header rows.
The target variable is always the last column.
All nominal features with numeric values have been encoded as strings.
Any constant columns have been removed.
Any row ID-like columns have been removed.
Watch out for any possible missing values in the descriptive features.
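
The conventions above can be exercised with a short loading routine. The following is a minimal sketch using pandas, not part of this repository: the in-memory CSV and its column names are hypothetical stand-ins for one of the dataset files, and load_dataset is an illustrative helper name.

```python
import io

import pandas as pd

# Hypothetical stand-in for one of the CSV files; replace with a real file path.
SAMPLE_CSV = io.StringIO(
    "age,zone,income,target\n"
    "34,z1,52000,yes\n"
    ",z2,48000,no\n"
    "51,z1,,yes\n"
)


def load_dataset(source):
    """Read a CSV that has a header row and split it into features and target."""
    df = pd.read_csv(source, header=0)
    features = df.iloc[:, :-1]  # every column except the last
    target = df.iloc[:, -1]     # the last column is the target
    return features, target


features, target = load_dataset(SAMPLE_CSV)

# Count missing values per descriptive feature before doing any modeling.
missing_per_column = features.isna().sum()
print(missing_per_column[missing_per_column > 0])
```

Splitting by position rather than by column name relies only on the "target is last" convention, so the same helper should work across the datasets without knowing their column names.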

A sample Python script named "prepare_dataset_for_modeling_github.py" has also been included for loading these datasets and preparing them for model fitting.
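
The included script is the reference for preparing these datasets. For orientation only, here is an independent sketch of one common way to prepare such data for model fitting. It is an assumption about typical preprocessing, not a copy of that script: the toy data, the median/mode imputation, and the one-hot encoding are all illustrative choices.

```python
import pandas as pd

# Hypothetical toy data following the conventions above (target is the last column).
df = pd.DataFrame({
    "age": [34.0, None, 51.0, 29.0],
    "zone": ["z1", "z2", None, "z1"],
    "target": ["yes", "no", "yes", "no"],
})

features = df.iloc[:, :-1].copy()
target = df.iloc[:, -1]

# Nominal features are stored as strings, so anything non-numeric is treated as nominal.
numeric_cols = features.select_dtypes(include="number").columns
nominal_cols = features.columns.difference(numeric_cols)

# Fill numeric gaps with the column median and nominal gaps with the most frequent value.
features[numeric_cols] = features[numeric_cols].fillna(features[numeric_cols].median())
for col in nominal_cols:
    features[col] = features[col].fillna(features[col].mode().iloc[0])

# One-hot encode the string-valued nominal features.
X = pd.get_dummies(features, columns=list(nominal_cols))
print(X.head())
```

Other imputation or encoding strategies may suit a given dataset better; the point is only that missing values and string-encoded nominal features both need handling before most models can be fitted.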

####################################################

Descriptions of these datasets can be found in the "github_datasets_desc" Notebook file:

https://github.com/vaksakalli/datasets/blob/master/github_dataset_descriptions.ipynb
