datasets

A collection of public datasets for supervised machine learning research. The datasets follow these conventions:

All datasets are in CSV format.
All datasets have header rows.
The target variable is always the last column.
All numeric nominal features have been encoded as strings.
Any constant columns have been removed.
Any row ID-like columns have been removed.
The descriptive features may contain missing values, so check for them before modeling.
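A minimal sketch of loading a file under these conventions, using only Python's standard library: the header row gives the column names, and the last column is split off as the target. The function names `split_features_target` and `count_missing` are hypothetical, and treating empty strings, `?` and `NA` as missing markers is an assumption; check each dataset for the marker it actually uses.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO, Tuple


def split_features_target(f: TextIO) -> Tuple[List[str], List[List[str]], List[str]]:
    """Read a CSV with a header row and treat the last column as the target."""
    reader = csv.reader(f)
    header = next(reader)
    features: List[List[str]] = []
    target: List[str] = []
    for row in reader:
        features.append(row[:-1])  # descriptive features
        target.append(row[-1])     # target is always the last column
    return header, features, target


def count_missing(header: List[str], features: List[List[str]],
                  markers: Sequence[str] = ("", "?", "NA")) -> Dict[str, int]:
    """Count missing cells per descriptive feature (the marker set is an assumption)."""
    counts: Dict[str, int] = {}
    for j, name in enumerate(header[:-1]):
        n = sum(1 for row in features if row[j].strip() in markers)
        if n:
            counts[name] = n
    return counts


sample = io.StringIO("age,colour,label\n34,red,yes\n,blue,no\n51,,yes\n")
header, X, y = split_features_target(sample)
print(header[-1], y)             # label ['yes', 'no', 'yes']
print(count_missing(header, X))  # {'age': 1, 'colour': 1}
```

For a real file, open it with `open("glass.csv", newline="")` and pass the file object instead of the in-memory sample.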

A sample Python script, "prepare_dataset_for_modeling.py", is also included for loading these datasets and preparing them for model fitting.
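As a hypothetical illustration of the kind of preparation such a script performs (this is not the included script), the sketch below imputes missing values and one-hot encodes nominal columns: numeric features get the column mean, nominal features get their most frequent level and are then expanded into 0/1 indicator columns. The function `prepare_features`, these imputation choices, and the assumption that missing cells are empty strings are all illustrative. Because CSV carries no types, a nominal column whose values all parse as numbers would be treated as continuous here.

```python
from statistics import mean, mode
from typing import List, Tuple


def is_number(value: str) -> bool:
    """True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def prepare_features(header: List[str], rows: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Hypothetical preparation step: impute missing cells (assumed to be
    empty strings) and one-hot encode non-numeric columns."""
    out_names: List[str] = []
    out_columns: List[List[float]] = []
    for j, name in enumerate(header):
        col = [row[j].strip() for row in rows]
        present = [v for v in col if v != ""]
        if present and all(is_number(v) for v in present):
            # Numeric feature: replace missing cells with the column mean.
            fill = mean(float(v) for v in present)
            out_names.append(name)
            out_columns.append([float(v) if v != "" else fill for v in col])
        else:
            # Nominal feature: fill with the most frequent level, then one-hot encode.
            fill_level = mode(present) if present else "missing"
            filled = [v if v != "" else fill_level for v in col]
            for level in sorted(set(filled)):
                out_names.append(f"{name}_{level}")
                out_columns.append([1.0 if v == level else 0.0 for v in filled])
    # Transpose the column lists back into one row per observation.
    matrix = [list(r) for r in zip(*out_columns)]
    return out_names, matrix


names, X = prepare_features(["x", "colour"], [["1", "red"], ["", "blue"], ["3", "red"]])
print(names)  # ['x', 'colour_blue', 'colour_red']
print(X)      # [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
```

In practice the imputation statistics should be computed on the training split only and reused for the test split, to avoid leaking information between them.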

####################################################

Descriptions of these datasets can be found in the "github_datasets_desc" Jupyter notebook:

https://github.com/vaksakalli/datasets/blob/master/github_datasets_desc.ipynb

Files in this repository:

LICENSE
README.md
ailerons.csv
ar10p.csv
arrhythmia.csv
baby_names_2000.csv
bank_marketing_full.csv
boston_housing.csv
breast_cancer_wisconsin.csv
cll_111.csv
coil_20.csv
cpu_act.csv
default_credit_card_clients.csv
diamonds.csv
elevators.csv
gisette.csv.zip
github_datasets_desc.ipynb
glass.csv
gli_85.csv
heart.csv
ionosphere.csv
isolet.csv
libras.csv
madelon.csv
mfeat_fac.csv
musk.csv
online_news_popularity.csv
orl.csv
phishing_websites.csv
planes2d.csv
pole_telecomm.csv
prepare_dataset_for_modeling.py
pyrim.csv
sonar.csv
spambase.csv
speed_dating.csv
telco_customer_churn.csv
tox_171.csv
triazines.csv
us_census_income_data.csv
usps.csv.zip
vehicle.csv
waveform.csv
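Two of the datasets, gisette.csv.zip and usps.csv.zip, are zip-compressed. A minimal standard-library sketch for reading such an archive follows, assuming it contains the dataset as a UTF-8 CSV member; the function name `read_zipped_csv` is hypothetical. With pandas, `pd.read_csv("usps.csv.zip")` should also work as long as the archive holds a single data file, since the compression is inferred from the `.zip` extension.

```python
import csv
import io
import zipfile
from typing import BinaryIO, List, Union


def read_zipped_csv(source: Union[str, BinaryIO]) -> List[List[str]]:
    """Return all rows (header first) of the first .csv member of a zip archive.
    Assumes the CSV inside the archive is UTF-8 encoded."""
    with zipfile.ZipFile(source) as zf:
        members = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if not members:
            raise ValueError("no .csv file found in the archive")
        with zf.open(members[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


# Demo with an in-memory archive; for the real files pass a path such as "usps.csv.zip".
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo.csv", "f1,f2,target\n0,1,a\n1,0,b\n")
buffer.seek(0)
rows = read_zipped_csv(buffer)
print(rows[0], rows[1:])  # ['f1', 'f2', 'target'] [['0', '1', 'a'], ['1', '0', 'b']]
```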

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

datasets

About

Releases

Packages

Languages

License
