This package contains a Machine that is meant to do the learning for you. It can automatically create a fitting predictive model for the given data. A sample learning run looks like this:
Testing: Gradient Boosting Classifier
[########################################] | 100% Completed | 3.9s
Score: 0.9667
Testing: Ada Boost Classifier
[########################################] | 100% Completed | 1.3s
Score: 0.9600
Testing: Random Forest Classifier
[########################################] | 100% Completed | 5.0s
Score: 0.9600
Testing: Balanced Random Forest Classifier
[########################################] | 100% Completed | 3.5s
Score: 0.9600
Testing: SVC
[########################################] | 100% Completed | 1.2s
Score: 0.9667
Chosen model: Gradient Boosting Classifier 0.9667
Params:
min_samples_split: 2
n_estimators: 100
Results saved to output.csv
To install the package, run:
pip install modelcreator
The input may be either a path to a csv file or a pandas DataFrame object.
The library assumes that the last column of the training dataset contains the expected results. For the learn and predict methods, the dataset (both the training and the prediction data) is provided as a csv file.
If the results column contains text, the Machine will do its best to learn to classify the data correctly; if it contains numbers, regression will be performed.
If the file contains headers, pass the header_in_csv=True parameter to the method.
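A minimal sketch, assuming a hypothetical data.csv whose first row holds column names:

from modelcreator import Machine

machine = Machine()
# The first row of data.csv is treated as column names, not as data
machine.learn('data.csv', header_in_csv=True)

The basic end-to-end workflow looks like this: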
from modelcreator import Machine
# Create automl machine instance
machine = Machine()
# Train machine learning model
machine.learn('example-data/iris.csv')
# Predict the outcomes
machine.predict('example-data/iris-pred.csv', 'output.csv')
This example is also available in the example.py file. Consider trying it on your own.
But what to do if the results column is not the last one in the given csv? It may be inconvenient to rewrite the whole csv just to swap the columns. For this reason, Machine has the learnFromDf and predictFromDf methods. The Df in the method names stands for DataFrame from the pandas module. This way you can handle reading the file yourself:
from modelcreator import Machine
import pandas as pd
# Create DataFrame object from file
train = pd.read_csv("train.csv")
# Get features columns from DataFrame
X_train = train.drop(['Survived'], axis=1)
# And labels (results) column
y_train = train["Survived"].astype(str)
# Create the instance of Machine
machine = Machine()
# Train machine learning model
machine.learnFromDf(X_train, y_train, computation_level='advanced')
# Show parameters of the model
machine.showParams()
# Load test set from file
X_test = pd.read_csv("test.csv")
# Predict the labels
results = machine.predictFromDf(X_test)
# Save results to a new file
results.to_csv("results.csv")
Simple? That's right! Just note that we used astype(str) to treat the data as classes rather than numbers, because the Titanic dataset used in the example above has the values 0 and 1 in the "Survived" column to indicate whether a person made it through the disaster.
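In other words, the dtype of the label column decides which task the Machine performs. A minimal sketch of the difference, reusing the Titanic DataFrame from above:

# Numeric labels - the Machine would perform regression on "Survived"
y_as_numbers = train["Survived"]
# Text labels - the Machine treats the values as classes and performs classification
y_as_classes = train["Survived"].astype(str)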
If you want to avoid re-learning the model on the whole dataset just to make a simple prediction, you can save the state of the Machine to a file:
# Save Machine with a trained model to "machine.pkl"
machine.saveMachine('machine.pkl')
# Create a new machine based on a schema file
machine2 = Machine('machine.pkl')
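The recreated Machine already holds the trained model, so it can make predictions straight away, without calling a learn method again. Reusing X_test from the DataFrame example above:

# machine2 was restored from "machine.pkl", so no re-learning is needed
results = machine2.predictFromDf(X_test)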
The Machine can be customized according to the use case. Check the parameter tables below. The Machine constructor itself takes a single optional parameter:
Param | Type | Default | Description |
---|---|---|---|
schema | None or str | None | A Machine may be created based on a saved, pre-trained machine instance. You may specify the path to the saved instance in this param to recreate it. |
The learn method accepts the following parameters:

Param | Type | Default | Description |
---|---|---|---|
dataset_file | str | | Path to a csv file which contains the training dataset. |
header_in_csv | bool | False | Whether the csv file contains headers in the first row. |
metrics | None, str or Callable | 'accuracy' (classification) or 'neg_root_mean_squared_error' (regression) | Metrics used for scoring estimators. Accepts many popular scoring function names (such as f1, roc_auc, neg_mean_gamma_deviance) or a custom scoring Callable. |
verbose | bool | True | Whether to print learning logs. |
cv | int | 3 | Number of cross-validation subsets. Higher values may increase computation time. |
computation_level | str | 'medium' | Can be either 'basic', 'medium' or 'advanced'. With a higher computation level, more models and parameters are tested. |
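For instance, a sketch of a learn call with illustrative values for the optional parameters (keyword names follow the table above):

machine = Machine()
# 5-fold cross-validation, the widest model search, and no progress logs
machine.learn('example-data/iris.csv', metrics='accuracy', cv=5, computation_level='advanced', verbose=False)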
The learnFromDf method accepts the following parameters:

Param | Type | Default | Description |
---|---|---|---|
X | pandas.DataFrame | | DataFrame containing the feature columns. |
y | pandas.Series | | Labels (results) column of the training data. |
metrics | None, str or Callable | 'accuracy' (classification) or 'neg_root_mean_squared_error' (regression) | Metrics used for scoring estimators. Accepts many popular scoring function names (such as f1, roc_auc, neg_mean_gamma_deviance) or a custom scoring Callable. |
verbose | bool | True | Whether to print learning logs. |
cv | int | 3 | Number of cross-validation subsets. Higher values may increase computation time. |
computation_level | str | 'medium' | Can be either 'basic', 'medium' or 'advanced'. With a higher computation level, more models and parameters are tested. |
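The metrics argument also accepts a Callable. A sketch, assuming the scorer is forwarded to scikit-learn style scoring (make_scorer comes from scikit-learn, not from this package):

from sklearn.metrics import f1_score, make_scorer

# Macro-averaged F1 scorer built with scikit-learn (assumed to be accepted as metrics)
f1_macro = make_scorer(f1_score, average='macro')

machine = Machine()
machine.learnFromDf(X_train, y_train, metrics=f1_macro, cv=5)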
The predict method accepts the following parameters:

Param | Type | Default | Description |
---|---|---|---|
features_file | str | | Path to the features csv of the data to generate predictions on. |
header_in_csv | bool | False | Whether the csv file contains headers in the first row. |
output_file | str | 'output.csv' | Path to the output csv file in which the predictions will be saved. |
verbose | bool | True | Whether to print logs. |
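Put together, a predict call with every option spelled out might look like this (file names are illustrative; keyword names follow the table above):

# Read a headered features csv, save predictions to a custom path, print no logs
machine.predict('test-features.csv', header_in_csv=True, output_file='predictions.csv', verbose=False)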
The predictFromDf method accepts the following parameters:

Param | Type | Default | Description |
---|---|---|---|
X_predictions | pandas.DataFrame | | Feature columns to generate predictions on. |
output_file | None or str | None | The method returns a pandas.Series with the results; if output_file is not None, it is interpreted as a path to a csv file in which the results will additionally be saved. |
verbose | bool | True | Whether to print logs. |
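So the DataFrame example above could also let the Machine write the csv itself:

# Returns a pandas.Series and, because output_file is set, also saves it to a csv file
results = machine.predictFromDf(X_test, output_file="results.csv")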
The saveMachine method accepts the following parameter:

Param | Type | Default | Description |
---|---|---|---|
output_file_name | str | 'machine.pkl' | Path where the Machine instance will be saved. |
Have a feature idea or just want to help? Take a look at the issues tab!